Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Relative deviation metrics and the problem of strategy replication Stoyan V. Stoyanov FinAnalytica, Inc., USA e-mail: [email protected] Svetlozar T. Rachev ∗ University of Karlsruhe, Germany and University of California Santa Barbara, USA e-mail: [email protected] Sergio Ortobelli University of Bergamo, Italy e-mail: [email protected] Frank J. Fabozzi Yale University, School of Management e-mail: [email protected] Contact person: Sergio Ortobelli Department MSIA University of Bergamo, Via dei Caniana, 2, 24127, Italy e-mail: [email protected] ∗ Prof Rachev gratefully acknowledges research support by grants from Division of Mathematical, Life and Physical Sciences, College of Letters and Science, University of California, Santa Barbara, the Deutschen Forschungsgemeinschaft and the Deutscher Akademischer Austausch Dienst. 1 Relative deviation metrics and the problem of strategy replication Abstract In the paper, we generalize the classic benchmark tracking problem by introducing the class of relative deviation metrics (r.d. metrics). We consider specific families of r.d. metrics and how they interact with one another. Our approach towards their classification is inspired by the theory of probability metrics — we distinguish between compound, primary and simple r.d. metrics, introduce minimal r.d. metrics, and explore dual forms consistent with a preference order relation defined by a class of utility functions. Finally we focus on r.d. semi-metrics, implied pre-orders and how they can be applied to the problem of beating a benchmark. 2 1 Introduction In the last years there has been a heated debate on the “best” measures to use in risk management and portfolio theory. The fundamental work of Artzner et al. (1998) has been the starting point to define the properties that a measure has to satisfy in order to price coherently the risk exposure of a financial position. Many other works have developed these basic concepts introducing several different families of risk measures (see, among others, Föllmer and Schied (2002), Frittelli and Gianin (2002), Rockafellar et al. (2006), Szegö (2004) and the reference therein). In this paper, we deal with the benchmark tracking-error problem which is a type of an optimal portfolio problem and can also be looked at as an approximation problem. We consider it from a very general viewpoint and replace the tracking-error measure by a general functional satisfying a number of axioms. We call this functional a metric of relative deviation. The axioms are introduced and motivated in Section 2. In Section 3 we establish a relationship between the relative deviation metrics and the deviation measures introduced in Rockafellar et al. (2006). Our approach to the tracking-error problem is based on the universal methods of the theory of probability metrics. In Section 4, we distinguish between compound, simple and primary relative deviation metrics, the properties of which influence the nature of the approximation problem. Next we introduce a minimal functional establishing an important connection between the compound and the simple classes, and, in Section 5, dual forms consistent with a preference order. Finally, in Section 6, we consider examples of 3 dual forms and mention a few cases in which the optimization problem can be linearized. Throughout the paper, we focus on the static tracking-error problem; that is, the metric of relative deviation is defined on the space of real-valued random variables. The only reason is our intention to keep the notation simple and not an inherent limitation of the methodology. Exactly the same approach works in a dynamic setting, if the metrics of relative deviation are defined on the space of random processes. 2 The tracking error problem The the minimal tracking-error problem has the following form min σ(w0 r − rb ) w∈W where W is a set of admissible portfolios, w is a vector of portfolio weights, r is a vector of stocks returns, rb is the return of a benchmark portfolio, and σ(X) stands for the standard deviation of the random variable (r.v.) X. The goal is to find a portfolio which is closest to the benchmark in a certain sense, in this case, determined by the standard deviation. In essence, in the minimal tracking error problem we minimize the uncertainty of the excess returns of the feasible portfolios relative to the benchmark portfolio. A serious disadvantage of the tracking error is that it penalizes in the same way the positive and the negative deviations from the mean excess return while our attitude towards them is asymmetric. We are inclined to pay 4 more attention to the negative ones since they represent relative loss. This argument leads to the conclusion that a more realistic measure of “closeness” should be asymmetric. Our aim is to re-state the minimal tracking-error problem in the more general form min µ(w0 r, rb ) w∈W (1) where µ(X, Y ) is a measure of the deviation of X relative to Y . Due to this interpretation, we regard µ as a functional which metrizes1 relative deviation and we call it a relative deviation metric or simply, r.d. metric. We define the r.d. metric on the linear space of real-valued r.v.s. Not every r.v. belonging to this space can be interpreted as the return of some portfolio because it will not always be possible to find a portfolio the return of which is a given r.v. Nevertheless, those r.v.s for which this can be done, constitute a set in this space and any r.d. metric defined on the entire space can be adopted as an r.d. metric in the corresponding set. What are the properties that µ should satisfy? If the portfolio w0 r is an exact copy of the benchmark, i.e. it contains exactly the same stocks in the same amounts, then the relative deviation of w0 r to rb should be zero. The converse should also hold but, generally, in a somewhat weaker sense. Otherwise problem (1) is not a “tracking” problem. If the deviation of w0 r relative to rb is zero, then the portfolio and the benchmark are indistinguishable but only in the sense implied by µ. They may, or may not, be identical. 1 We use the word metrize in a broad sense and not in the sense of metrizing a given topological structure. 5 Let us assume for now the strongest version of similarity for this situation, that both portfolios are identical. In case the relative deviation of w0 r to rb is non-zero, then µ is positive. These arguments imply the following properties as P1. µ(X, Y ) ≥ 0 and µ(X, Y ) = 0, if X = Y, or as f µ(X, Y ) ≥ 0 and µ(X, Y ) = 0, if and only if X = P1. Y, as where = denotes equality in “almost sure” sense. Whenever the tilde sign is used in this paper, it is implied that the property is a stronger alternative. We already mentioned that in general it is meaningful for µ to be asymmetric but the tracking-error is an example of a symmetric µ. For this reason, at times we may assume symmetry, P2. µ(X, Y ) = µ(Y, X) for all X, Y but in general P2 will not hold. Suppose that we introduce a third portfolio, the returns of which we denote by rc . We would expect that the sum of the relative deviations of w0 r to rc and rc to rb be greater than (or equal to) the deviation of w0 r relative to rb because the return rc introduces additional “noise”; that is, the triangle inequality holds2 , P3. µ(X, Y ) ≤ µ(X, Z) + µ(Z, Y ) for any X, Y, Z 2 Instead of the triangle inequality, one can consider the generalized version µ(X, Y ) ≤ Kµ (µ(X, Z) + µ(Z, Y )) for any X, Y, Z and Kµ ≥ 1 which, obviously, reduces to P3 if Kµ = 1. In line with the terminology in the theory of probability metrics, if we replace P3 by the generalized triangle inequality in Definition 1, we obtain distances instead of metrics. In this way, we can define relative deviation distances and generalize the class of r.d. metrics. Relaxing the triangle inequality, we may obtain functionals having important applications in the field of finance. 6 Now we are in position to define a few basic terms. Let us denote by X the space of r.v.s on a given probability space (Ω, A, P ) taking values in R. By LX2 we denote the space of all joint distributions PrX,Y generated by the pairs X, Y ∈ X. Definition 1. Suppose that a mapping µ(X, Y ) := µ(PrX,Y ) is defined on LX2 taking values in the extended interval [0, ∞]. If it satisfies properties f P2 and P3, then it is called a probability metric on X. a) P1, f and P3, then it is called a probability quasi-metric on X. b) P1 c) P1, P2 and P3, then it is called a probability semi-metric on X. d) P1 and P3, then it is called a probability quasi-semi-metric on X. If we combine the assets in the three portfolios considered above such that we obtain two new portfolios with returns w0 r + rc and rb + rc , then the deviation of w0 r + rc to rb + rc should be less than (or equal to) the deviation of w0 r to rb . This is because we have included some quantities of the same stocks in the initial portfolios with returns w0 r and rb respectively.3 In terms of µ, we obtain P4. (Strong regularity) µ(X + Z, Y + Z) ≤ µ(X, Y ) for all X, Y, Z If P4 is satisfied as equality for any Z, then the functional µ is said to be translation invariant, 3 The two new portfolios can be constructed in the following manner. Assume for simplicity that w0 r and rb are returns of long-only portfolios. Construct a short-only portfolio with return rc . Combine the two long-only portfolios with the short-only, such that we obtain two portfolios with present value equal to zero. The returns of the two zero-value portfolios are w0 r + rc and rb + rc respectively provided that the weights are defined in a suitable way. 7 f µ(X + Z, Y + Z) = µ(X, Y ) for all X, Y, Z P4. In P4, we do not assume any particular relationship between Z and X, Y . A weaker assumption (P4*) will turn out to be more reasonable in some cases. In order to distinguish between them, we call P4 strong regularity property and P4* weak regularity property. P4*. (Weak regularity) µ(X + Z, Y + Z) ≤ µ(X, Y ), for all Z independent of X, Y Here we may ask the following related question. What happens if we add to the two initial portfolios other assets, such that their returns become X + c1 and Y + c2 , where c1 and c2 are arbitrary constants? The relative deviation of X + c1 to Y + c2 should be the same as the one of X to Y , P5. µ(X + c1 , Y + c2 ) = µ(X, Y ) for all X, Y and constants c1 , c2 The above implies that X and X +c are indiscernible for any µ and a constant c. This property allows defining µ on the space of zero-mean r.v.s X0 = {X ∈ X : EX = 0}, (2) where X is the space of all real-valued r.v.s. and E is the mathematical expectation. Thus for arbitrary real-valued r.v.s the relative deviation of X to Y equals µ(X − EX, Y − EX) and P5 is automatically satisfied. Finally, if we assume that multiplying the returns w0 r and rb by a positive number4 changes the relative deviation by the same factor raised to some power s, then µ is said to be positive homogeneous of degree s, 4 This effect appears if we add a cash position to the initial portfolio. 8 The re- P6. µ(aX, aY ) = as µ(X, Y ) for all X, Y, a, s ≥ 0 Now we are ready to define the r.d. metrics. f P5 and Definition 2. Any quasi-metric µ which satisfies P4/P4* (P4), P6 is said to be a (translation invariant) metric of relative deviation. 3 Examples of relative deviation metrics In this section, we deal with translation invariant r.d. metrics and show that there is a one-to-one correspondence with the r.d. metrics generated by the class of deviation measures. The deviation measures are introduced in Rockafellar et al. (2006) and are defined below. Definition 3. A deviation measure is any functional D : X −→ [0, ∞] satisfying (D1) D(X + C) = D(X) for all X and constants C, (D2) D(0) = 0 and D(λX) = λD(X) for all X and all λ > 0, (D3) D(X + Y ) ≤ D(X) + D(Y ) for all X and Y , (D4) D(X) ≥ 0 for all X, with D(X) > 0 for non-constant X. f P3, Proposition 1. The functional µ(X, Y ) = D(X − Y ) satisfies P1, f P5, P6 with s = 1 where D is a deviation measure in the sense of P4, Definition 3. turn becomes (P Vt1 − P Vt0 )/(P Vt0 + c) which equals a(P Vt1 − P Vt0 )/P Vt0 where a = P Vt0 /(P Vt0 + c), P Vt0 is the present value of the portfolio at present time and P Vt1 is the random portfolio value at a future time t1 . Therefore the portfolio return appears scaled by the constant a. 9 f follows from D1 and D4. P4 f is trivial. P5 follows from D1 and P6 Proof. P1 with s = 1 from D2. P3 is easy to show µ(X, Y ) = D(X − Y ) = D(X − Z + (Z − Y )) ≤ D(X − Z) + D(Z − Y ) = µ(X, Z) + µ(Z, Y ) In the reverse direction, we can also show that a translation invariant r.d. metric generates a deviation measure. Proposition 2. The functional D(X) = µ(X, 0) is a deviation measure in the sense of Definition 3 if µ is a translation invariant r.d. metric, positively homegeneous of degree one. Proof. D1 follows from P5. D(0) = µ(0, 0) = 0 and P6 with s = 1 guarantees D2. We can easily show that D3 holds making use of the triangle inequality, D(X + Y ) = µ(X + Y, 0) = µ(X, −Y ) ≤ µ(X, 0) + µ(0, −Y ) = µ(X, 0) + µ(Y, 0) = D(X) + D(Y ) Finally, D4 holds due to P1. Therefore, we can conclude that all deviation measures in the sense of Definition 3 arise from translation invariant r.d. metrics. Another very 10 simple to establish property is that any translation invariant r.d. metric is convex in any of the two arguments. Proposition 3. Any translation invariant r.d. metric µ, for which P6 holds with s = 1, satisfies µ(αX + (1 − α)Y, Z) ≤ αµ(X, Z) + (1 − α)µ(Y, Z) (3) µ(X, αY + (1 − α)Z) ≤ αµ(X, Y ) + (1 − α)µ(X, Z) where α ∈ [0, 1] and X, Y, Z are any real-valued random variables. Proof. We demonstrate how to obtain the first inequality, the second follows by a similar argument. µ(αX + (1 − α)Y, Z) = µ(α(X − Z) + (1 − α)(Y − Z), 0) = µ(α(X − Z), −(1 − α)(Y − Z)) ≤ µ(α(X − Z), 0) + µ(0, −(1 − α)(Y − Z)) = µ(α(X − Z), 0) + µ((1 − α)(Y − Z), 0) = αµ(X − Z, 0) + (1 − α)µ(Y − Z, 0) = αµ(X, Z) + (1 − α)µ(Y, Z) f The inequality follows from P3 The first two equalities hold because of P4. f and P6. and then we make use of P4 The convexity property (3) guarantees that problem (1) is a convex optimization problem if µ satisfies the assumptions in the proposition and X is a convex set in the sense that if X, Y ∈ X , then αX +(1−α)Y ∈ X , α ∈ [0, 1]. 11 Both inequalities in (3) are due to the translation invariance property. Theref to P4 in order fore, we have to impose the convexity properties if we relax P4 for µ to be convex. Definition 4. Any r.d. metric which satisfies any of the inequalities (3) is called a convex r.d. metric. At this point we have shown that the translation invariant r.d. metrics are equivalent to the metrics spawned by the deviation measures and are well-positioned for the benchmark tracking problem (1). Beside this class, there are other examples of r.d. metrics suitable for (1). The next section is devoted to them. The generalized tracking-error problem (1) can also be re-stated as min µ(rb , w0 r). w∈W (4) Whether we minimize µ(w0 r, rb ) or µ(rb , w0 r) greatly depends on the interpretation of µ. This becomes particularly obvious if there is a deviation measure behind µ. In Proposition 2, we considered the functional D(X) = µ(X, 0) which appeared to be a deviation measure. If we switch the arguments of µ, we obtain again a deviation measure as a consequence of the translation invariance property, e D(X) = µ(0, X) = µ(−X, 0) = D(−X). Thus, whether we choose D(w0 r − rb ) or D(rb − w0 r) depends on the particular form of D. For example, suppose that D is the semi-standard 12 deviation σ+ (X) = (E(X − EX)2+ )1/2 where X+ = max(0, X). Thus µ(w0 r, rb ) = (E(w0 r0 − r0b )2+ )1/2 in which w0 r0 and r0b denote centered returns. Minimizing (E(w0 r0 − r0b )2+ )1/2 means that we minimize the positive deviations of w0 r0 relative to r0b which is highly undesirable because this way we minimize our relative profit. The proper way to pose the problem is min (E(r0b − w0 r0 )2+ )1/2 w∈W and, apparently, the objective contains µ(rb , w0 r). The conclusion will be exactly the opposite if D(X) = σ− (X) = (E(X − EX)2− )1/2 where X− = min(0, −X). That is, in this case we should use µ(w0 r, rb ). Apart from the semi-standard deviation, other examples of translation invariant r.d. metrics can be generated from the value-at-risk (VaR) and the conditional value-at-risk (CVaR), Z ∞ CV aRα (X) = tdGX (t) −∞ where 13 GX (t) = 0, t < V aRα (X) (P (−X ≤ t) − (1 − α))/α, t ≥ V aRα (X) in which V aRα (X) is defined as V aRα (X) = − inf{x ∈ R : P (X ≤ x) ≥ α} and α is the tail probability. It can be shown that when considered on X0 , both V aRα (X) and CV aRα (X) are deviation measures, see Rockafellar et al. (2006). Therefore the functionals V aRα (X − Y ) and CV aRα (X − Y ), X, Y ∈ X0 are translation invariant r.d. metrics. 4 Relation to probability metrics — representation theorems In this section we examine the relationship between r.d. metrics and probability metrics (p. metrics). We have already defined the term probability metric in Definition 1. The basic differences are that r.d. metrics are asymmetric (P2 does not hold) and property P5. Therefore if we consider a p. metric defined on X0 , the asymmetry remains the only difference. Any quasi-semimetric generates a p. semi-metric via symmetrization. If µ is a quasi-metric, then µ(X, Y ) = (µ(X, Y )+µ(Y, X))/2 is symmetric and also satisfies P1 and P3. Hence µ(X, Y ) is a p. metric. As a result, any r.d. metric generates a p. metric via symmetrization. Due to the close connection, all classifications of p. metrics remain valid for r.d. metrics. In the theory of p. metrics, we distinguish between primary, simple, and compound metrics (for more information about p. metrics, see Rachev 14 (1991)). We obtain the same classes of r.d. metrics by defining primary, simple, and compound quasi-semi-metrics. In order to introduce primary p. quasi-semi-metrics, we need some additional notation. Let h be a mapping defined on X with values in RJ , that is we associate a vector of numbers with a random variable. The vector of numbers could be interpreted as a set of some characteristics of the random variable. An example of such a mapping is: X → (EX, σ(X)) where the first element is the mathematical expectation and the second is the standard deviation. Furthermore, the mapping h induces a partition of X into classes of equivalence. That is, two random variables X and Y are regarded as equivalent, X ∼ Y , if their corresponding characteristics agree, X∼Y ⇐⇒ h(X) = h(Y ) Since the p. metric is defined on the space of pairs of r.v.s LX2 , we have to translate the equivalence into the case of pairs of r.v.s. Two sets of pairs (X1 , Y1 ) and (X2 , Y2 ) are said to be equivalent if there is equivalence on an element-by-element basis, i.e. h(X1 ) = h(X2 ) and h(Y1 ) = h(Y2 ). Definition 5. Let µ be a probability quasi-semi-metric such that µ is constant on the equivalence classes induced by the mapping h: (X1 , Y1 ) ∼ (X2 , Y2 ) =⇒ µ(X1 , Y1 ) = µ(X2 , Y2 ) Then µ is called primary probability quasi-semi-metric. If the converse implication (⇐=) also holds, then µ is said to be a primary quasi-metric. 15 Here is an illustration. Assume that h maps the r.v.s to their first absolute moments, h(X) = E|X|. Thus (X1 , Y1 ) ∼ (X2 , Y2 ) means that X1 and X2 on one hand, and Y1 and Y2 on the other, have equal first absolute moments, i.e. (E|X1 |, E|Y1 |) = (E|X2 |, E|Y2 |). In this situation, a primary quasi-semimetric would measure the two distances, µ(X1 , Y1 ) and µ(X2 , Y2 ), as equal. The word quasi reminds one that µ(X1 , Y1 ) may not equal µ(Y1 , X1 ). Nevertheless µ(Y1 , X1 ) = µ(Y2 , X2 ). Moreover, if h(X1 ) = h(Y1 ), then necessarily µ(X1 , Y1 ) = 0. The word semi signifies that µ(X1 , Y1 ) = 0 may not induce equality among the corresponding characteristics. These considerations show that a primary quasi-semi-metric in X generates a quasi-semi-metric µ1 (h(X), h(Y )) = µ(X, Y ) in the space of the corresponding characteristics h(X) = {h(X) ∈ RJ , X ∈ X} ⊆ RJ (see Section 6.2 for examples). Definition 6. A probability quasi-semi-metric is said to be simple if for each (X, Y ) ∈ LX2 , P (X ≤ t) = P (Y ≤ t), ∀t ∈ R =⇒ µ(X, Y ) = 0 If the converse implication (⇐=) also holds, then µ is said to be a simple quasi-metric. Definition 7. Any probability quasi-semi-metric in the sense of Definition 1 is compound. as In effect, µ(X, Y ) = 0 implies that (i) X = Y if µ is a compound quasimetric, (ii) FX (t) = FY (t), ∀t ∈ R if µ is a simple quasi-metric, and (iii) h(X) = h(Y ) if µ is a primary quasi-metric. In the discussion of P1, we 16 noted that we would assume the strong property that µ(X, Y ) = 0 implies as X = Y . This may be too demanding. From a practical viewpoint, it may be more appropriate to consider a r.d. metric such that µ(X, Y ) = 0 implies only equality in distribution, or equality of some characteristics of X and Y , without the requirement that X and Y should coincide almost surely. For such cases, simple and primary r.d. metrics are better positioned. Under certain conditions, a p. metric can generate a r.d. metric. Definition 8. A p. metric is called ideal of order p ∈ R if for any r.v.s X, Y, Z ∈ X and any non-zero constant c the next two properties are satisfied (a) Regularity (strong or weak): µ(X + Z, Y + Z) ≤ µ(X, Y ) (b) Homogeneity: µ(cX, cY ) ≤ |c|s µ(X, Y ) If µ is a simple metric, then Z is assumed to be independent of X and Y , that is, weak regularity holds. In this case, µ is said to be weakly-perfect of order s. Due to (a) and (b), any ideal metric defined on X0 turns into a symmetric r.d. metric. We can obtain a quasi-metric from a p. metric by breaking the symmetry axiom while keeping the triangle inequality, thus ensuring that P2 holds in addition to P1 and P3. Let us take as an example the average compound metric L(X, Y ) = (E(d(X, Y ))p )1/p where d(x, y) is a metric in R and p ∈ [1, ∞). If d(x, y) = max(|x|, |y|) is 17 the Minkowski metric, then one way to break the symmetry in this case is to consider L∗p (X, Y ) = (E(max(X − Y, 0))p )1/p (5) A special limit case here is L∗∞ (X, Y ) = inf{ > 0 : P (max(X − Y, 0) > ) = 0} which is the quasi-metric corresponding to the limit case of the Ky-Fan p. metric L∞ (X, Y ) = inf{ > 0 : P |X − Y | > ) = 0}. Still there is a deviation measure behind L∗p (X, Y ) because it is translation invariant. Proposition 4. The functional defined in equation (5) is a compound quasi-semi-metric in X and a compound quasi-metric in X0 . It is translation invariant and homogeneous of degree s = 1. Proof. Let us first consider (5) in X. P1 is trivial. The triangle inequality follows from the sub-additivity of the max function and Minkowski’s inequality. Next we show that (5) is a quasi-metric in X0 . Suppose that L∗p (X, Y ) is defined on X0 and, additionally, that L∗p (X, Y ) = 0. From the definition, we easily observe that the last assumption implies Y ≥ X. Strict inequality is easy to rule out because if Y > X, then EY > EX and this is impossible as by construction since EX = EY = 0. Therefore Y = X and hence L∗p is a f holds. quasi-metric in X0 , i.e. P1 Both translation invariance and the homogeneity degree are obvious. As a second example, let us examine the Birnbaum-Orlicz compound metric 18 Z ∞ 1/p (τ (t; X, Y )) dt p Θp (X, Y ) = −∞ where p ≥ 1, τ (t; X, Y ) = P (X ≤ t < Y ) + P (Y ≤ t < X). In this case, it is easier to obtain a quasi-metric. Consider the functional Θ∗p (X, Y Z ∞ )= 1/p (τ (t; X, Y )) dt ∗ p (6) ∞ where τ ∗ (t; X, Y ) = P (Y ≤ t < X) and p ≥ 1. Proposition 5. The functional defined in equation (6) is a compound quasi-semi-metric in X and a compound quasi-metric in X0 . It satisfies the weak regularity property P4* and is homogeneous of degree s = 1/p. Proof. In order to prove that (6) is a quasi-semi-metric in X, we need to verify P1 and P3. P1. It is trivial since P (X ≤ t < X) = 0 for all t ∈ R. P3. We start by decomposing the function P (Y ≤ t < X), t is fixed. P (Y ≤ t < X) = P ({Y ≤ t} ∩ {X > t}) = P ({Y ≤ t} ∩ {X > t} ∩ {{Z ≤ t} ∪ {Z > t}}) = P ({Y ≤ t} ∩ {X > t} ∩ {Z ≤ t}) + P ({Y ≤ t} ∩ {X > t} ∩ {Z > t}) ≤ P ({X > t} ∩ {Z ≤ t}) + P ({Y ≤ t} ∩ {Z > t}) = P (Z ≤ t < X) + P (Y ≤ t < Z) 19 The third equality holds because the corresponding events have empty intersection. The inequality appears because we ignore events which, generally, have probability less than one. In effect, by Minkowski’s inequality, Θ∗p (X, Y ∞ Z )= ∞ ∞ Z ≤ Z 1/p (P (Y ≤ t < X)) dt p ∞ ∞ = 1/p (P (Z ≤ t < X) + P (Y ≤ t < Z)) dt p 1/p Z (P (Z ≤ t < X)) dt + ∞ p ∞ 1/p (P (Y ≤ t < Z)) dt p ∞ = Θ∗p (X, Z) + Θ∗p (Z, Y ) and we receive the triangle inequality. f holds in X0 . Similarly to Proposition 4, It is simple to verify that P1 suppose that Θ∗p is defined on X0 . If Θ∗p (X, Y ) = 0, then τ ∗ (t; X, Y ) = 0, ∀t ∈ R. Since P (Y ≤ t < X) = 0 for any t implies Y ≥ X, by the same argument, as Proposition 4, we obtain that Θ∗p is a quasi-metric in X0 and a quasi-semi-metric in X. The weak regularity property P4* is checked by applying Young’s convolution inequality. The homogeneity order is verified directly by change of variables. A special limit example of (5) is Θ∗∞ (X, Y ) = supt∈R P (Y ≤ t < X) which is an asymmetric version of the compound metric Θ∞ (X, Y ) = supt∈R τ (t; X, Y ) generating as a minimal metric the celebrated Kolmogorov metric in the space of distribution functions. One might be tempted to surmise that a structured approach towards the 20 generation of classes of r.d. metrics is through asymmetrization of classes of ideal p. metrics, to make them quasi -metrics, and using them on the subspace X0 . The functionals in (5) and (6) support such a generalization. For instance, asymmetrization of (5) with the max function turns the p. metric into a p. quasi-semi-metric on X and considering it on the sub-space X0 turns it into a quasi-metric. In spite of these two examples, in the general case, such an approach would be incorrect because it would sometimes lead to a quasi-semi-metric even on X0 . Let us look at Zolotarev’s ideal metric ζ2 defined as Z ∞ ζ2 (X, Y ) = −∞ Z x Z x FX (t)dt − −∞ −∞ FY (t)dt dx. where FX (t) = P (X ≤ t) is the cumulative distribution function (c.d.f.) of X. An asymmetrization, according to the above scheme, is ζ2∗ (X, Y Z ∞ )= Z x x FY (t)dt − max −∞ Z −∞ FX (t)dt, 0 dx. (7) −∞ d Clearly, if X = Y , then ζ2∗ (X, Y ) = 0. The converse is not true, if ζ2∗ (X, Y ) = 0, then Z x Z x FY (t)dt ≤ −∞ FX (t)dt, ∀x ∈ R, −∞ even assuming X, Y ∈ X0 , which demonstrates that ζ2∗ (X, Y ) is a simple quasi-semi-metric. 21 4.1 Minimal quasi-semi-metrics The minimal quasi-metric is a type of simple quasi-metric obtained from a compound quasi-metric in a special way. Thus certain properties valid for the compound quasi-metric are inherited by the generated minimal quasi-metric. We will assume that the following continuity property holds. CP. Assume that the pairs (Xn , Yn ) ∈ LX2 , n ∈ N converge in distribution to (X, Y ) ∈ LX2 and that µ(Xn , Yn ) → 0 as n → ∞. Then µ(X, Y ) = 0. The property CP is not very restrictive; all examples of µ that we will consider have it. We state it for technical reasons. Lemma 1. Suppose that µ is a compound quasi-semi-metric. Then µ b defined as d d e Ye ) : X e = X, Ye = Y } µ b(X, Y ) = inf{µ(X, (8) d where = means equality in distribution, is a simple quasi-semi-metric. Moreover, if µ is a compound quasi-metric satisfying the CP condition, then µ b is a simple quasi-metric. Finally, if µ satisfies any of the inequalities µ(αX + (1 − α)Y, Z) ≤ αµ(X, Z) + (1 − α)µ(Y, Z) µ(X, αY + (1 − α)Z) ≤ αµ(X, Y ) + (1 − α)µ(X, Z) then so does µ b on condition that the bivariate law in the convex combination, either (X, Y ) or (Y, Z), is known. 22 Proof. We make use of the ideas behind the proof that the minimal functional (8) is a p. semi-metric if µ is a semi-metric, see Rachev (1991). Repeating f P3 hold for µ the arguments there we see that P1, (P1), b. The verification of convexity is very similar to checking P3. We prove that µ b is convex in the first argument; the same reasoning can be used to prove the other inequality. Let X1 , X2 , Z ∈ X0 and α ∈ [0, 1] and assume that we know the bivariate law (X1 , X2 ); this way we know the r.v. Xα := αX1 + (1 − α)X2 , Xα ∈ X0 . We also assume the technical condition that the underlying probability space (Ω, A, P ) is rich enough. Hence, for any > 0, d e1 , Z), e (X e2 , Z) e and (X eα , Z) e such that X e1 = X1 , we can choose bivariate laws (X d d as d e2 = eα = e1 + (1 − α)X e2 = e1 , Z), e X X2 , Ze = Z, X αX Xα , µ b(X1 , Z) + ≥ µ(X e2 , Z). e By assumption µ is convex and therefore and µ b(X2 , Z) + ≥ µ(X eα , Z) e ≤ αµ(X e1 , Z) e + (1 − α)µ(X e2 , Z) e holds. Hence µ(X eα , Z) e ≤ αµ(X e1 , Z) e + (1 − α)µ(X e2 , Z) e µ b(Xα , Z) ≤ µ(X ≤ α(b µ(X1 , Z) + ) + (1 − α)(b µ(X2 , Z) + ) = αb µ(X1 , Z) + (1 − α)b µ(X2 , Z) + Since we minimize over all possible ways to couple Xα and Z, we implicitly assume that the distribution of Xα is known and we hold it fixed in this calculation. The chain of inequalities above is true for any > 0, therefore letting → 0 we prove the convexity of µ b. 23 The additional condition, that we have to know the bivariate law in the convex combination, is not restrictive. If we interpret X1 as the return of a stock and X1 as the return of another stock, then Xα := αX1 +(1−α)X2 , α ∈ [0, 1] is the return of the portfolio composed of the two stocks. If we do not know the bivariate law X1 , X2 , then we cannot compute the distribution of the portfolio return. The result contained in Lemma 1 will be used to derive classes of r.d. metrics consistent with the notion of convergence in distribution. First we have to check whether it is true that if µ is a r.d. metric, then the minimal quasi-metric is also a r.d. metric. Corollary 1. If µ is a compound r.d. metric, then µ b defined in (8) is a simple r.d. metric. Proof. By assumption, µ satisfies P1, P3, P4 or P4*, P5, P6. From Lemma 1, we know that µ b satisfies P1 and P3, provided that the continuity condition holds. It remains to check P4-P6. P4 or P4* follows from the argument which we used to prove convexity, P5 holds trivially since µ (and therefore µ b) is defined in X0 . P6 is easy to check. d d e Ye ) : X e= µ b(aX, aY ) = inf{µ(X, aX, Ye = aY } d d e e = = inf{as µ(X/a, Ye /a) : X/a X, Ye /a = Y } d d e e = = as inf{µ(X/a, Ye /a) : X/a X, Ye /a = Y } = as µ b(X, Y ) 24 We observe the following relationships between the introduced classes of r.d. metrics: a) compound translation invariant r.d. metrics ⊂ compound convex r.d. metrics ⊂ compound r.d. metrics ⊂ compound quasi-metrics ⊂ compound quasi-semi-metrics. b) simple convex r.d. metrics ⊂ compound convex r.d. metrics c) simple translation invariant r.d. metrics 6⊂ compound translation invariant r.d. metrics d) ideal p. metrics ⊂ r.d. metrics e) primary quasi-semi-metrics ⊂ simple quasi-semi-metrics ⊂ compound quasi-semi-metrics f may fail to hold for the minimal metItem c) appears because generally P4 ric. Item e) is due to the general relationship between the corresponding p. metrics. Apart from providing an interesting link between the classes of compound and simple quasi-semi-metrics, the minimal metrics can be used to construct simple quasi-semi-metrics with suitable properties. For example, it is easier to establish the regularity property, or the convexity property, for a compound metric using the method of one probability space. These properties are inherited by the corresponding simple minimal metric. In this fashion, taking advantage of Corollary 1, we can construct simple r.d. metrics. 25 Under some conditions, µ b can be explicitly computed. We can obtain explicit representations through the Cambanis-Simons-Stout theorem, see Cambanis et al. (1976). The basic results are contained in the next theorem. Theorem 1. Given X, Y ∈ X with finite moment R R φ(x, a)dF (x), a ∈ R where φ(x, y) is a quasi-antitone function, i.e. φ(x, y) + φ(x0 , y 0 ) ≤ φ(x0 , y) + φ(x, y 0 ) for any x0 > x and y 0 > y, then Z µ bφ (X, Y ) = 1 φ(FX−1 (t), FY−1 (t))dt 0 where FX−1 (t) = inf{x : FX (x) ≥ t} is the generalized inverse of the c.d.f. FX (x) and also µ bφ (X, Y ) = µφ (FX−1 (U ), FY−1 (U )) where U is a uniformly distributed r.v. on (0, 1). By virtue of Corollary 1, µ bφ is a r.d. metric if µφ is a r.d. metric and clearly it depends only on the distribution functions of X and Y . The function φ should be such that φ(x, x) = 0 but generally it may not be symmetric, φ(x, y) 6= φ(y, x). Examples of φ include f (x − y) where f is a non-negative convex function in R. In particular, one might choose φ(x, y) = H1 (max(x − y, 0)) + H2 (max(y − x, 0)) where H1 , H2 : [0, ∞) → [0, ∞) are convex, non-decreasing functions. If H1 (t) = t and H2 (t) = 0, φ∗ (x, y) = max(x − y, 0), then 26 Z 1 max(FX−1 (t) − FY−1 (t), 0)dt µφ∗ (X, Y ) = (9) 0 We can see that (9) is the minimal quasi-semi-metric of (5). If (9) is defined on the space of zero-mean random variables, X, Y ∈ X0 , then µφ∗ (X, Y ) is a quasi-metric and, therefore, according to Corollary 1, (9) is a r.d. metric, see Proposition 4. Without this restriction, if X, Y ∈ X, then µφ∗ (X, Y ) is a quasi-semi-metric. Similarly, we obtain the minimal metrics of L∗p defined in (5), lp∗ (X, Y ) = Lb∗p (X, Y ) = 1/p 1 Z (max(FX−1 (t) − FY−1 (t), 0))p dt . 0 The equation in (9) is just l1∗ (X, Y ). Furthermore, Proposition 4 shows that L∗p is a translation invariant r.d. metric and therefore it is convex. As a result, according to Lemma 1, lp∗ is a simple convex r.d. metric. Other explicit forms can be computed also for the family Θ∗p defined in (6) through the Frechet-Hoeffding inequality. Lemma 2. Suppose that X, Y ∈ X and H : [0, ∞] → [0, ∞] is nondecreasing. If Θ∗H (X, Y Z ∞ H(P (Y ≤ t < X))dt )= −∞ then b ∗ (X, Y ) = Θ H Z ∞ H(max(FY (t) − FX (t), 0))dt −∞ 27 Proof. The demonstration is straightforward. Z ∞ H(P (Y ≤ t < X))dt µ(X, Y ) = Z−∞ ∞ H(P (Y ≤ t) − P (Y ≤ t, X ≤ t))dt = Z−∞ ∞ ≥ H(FY (t) − min(FX (t), FY (t)))dt Z−∞ ∞ H(max(FY (t) − FX (t), 0))dt = −∞ The inequality is application of the celebrated Frechet-Hoeffding upper bound min(FX (x), FY (y)) ≥ P (X ≤ x, Y ≤ y). See Rachev (1991) p. 153, 154 for more general results. Corollary 2. Choosing H = tp , p ≥ 1, we receive θp∗ (X, Y )= b ∗p (X, Y Θ Z ∞ 1/p (max(FY (t) − FX (t), 0)) dt p )= −∞ ∗ b ∗∞ (X, Y ) = supt∈R [max(FY (t) − FX (t), 0)]. and θ∞ (X, Y ) = Θ With respect to the homogeneity property, the examples (5), (6) and their minimal counterparts show a wide range of degrees. The family L∗p and the minimal metrics lp∗ it generates are homogeneous of degree one. In contrast, the families Θ∗p and θp∗ are homogeneous of degree s = 1/p. At the limit, Θ∗∞ ∗ ∗ and θ∞ are both homogeneous of degree zero. The functional θ∞ turns into the Kolmogorov p. metric ρ(X, Y ) = sup |FX (t) − FY (t)|. t∈R 28 having replaced the max function with absolute value. The representatives with s = 1 are Θ∗1 and θ1∗ . The functional θ1∗ turns into the Kantorovich p. metric Z ∞ |FX (t) − FY (t)|dt. κ(X, Y ) = −∞ in the same fashion. 5 Dual representations At the end of Section 4, we gave an example illustrating that sometimes via asymmetrization one may obtain a quasi-semi-metric. A natural question arises. Is it wrong to consider problem (1) with a quasi-semi-metric instead of a quasi-metric in the objective? A quasi-metric guarantees that if there is a feasible portfolio w such that µ(w0 r, rb ) = 0 then it is a solution to the as d problem and either w0 r = rb or w0 r = rb or h(w0 r) = h(rb ) depending on the type of the quasi-metric, compound, simple, or primary (see Section 4 for more details) where h(w0 r) denotes a countable set of portfolio-return characteristics. This property was stated as a basic rationale behind the suggestion to use a quasi-metric. After all, if the feasible set contains portfolios as d with positive expected excess return, then any of w0 r = rb or w0 r = rb means that we have beaten the the benchmark. What changes in this picture if we replace the r.d. metric by a r.d. semimetric? Understandably, we cannot rely any longer on any type of equality between w0 r and r in contrast to the case of a r.d. metric. Nevertheless, 29 we obtain a notion of dominance if µ(w0 r, rb ) = 0 which we will call the implied pre-order of µ and this is sufficient to justify the application of a r.d. semi-metric. For instance, at the end of Section 4 we considered ζ2∗ which, apparently, implies second-order stochastic dominance (SSD) regardless of whether we consider X or X0 . Other examples of implied pre-orders can be constructed by looking at lp∗ and θp∗ when defined on X — they all imply first-order stochastic dominance (FSD). The corresponding compound quasimetrics L∗p and Θ∗p imply the same pre-order because L∗p (X, Y ) = 0 and Θ∗p (X, Y ) = 0 imply X ≤ Y almost surely, which is equivalent to FX (t) ≤ FY (t), ∀t and, therefore, FSD. Naturally, there are many quasi-semi-metrics implying one and the same pre-order. Definition 9. We call a binary relation µ implied by the quasi-semimetric µ if X µ Y is equivalent to µ(X, Y ) = 0, that is X µ Y if and only if µ(X, Y ) = 0 It is easy to notice that the binary relation µ as defined is indeed a pre-order, i.e. it is reflexive (X µ X) and transitive (X µ Y and Y µ Z =⇒ X µ Z). Reflexivity follows from P1, µ(X, X) = 0. Transitivity is a consequence of the triangle inequality, 0 ≤ µ(X, Z) ≤ µ(X, Y ) + µ(Y, Z) = 0. We can associate a binary relation of equivalence ∼µ to a given pre-order, X ∼µ Y ⇐⇒ X µ Y and Y µ X. For example, if µ is symmetric, then we have an equivalence relation. This is because firstly µ(X, Y ) = 0 =⇒ X µ Y from the definition above and secondly µ(X, Y ) = µ(Y, X) =⇒ Y µ X from the symmetry and the definition above. Combining both we get X ∼µ Y if µ(X, Y ) = 0. Beside semi-metrics, equivalence relation is implied by quasi- 30 metrics. Actually, the necessary and sufficient condition for the equivalence relation to hold is µ(X, Y ) = µ(Y, X) = 0. For a simple quasi-metric, ∼µ is actually equality in distribution, and for a compound quasi-metric — equality in almost sure sense. Suppose that we would like to solve the benchmark-tracking problem (1) with a given quasi-semi-metric µ. A feasible portfolio such that µ(w0 r, rb ) = 0 is truly a solution and, as we have mentioned, the portfolio w dominates the benchmark rb in the sense suggested by µ. Intuitively, the functional µ metrizes the distance to the desirable set of all portfolios, which are no worse than the benchmark in which worse is in a µ-sense. A logical next step in this analysis is to answer the question if it is possible to characterize the set of these portfolios in terms of a global preference of a class of investors; that is, if µ(w0 r, rb ) = 0, then all investors in a given class prefer w to the benchmark. If a representation of this sort exists, we call it a dual representation or form of µ. (Ortobelli et al. (2006) develop a detailed theory about consistencies between probability functionals and preference orders.) To begin with, suppose that there is a set of investors identified by their utility function u ∈ U who choose between two portfolios with return X and Y respectively. We assume the utility functions are defined on the return rather than the wealth. A given investor prefers Y to X, X u Y , if Eu(X) ≤ Eu(Y ), u ∈ U. Define the set of all pairs (X, Y ) such that Y is preferred to X for a fixed investor, gr(u) := {(X, Y ) : Eu(X) ≤ Eu(Y )} 31 The portfolio with return Y is preferred by all investors if the pair (X, Y ) belongs to gr(U) = \ gr(u) = {(X, Y ) : Eu(X) ≤ Eu(Y ), for all u ∈ U} u∈U All investors are indifferent to X or Y if (X, Y ) ∈ gr(U) and (Y, X) ∈ gr(U) and the two portfolios are indistinguishable in this sense. Therefore the set gr(U) is non-empty because the pair (X, X) satisfies all conditions. We define the following functional 0, ν(X, Y ) = sup (X, Y ) ∈ gr(U) u∈U b [Eu(X) − Eu(Y )], (X, Y ) 6∈ gr(U) (10) = sup [max(Eu(X) − Eu(Y ), 0)] u∈U b where U b ⊆ U and is such that if ν(X, Y ) = 0, then (X, Y ) ∈ gr(U). Therefore X and Y are indistinguishable if ν(X, Y ) = 0 and ν(Y, X) = 0, i.e. this the implied ν-equivalence. If there is at least one investor who does not prefer Y to X, then ν(X, Y ) > 0. The reason for introducing the subclass U b is to avoid the explosion of ν(X, Y ). For example, if U is a closed, convex cone or a closed, convex set, then U b might be the smallest set of functions generating U via limits of convex combinations and in this case U b might be regarded as some set of ”corner” elements of U. Also, U b should not contain ”equivalent” functions as far as the preference relation (with X and Y fixed) 32 is concerned, i.e. u and v are considered equivalent if X u Y if and only if X v Y . A simple example is {u(x) : u(x) = av(x), a > 0} which is obviously equivalent to v(x). Thus there is some minimal set for which the quasi-semi-metric ν(X, Y ) is consistent with the set gr(U) — if ν(Y, X) = 0, fb ⊇ U b , the corresponding metthen (X, Y ) ∈ gr(U). For any other set U ric νUfb (X, Y ) ≥ νU b (X, Y ), where the index denotes the set with respect to which the supremum in (10) is calculated. Proposition 6. The functional ν(X, Y ) defined in (10) is a simple quasisemi-metric in X. d Proof. If X = Y , then ν(X, Y ) = 0. ν(X, Y ) ≥ 0 is trivial. The triangle inequality is simple to show using the representation with the max function. The natural symmetrization of ν(X, Y ) is defined as ν(X, Y ) = sup |Eu(X) − Eu(Y )| (11) u∈U b The functional (11) is apparently symmetric by definition but this is still not sufficient for ν(X, Y ) to be a metric in the space of distribution functions. There should be an additional condition on the class U b — it should be ”rich enough” in order for ν(Y, X) to be a simple metric. We will compute ν(X, Y ) for some classes U b in Section 6.2. The above proposition states that ν(X, Y ) is a simple quasi-semi-metric. Properties P4 and P6 are not possible to verify unless we assume a particular class U b . In addition, for some U b the functional ν(X, Y ) may turn into a 33 semi-metric as it will satisfy the symmetry property. In the dual form that we conjecture here, the random variables might belong to X or X0 . For instance in the Rothschild-Stiglitz preference order, the relation in the space of distribution functions is EX = EY and Rx Rx F (t)dt ≤ F (t)dt, ∀x ∈ R, if X is preferred to Y . Therefore we X −∞ −∞ Y can consider the centered variables X and Y in X0 and leave the integral inequality. 6 Applications In this section, we give a number of applications of the developed theory. 6.1 Compound vs simple r.d. metrics Let us return to the minimal tracking error problem in order to illustrate the difference between compound and simple r.d. metrics. The optimization problem can be re-stated in terms of the Lp metric Lp (X, Y ) = (E|X − Y |p )1/p , p ≥ 1 since, by definition, σ(X − Y ) = L2 (X, Y ) where L2 is defined in X0 . Therefore we can consider the equivalent optimization problem min L2 (X, Y ) X∈X0 (12) where X0 contains the centered r.v.s from X. Let us impose the additional assumption that X0 is the space of all random variables in X0 independent 34 of Y ; that is, in (12) we fix the joint dependence of (X, Y ) and vary only the distribution of X. We know that (L2 (X, Y ))2 = E(X − Y )2 = EX 2 − 2E(XY ) + EY 2 = EX 2 + EY 2 , X, Y ∈ X0 because E(XY ) = EXEY = 0 due to the independence assumption. As a result, the problem min EX 2 X∈X0 is equivalent to (12) because EY 2 does not depend on X. The obvious a.s. solution is X ∗ = 0, which is the only constant in X0 . If the elements of X0 are interpreted as the centered returns of some feasible portfolios, then X ∗ means that we have to invest everything in the risk-free asset. Apparently this is not in line with the intuition that the solution should be “similar” to the benchmark with centered return Y . Now let us replace L2 by the minimal metric l2 (X, Y ) = Lb2 (X, Y ). The explicit representation, due to Theorem 1, is Z l2 (X, Y ) = 1/2 1 (FX−1 (t) 0 and the problem becomes 35 − FY−1 (t))2 dt min l2 (X, Y ) X∈X0 (13) If X ∗ is an independent copy of Y , then l2 (X, Y ) = 0 and it follows that X ∗ is a solution to (13). This is in agreement with what we expected. However simplistic this example may be, it is a warning that improper dependence assumptions may lead to degenerate results. 6.2 Simple examples of dual representations In this sub-section we give examples of some quasi-semi-metrics computed through the suggested dual form in (10). 6.2.1 Quadratic utility functions, X, Y ∈ X0 Let us consider the class of quadratic utility functions Ua+ = {u(x) : u(x) = ax2 + bx, a, b ≥ 0}. The plus sign in the index emphasizes the assumption that a ≥ 0. We will compute ν(X, Y ) for X, Y ∈ X0 . If we set U b = Ua+ , we obtain νa+ (X, Y ) := sup (max(a(EX 2 − EY 2 , 0))) u∈U + a 0, EX 2 ≤ EY 2 = ∞, EX 2 > EY 2 It can easily be seen that if EY 2 > EX 2 , then Eu(Y ) > Eu(X) for all u ∈ Ua+ because EX = EY = 0 due to the assumption Y, X ∈ X0 . Therefore 36 we can choose only one representative for the set U b , for example u(x) = x2 and we obtain νa+ (Y, X) = max(EX 2 − EY 2 , 0) Indeed, in this special case νa+ (Y, X) is not a quasi-metric in the space of distribution functions and this is because the class Ua+ is not rich enough. It is a primary quasi-semi-metric and implies the following pre-order: Y is preferred to X by all investors, X νa+ Y , if and only if νa+ (Y, X) = 0 which is equivalent to EX 2 ≤ EY 2 . All investors prefer the r.v. with larger second moment because all utility functions are convex due to the imposed condition a ≥ 0. Now consider Ua− = {u(x) : u(x) = ax2 + bx, b ≥ 0, a < 0}. Then νa− (X, Y ) = supu∈Ua− (max(|a|(EY 2 − EX 2 ), 0)) and the same reasoning as above leads to the conclusion that νa− (X, Y ) = max(EY 2 − EX 2 , 0). Therefore in this case Y is preferred to X by all investors, X νa− Y , if and only if EY 2 ≤ EX 2 , that is all investors prefer the random variable with smaller second moment which is understandable as their utility functions are concave and investors are risk-averse. These very simple results are in line with the intuition and classical theory. The construction of the quasi-semimetric is more or less trivial. We have noted that the set gr(U) may be more or less “poor” and this depends on the class U, which governs the properties of ν(X, Y ). For instance, 37 if U = Ua+ S Ua− , then we get ν(X, Y ) = max(νa+ (X, Y ), νa− (X, Y )) = max(EX 2 − EY 2 , EY 2 − EX 2 ) = |EX 2 − EY 2 | and the set gr(U) contains only those random variables the second moment of which matches EX 2 , i.e. ν(X, Y ) is a primary p. metric. In this case the symmetry of ν(X, Y ) appears due to symmetrization of the class U b . It is easy to check that all examples in this sub-section are weakly regular. Moreover, P6 holds with s = 2 for all of them and therefore νa+ , νa− and ν are r.d. metrics. 6.2.2 Quadratic utility functions, X, Y ∈ X All results in the previous section can be received by a more formal treatment. It is easy to notice that the set Ua+ is generated by the functions u1 (x) = x2 and u2 (x) = x having removed the equivalent functions (equivalent up to a positive multiplier). The function u2 can be disregarded because all considerations were in X0 . Extending the reasoning to X, we have to bear in mind both u1 and u2 . Computing the supremum over only the generating set of two functions, we get νa+ (X, Y ) = max(EX 2 − EY 2 , EX − EY, 0) The same reasoning, when applied to Ua− , leads to 38 νa− (X, Y ) = max(EY 2 − EX 2 , EX − EY, 0) since in this case U b = {−x2 , x}. Both νa+ (X, Y ) and νa− (X, Y ) are primary quasi-semi-metrics. Combining them, that is computing the supremum over {x2 , −x2 , x}, we obtain ν(X, Y ) = max(|EX 2 − EY 2 |, EX − EY ) This is not a metric yet; rather, it is a quasi-metric because we have required b > 0. Extending the calculation over all quadratic functions, we obtain ν(X, Y ) = max(|EX 2 − EY 2 |, |EX − EY |) which is the Minkowski metric in R2 where the axes are the first and the second moment, respectively. It is in this case that the functional turns into a primary p. metric. Note that all examples in these sub-sections are well defined if the considered random variables have a finite second moment. 6.2.3 The class of non-satiable investors The utility functions of the class of non-satiable investors are non-decreasing. The minimal set of functions U b generating the class of non-decreasing functions is U b = {u(x) : u(x) = I{x≥a} , a ∈ R} where I{x≥a} is the indicator function of the set {x ≥ a} with a being fixed. For more details, see Brumelle and Vickson (1975). An application of the definition in (10) yields 39 ν(X, Y ) := sup (max(Eu(X) − Eu(Y ), 0)) u∈U b Z Z = sup max I{x≥a} dFX (x) − I{x≥a} dFY (x), 0 a∈R R R (14) = sup max(P (X ≥ a) − P (Y ≥ a)) a∈R = sup max(P (Y < a) − P (X < a), 0) a∈R This functional is a quasi-semi-metric and we easily notice that ν(X, Y ) = ∗ ∗ θ∞ (X, Y ) when X, Y have continuous c.d.f.s. The functional θ∞ (X, Y ) is discussed in Corollary 2. The same reasoning applied to the natural symmetrization (11) leads to the celebrated Kolmogorov metric ν(X, Y ) = ρ(X, Y ) = sup |FX (a) − FY (a)| a ∗ Therefore, as we have already noted in Section 4.1, ν(X, Y ) = θ∞ (X, Y ) can be regarded as an asymmetric version of the Kolmogorov metric. Eventually, ∗ we have obtained two representations of θ∞ — one as a minimal metric of Θ∗∞ and another through the dual form in (14). Repeating the arguments in Proposition 5 and Lemma 2 for the functional Θ?p (X, Y Z ∞ )= 1/p (P (Y < t ≤ X)) dt p −∞ we can see that Θ?p satisfies the same properties as Θ∗p and they both coincide ? b ?∞ (X, Y ) = when X and Y have continuous c.d.f.s. Moreover θ∞ (X, Y ) := Θ ν(X, Y ) and thus ν(X, Y ) is weakly regular and positive homogeneous of 40 degree zero. Therefore ν(X, Y ) as given in (14) is a r.d. metric. 6.2.4 The class of non-satiable, risk-averse investors The utility functions of the class of all non-satiable, risk-averse investors are increasing and concave. In this case, the minimal set of functions U b generating them is U b = {u(x) : u(x) = min(x, a), a ∈ R}, for more details, see Brumelle and Vickson (1975). Applying the definition, we get ν(X, Y ) := sup (max(Eu(X) − Eu(Y ), 0)) u∈U b Z Z = sup max min(x, a)dFX (x) − min(x, a)dFY (x), 0 a∈R R R Z a Z a = sup max FY (x)dx − FX (x)dx, 0 a∈R −∞ (15) −∞ The last equality holds because Z a xdFX (x) + a(1 − FX (a)) Z a =a− FX (x)dx E min(X, a) = −∞ −∞ after integration by parts and assuming that X has a finite first absolute moment. This technical condition is very natural from a practical viewpoint: if the random variables have E|X| = ∞, then the fundamental notion of expected return breaks down. The functional in (15) is a quasi-semi-metric in X0 and is closely related to Zolotarev’s p. metric. 41 6.3 Solving the optimization problems in practice Suppose that we have scenarios rk ∈ Rn , k = 1, N for the stock returns in the portfolio and also scenarios rkb , k = 1, N for the benchmark returns. Also suppose that rk and rkb are modeled jointly, that is they are in one and the same state of the world. They could also be historical observations; we observe the benchmark and the stock returns in a given period and collect the observations. Problem (1) has the following form when µ(X, Y ) = L∗1 (X, Y ) = E max(X− Y, 0), min E max(rb − w0 r, 0) w∈W This is a convex optimization problem if W is convex because L∗1 is translation invariant. The objective function can be approximated with the available scenarios and the new problem is N 1 X min max(rkb − w0 rk , 0) w∈W N k=1 (16) and this problem can be replaced by an equivalent linear problem by the standard approach, 42 min w∈W N 1 X dk N k=1 s.t. rkb − w0 rk ≤ dk dk ≥ 0 Now let us state problem (1) when the underlying r.d. metric is the corresponding minimal metric l1∗ (X, Y ) given in equation (9), Z min w∈W 0 1 −1 max(Fr−1 b (t) − Fw 0 r (t), 0)dt This problem is convex if W is convex because of the properties of l1∗ (X, Y ), see Lemma 1, and can also be approximated with the available scenarios by considering the empirical versions of the generalized inverse of the c.d.f.s. Then the integral can be easily calculated because the integrands are step functions and it turns into a sum. The simplified problem is min w∈W N 1 X b max(r(k) − (w0 r)(k) , 0) N k=1 (17) b b where (w0 r)(1) ≤ (w0 r)(2) ≤ . . . ≤ (w0 r)(N ) and similarly r(1) ≤ r(2) ≤ ... ≤ b r(N ) are the ordered observations of the portfolio returns and the benchmark returns. The difference between problems (16) and (17) is that in the second problem the observations are sorted. This is not surprising because the minimal r.d. metric takes into account only the distance between the distribution 43 functions. The dependence between w0 r and rb is destroyed by the sorting. The max function in the objective can be linearized but due to the sorting, problem (17) is more difficult than (16). For specific choices of r.d. metrics, particular linearized problems can be constructed, as in the case of the CVaR (see Rockafellar and Uryasev (2002)). For other examples of linearized problems, see Ortobelli et al. (2006). 7 Conclusion This paper discusses the connections between probability metrics theory and benchmark tracking-error problems. We define a new class of functionals, metrizing the relative deviation of a portfolio to a benchmark. Firstly, we observe that the class of deviation measures introduced by Rockafellar et al. (2006) is generated by the class of translation invariant r.d. metrics. Secondly, we divide all r.d. metrics into three categories — primary, simple and compound functionals and introduce minimal r.d. metrics. The three classes of r.d. metrics assign different properties to the optimal portfolio problem and the minimal functional can be used in a constructive way to obtain simple r.d. metrics. The classification and the methods are inspired by the theory of probability metrics. Further on, we show that under certain conditions, the minimal quasi-semi-metric admits an integral representation. Finally, we analyze the implied pre-orders of quasi-semi-metrics, introduce a dual form consistent with a global preference relation and describe possible applications to portfolio selection problems. Even though in the paper we consider a static problem, the generality of 44 the suggested approach allows for extensions in a dynamic setting by studying quasi-semi-metrics not in the space of random variables but in the space of random processes. References Artzner, P., F. Delbaen, J.-M. Eber and D. Heath (1998), ‘Coherent measures of risk’, Math. Fin. 6, 203–228. Brumelle, D. L. and R. G. Vickson (1975), ‘A unified approach to stochastic dominance’, in Stochastic Optimization Models in Finance, Ziemba and Vickson (eds) pp. 101–113. Cambanis, S., G. Simons and W. Stout (1976), ‘Inequalities for ek(x,y) when the marginals are fixed’, Z. Wahrsch. Verw. Geb. 36, 285–294. Föllmer, H. and A. Schied (2002), ‘Convex measures of risk and trading constraints’, Finance and Stochastics pp. 429–447. Frittelli, M. and E. Rosazza Gianin (2002), ‘Putting order in risk measures’, Journal of Banking and Finance 26, 1473–1486. Ortobelli, L. S., S. Rachev, H. Shalit and F. Fabozzi (2006), ‘Risk probability functionals and probability metrics applied to portfolio theory’, Working paper, Department of Probability and Applied Statistics, University of California, Santa Barbara, USA . Rachev, S. T. (1991), Probability Metrics and the Stability of Stochastic Models, Wiley, Chichester, U.K. 45 Rockafellar, R. T. and S. Uryasev (2002), ‘Conditional value-at-risk for general loss distributions’, Journal of Banking and Finance 26, (7), 1443– 1471. Rockafellar, R. T., S. Uryasev and M. Zabarankin (2006), ‘Generalized deviations in risk analysis’, Finance and Stochastics 10, 51–74. Szegö, G. (2004), Risk measures for the 21st century, Wiley & Son Chichester. 46