Relative deviation metrics and the problem of strategy replication
Stoyan V. Stoyanov
FinAnalytica, Inc., USA
Svetlozar T. Rachev∗
University of Karlsruhe, Germany and
University of California Santa Barbara, USA
Sergio Ortobelli
University of Bergamo, Italy
Frank J. Fabozzi
Yale University, School of Management
Contact person:
Sergio Ortobelli
Department MSIA
University of Bergamo,
Via dei Caniana, 2, 24127, Italy
∗ Prof Rachev gratefully acknowledges research support by grants from the Division of Mathematical, Life and Physical Sciences, College of Letters and Science, University of California, Santa Barbara, the Deutsche Forschungsgemeinschaft and the Deutscher Akademischer Austauschdienst.
Abstract
In this paper, we generalize the classic benchmark tracking problem by introducing the class of relative deviation metrics (r.d. metrics). We consider specific families of r.d. metrics and how they interact with one another. Our approach towards their classification is inspired by the theory of probability metrics: we distinguish between compound, primary and simple r.d. metrics, introduce minimal r.d. metrics, and explore dual forms consistent with a preference order relation defined by a class of utility functions. Finally, we focus on r.d. semi-metrics, implied pre-orders and how they can be applied to the problem of beating a benchmark.
1 Introduction
In recent years there has been a heated debate on the “best” measures to use in risk management and portfolio theory. The fundamental work of Artzner et al. (1998) was the starting point for defining the properties that a measure has to satisfy in order to price coherently the risk exposure of a financial position. Many other works have developed these basic concepts, introducing several different families of risk measures (see, among others, Föllmer and Schied (2002), Frittelli and Gianin (2002), Rockafellar et al. (2006), Szegö (2004) and the references therein).
In this paper, we deal with the benchmark tracking-error problem, which is a type of optimal portfolio problem and can also be viewed as an approximation problem. We consider it from a very general viewpoint and replace the tracking-error measure by a general functional satisfying a number of axioms. We call this functional a metric of relative deviation. The axioms are introduced and motivated in Section 2. In Section 3 we establish a relationship between the relative deviation metrics and the deviation measures introduced in Rockafellar et al. (2006).
Our approach to the tracking-error problem is based on the universal methods of the theory of probability metrics. In Section 4, we distinguish between compound, simple and primary relative deviation metrics, whose properties influence the nature of the approximation problem. Next we introduce a minimal functional establishing an important connection between the compound and the simple classes, and, in Section 5, dual forms consistent with a preference order. Finally, in Section 6, we consider examples of dual forms and mention a few cases in which the optimization problem can be linearized.
Throughout the paper, we focus on the static tracking-error problem; that is, the metric of relative deviation is defined on the space of real-valued random variables. We do this only to keep the notation simple; it is not an inherent limitation of the methodology. Exactly the same approach works in a dynamic setting if the metrics of relative deviation are defined on the space of random processes.
2 The tracking error problem
The minimal tracking-error problem has the following form:
$$\min_{w \in W} \sigma(w'r - r_b),$$
where W is a set of admissible portfolios, w is a vector of portfolio weights, r is a vector of stock returns, $r_b$ is the return of a benchmark portfolio, and σ(X) stands for the standard deviation of the random variable (r.v.) X. The goal is to find a portfolio which is closest to the benchmark in a certain sense, in this case determined by the standard deviation.
In essence, in the minimal tracking-error problem we minimize the uncertainty of the excess returns of the feasible portfolios relative to the benchmark portfolio. A serious disadvantage of the tracking error is that it penalizes positive and negative deviations from the mean excess return in the same way, while our attitude towards them is asymmetric. We are inclined to pay more attention to the negative ones since they represent relative loss. This
argument leads to the conclusion that a more realistic measure of “closeness”
should be asymmetric.
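As a minimal sketch of this asymmetry argument (plain Python, with hypothetical scenario returns treated as equally likely), the snippet below evaluates the tracking error σ(w′r − r_b) and a downside-only measure of the excess return. Reversing the sign of every excess return leaves the tracking error unchanged, while the downside measure changes. The helper names `tracking_error` and `downside_deviation` and the sample numbers are illustrative assumptions, not taken from the paper.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def std(xs):
    """Population standard deviation of equally likely scenario values."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def downside_deviation(xs):
    """Root mean square of the shortfall below the mean (negative deviations only)."""
    m = mean(xs)
    return math.sqrt(sum(max(m - x, 0.0) ** 2 for x in xs) / len(xs))


def tracking_error(portfolio, benchmark):
    """sigma(w'r - r_b) estimated from paired scenarios."""
    return std([p - b for p, b in zip(portfolio, benchmark)])


# Hypothetical paired scenario returns for w'r and r_b.
portfolio = [0.02, -0.01, 0.03, -0.04, 0.01]
benchmark = [0.01, 0.00, 0.01, 0.00, 0.01]
excess = [p - b for p, b in zip(portfolio, benchmark)]
flipped = [-e for e in excess]  # gains become losses and vice versa

print(tracking_error(portfolio, benchmark), std(flipped))       # equal
print(downside_deviation(excess), downside_deviation(flipped))  # differ
```

The symmetric measure cannot tell the two situations apart, which is the motivation for the asymmetric functionals introduced next.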
Our aim is to re-state the minimal tracking-error problem in the more
general form
$$\min_{w \in W} \mu(w'r, r_b), \qquad (1)$$
where µ(X, Y) is a measure of the deviation of X relative to Y. Due to this interpretation, we regard µ as a functional which metrizes¹ relative deviation, and we call it a relative deviation metric or, simply, an r.d. metric.
We define the r.d. metric on the linear space of real-valued r.v.s. Not every r.v. belonging to this space can be interpreted as the return of some portfolio, because it is not always possible to find a portfolio whose return is a given r.v. Nevertheless, the r.v.s for which this can be done constitute a set in this space, and any r.d. metric defined on the entire space can be restricted to that set.
What properties should µ satisfy? If the portfolio $w'r$ is an exact copy of the benchmark, i.e. it contains exactly the same stocks in the same amounts, then the relative deviation of $w'r$ to $r_b$ should be zero. The converse should also hold, but generally in a somewhat weaker sense; otherwise problem (1) is not a “tracking” problem. If the deviation of $w'r$ relative to $r_b$ is zero, then the portfolio and the benchmark are indistinguishable, but only in the sense implied by µ. They may or may not be identical.
¹ We use the word metrize in a broad sense and not in the sense of metrizing a given topological structure.
Let us assume for now the strongest version of similarity for this situation, namely that both portfolios are identical. If the relative deviation of $w'r$ to $r_b$ is non-zero, then µ is positive. These arguments imply the following properties:
P1. $\mu(X, Y) \ge 0$ and $\mu(X, Y) = 0$ if $X \overset{as}{=} Y$,
or
$\widetilde{P1}$. $\mu(X, Y) \ge 0$ and $\mu(X, Y) = 0$ if and only if $X \overset{as}{=} Y$,
where $\overset{as}{=}$ denotes equality in the “almost sure” sense. Whenever the tilde sign is used in this paper, it marks a stronger alternative of the corresponding property.
We already mentioned that in general it is meaningful for µ to be asymmetric, but the tracking error is an example of a symmetric µ. For this reason,
at times we may assume symmetry,
P2. µ(X, Y ) = µ(Y, X) for all X, Y
but in general P2 will not hold.
Suppose that we introduce a third portfolio, whose returns we denote by $r_c$. We would expect the sum of the relative deviations of $w'r$ to $r_c$ and of $r_c$ to $r_b$ to be greater than (or equal to) the deviation of $w'r$ relative to $r_b$, because the return $r_c$ introduces additional “noise”; that is, the triangle inequality holds²,
P3. µ(X, Y ) ≤ µ(X, Z) + µ(Z, Y ) for any X, Y, Z
² Instead of the triangle inequality, one can consider the generalized version $\mu(X, Y) \le K_\mu(\mu(X, Z) + \mu(Z, Y))$ for any X, Y, Z and $K_\mu \ge 1$, which obviously reduces to P3 if $K_\mu = 1$. In line with the terminology of the theory of probability metrics, if we replace P3 by the generalized triangle inequality in Definition 1, we obtain distances instead of metrics. In this way, we can define relative deviation distances and generalize the class of r.d. metrics. Relaxing the triangle inequality, we may obtain functionals having important applications in finance.
Now we are in a position to define a few basic terms. Let us denote by $\mathcal{X}$ the space of r.v.s on a given probability space $(\Omega, \mathcal{A}, P)$ taking values in $\mathbb{R}$. By $\mathcal{LX}_2$ we denote the space of all joint distributions $\Pr_{X,Y}$ generated by the pairs $X, Y \in \mathcal{X}$.
Definition 1. Suppose that a mapping $\mu(X, Y) := \mu(\Pr_{X,Y})$ is defined on $\mathcal{LX}_2$, taking values in the extended interval [0, ∞]. If it satisfies properties
a) $\widetilde{P1}$, P2 and P3, then it is called a probability metric on $\mathcal{X}$;
b) $\widetilde{P1}$ and P3, then it is called a probability quasi-metric on $\mathcal{X}$;
c) P1, P2 and P3, then it is called a probability semi-metric on $\mathcal{X}$;
d) P1 and P3, then it is called a probability quasi-semi-metric on $\mathcal{X}$.
If we combine the assets in the three portfolios considered above such that we obtain two new portfolios with returns $w'r + r_c$ and $r_b + r_c$, then the deviation of $w'r + r_c$ to $r_b + r_c$ should be less than (or equal to) the deviation of $w'r$ to $r_b$. This is because we have included some quantities of the same stocks in the initial portfolios with returns $w'r$ and $r_b$ respectively.³ In terms of µ, we obtain
P4. (Strong regularity) µ(X + Z, Y + Z) ≤ µ(X, Y ) for all X, Y, Z
If P4 is satisfied as an equality for any Z, then the functional µ is said to be translation invariant,
$\widetilde{P4}$. $\mu(X + Z, Y + Z) = \mu(X, Y)$ for all X, Y, Z
³ The two new portfolios can be constructed in the following manner. Assume for simplicity that $w'r$ and $r_b$ are returns of long-only portfolios. Construct a short-only portfolio with return $r_c$. Combine each of the two long-only portfolios with the short-only one so that we obtain two portfolios with present value equal to zero. The returns of the two zero-value portfolios are $w'r + r_c$ and $r_b + r_c$ respectively, provided that the weights are defined in a suitable way.
In P4, we do not assume any particular relationship between Z and X, Y . A
weaker assumption (P4*) will turn out to be more reasonable in some cases.
In order to distinguish between them, we call P4 the strong regularity property and P4* the weak regularity property.
P4*. (Weak regularity) µ(X + Z, Y + Z) ≤ µ(X, Y ), for all Z independent
of X, Y
Here we may ask the following related question. What happens if we
add to the two initial portfolios other assets, such that their returns become
X + c1 and Y + c2 , where c1 and c2 are arbitrary constants? The relative
deviation of X + c1 to Y + c2 should be the same as the one of X to Y ,
P5. µ(X + c1 , Y + c2 ) = µ(X, Y ) for all X, Y and constants c1 , c2
The above implies that X and X + c are indiscernible for any µ and any constant c. This property allows us to define µ on the space of zero-mean r.v.s
$$\mathcal{X}_0 = \{X \in \mathcal{X} : EX = 0\}, \qquad (2)$$
where $\mathcal{X}$ is the space of all real-valued r.v.s and E is the mathematical expectation. Thus, for arbitrary real-valued r.v.s, the relative deviation of X to Y equals µ(X − EX, Y − EY), and P5 is automatically satisfied.
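The centering device behind (2) can be sketched in a few lines of Python. The wrapper below, a hypothetical helper named `with_p5`, evaluates any scenario-based functional on centered arguments, µ(X − EX, Y − EY), so that P5 holds by construction; the functional `positive_gap` and the data are illustrative assumptions, not part of the paper.

```python
def mean(xs):
    return sum(xs) / len(xs)


def center(xs):
    """Map equally likely scenarios of X to scenarios of X - EX (an element of X0)."""
    m = mean(xs)
    return [x - m for x in xs]


def with_p5(mu):
    """Return the functional (X, Y) -> mu(X - EX, Y - EY)."""
    def centered_mu(xs, ys):
        return mu(center(xs), center(ys))
    return centered_mu


def positive_gap(xs, ys):
    """Illustrative asymmetric functional: E max(X - Y, 0) over paired scenarios."""
    return mean([max(x - y, 0.0) for x, y in zip(xs, ys)])


mu = with_p5(positive_gap)
X = [0.02, -0.01, 0.03, -0.04]
Y = [0.01, 0.00, 0.01, 0.00]

print(mu([x + 0.5 for x in X], [y - 0.3 for y in Y]), mu(X, Y))  # equal up to rounding
print(mu(X, [x + 1.0 for x in X]))  # X and X + c are indiscernible
```

Without the wrapper, `positive_gap` reacts strongly to constant shifts, which is exactly what P5 rules out.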
Finally, if we assume that multiplying the returns $w'r$ and $r_b$ by a positive number⁴ changes the relative deviation by the same factor raised to some power s, then µ is said to be positively homogeneous of degree s,
P6. $\mu(aX, aY) = a^s \mu(X, Y)$ for all X, Y and all a ≥ 0, where s ≥ 0 is fixed
⁴ This effect appears if we add a cash position c to the initial portfolio. The return becomes $(PV_{t_1} - PV_{t_0})/(PV_{t_0} + c)$, which equals $a(PV_{t_1} - PV_{t_0})/PV_{t_0}$, where $a = PV_{t_0}/(PV_{t_0} + c)$, $PV_{t_0}$ is the present value of the portfolio at the present time and $PV_{t_1}$ is the random portfolio value at a future time $t_1$. Therefore the portfolio return appears scaled by the constant a.
Now we are ready to define the r.d. metrics.
Definition 2. Any quasi-metric µ which satisfies P4 or P4* (respectively $\widetilde{P4}$), P5 and P6 is said to be a (respectively translation invariant) metric of relative deviation.
3 Examples of relative deviation metrics
In this section, we deal with translation invariant r.d. metrics and show that there is a one-to-one correspondence between them and the r.d. metrics generated by the class of deviation measures. The deviation measures are introduced in Rockafellar et al. (2006) and are defined below.
Definition 3. A deviation measure is any functional $D : \mathcal{X} \to [0, \infty]$ satisfying
(D1) D(X + C) = D(X) for all X and constants C,
(D2) D(0) = 0 and D(λX) = λD(X) for all X and all λ > 0,
(D3) D(X + Y) ≤ D(X) + D(Y) for all X and Y,
(D4) D(X) ≥ 0 for all X, with D(X) > 0 for non-constant X.
Proposition 1. The functional µ(X, Y) = D(X − Y) satisfies $\widetilde{P1}$, P3, $\widetilde{P4}$, P5 and P6 with s = 1, where D is a deviation measure in the sense of Definition 3.
Proof. $\widetilde{P1}$ follows from D1 and D4. $\widetilde{P4}$ is trivial. P5 follows from D1, and P6 with s = 1 from D2. P3 is easy to show:
$$\mu(X, Y) = D(X - Y) = D\big((X - Z) + (Z - Y)\big) \le D(X - Z) + D(Z - Y) = \mu(X, Z) + \mu(Z, Y).$$
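A quick numerical sanity check of Proposition 1 can be written in plain Python, under the assumption that the standard deviation of equally likely simulated scenarios serves as the deviation measure D. The sketch builds µ(X, Y) = D(X − Y) and checks P3, $\widetilde{P4}$ and P6 with s = 1 on random samples; the seed, sample size and distributions are arbitrary choices made for the illustration.

```python
import math
import random


def std_dev(xs):
    """Standard deviation: a deviation measure in the sense of Definition 3."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def mu_from_deviation(D):
    """Proposition 1: build mu(X, Y) = D(X - Y) from a deviation measure D."""
    def mu(xs, ys):
        return D([x - y for x, y in zip(xs, ys)])
    return mu


def add(xs, ys):
    return [x + y for x, y in zip(xs, ys)]


mu = mu_from_deviation(std_dev)
rng = random.Random(0)
n = 1000
X = [rng.gauss(0.0, 1.0) for _ in range(n)]
Y = [rng.gauss(0.0, 2.0) for _ in range(n)]
Z = [rng.gauss(1.0, 0.5) for _ in range(n)]

triangle_ok = mu(X, Y) <= mu(X, Z) + mu(Z, Y) + 1e-12      # P3
translation_gap = abs(mu(add(X, Z), add(Y, Z)) - mu(X, Y))  # P4 with equality
homogeneity_gap = abs(mu([3 * x for x in X], [3 * y for y in Y]) - 3 * mu(X, Y))  # P6, s = 1
print(triangle_ok, translation_gap, homogeneity_gap)
```

A check on one sample is of course no proof; it only guards against mistakes when a deviation measure is implemented by hand.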
In the reverse direction, we can also show that a translation invariant r.d.
metric generates a deviation measure.
Proposition 2. The functional D(X) = µ(X, 0) is a deviation measure
in the sense of Definition 3 if µ is a translation invariant r.d. metric, positively homogeneous of degree one.
Proof. D1 follows from P5. D(0) = µ(0, 0) = 0, and P6 with s = 1 guarantees D2. We can easily show that D3 holds by making use of the triangle inequality,
$$D(X + Y) = \mu(X + Y, 0) = \mu(X, -Y) \le \mu(X, 0) + \mu(0, -Y) = \mu(X, 0) + \mu(Y, 0) = D(X) + D(Y).$$
Finally, D4 holds due to $\widetilde{P1}$.
Therefore, we can conclude that all deviation measures in the sense of Definition 3 arise from translation invariant r.d. metrics. Another property that is very simple to establish is that any translation invariant r.d. metric is convex in each of its two arguments.
Proposition 3. Any translation invariant r.d. metric µ, for which P6
holds with s = 1, satisfies
$$\begin{aligned} \mu(\alpha X + (1-\alpha)Y, Z) &\le \alpha\mu(X, Z) + (1-\alpha)\mu(Y, Z), \\ \mu(X, \alpha Y + (1-\alpha)Z) &\le \alpha\mu(X, Y) + (1-\alpha)\mu(X, Z), \end{aligned} \qquad (3)$$
where α ∈ [0, 1] and X, Y, Z are any real-valued random variables.
Proof. We demonstrate how to obtain the first inequality; the second follows by a similar argument.
$$\begin{aligned}
\mu(\alpha X + (1-\alpha)Y, Z) &= \mu(\alpha(X - Z) + (1-\alpha)(Y - Z), 0) \\
&= \mu(\alpha(X - Z), -(1-\alpha)(Y - Z)) \\
&\le \mu(\alpha(X - Z), 0) + \mu(0, -(1-\alpha)(Y - Z)) \\
&= \mu(\alpha(X - Z), 0) + \mu((1-\alpha)(Y - Z), 0) \\
&= \alpha\mu(X - Z, 0) + (1-\alpha)\mu(Y - Z, 0) \\
&= \alpha\mu(X, Z) + (1-\alpha)\mu(Y, Z)
\end{aligned}$$
The first two equalities hold because of $\widetilde{P4}$. The inequality follows from P3, and then we make use of $\widetilde{P4}$ and P6.
The convexity property (3) guarantees that problem (1) is a convex optimization problem if µ satisfies the assumptions of the proposition and $\mathcal{X}$ is a convex set, in the sense that if $X, Y \in \mathcal{X}$, then $\alpha X + (1-\alpha)Y \in \mathcal{X}$, α ∈ [0, 1].
Both inequalities in (3) are due to the translation invariance property. Therefore, if we relax $\widetilde{P4}$ to P4, we have to impose the convexity properties in order for µ to be convex.
Definition 4. Any r.d. metric which satisfies any of the inequalities (3)
is called a convex r.d. metric.
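Proposition 3 and Definition 4 can also be illustrated numerically. The sketch below rests on assumptions of its own: it uses the mean absolute deviation E|X − EX| (which satisfies D1 to D4) as the deviation measure behind µ, evaluates both inequalities in (3) on a grid of α values for simulated, equally likely scenarios, and counts violations.

```python
import random


def mad(xs):
    """Mean absolute deviation E|X - EX| of equally likely scenarios."""
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


def mu(xs, ys):
    """Translation invariant r.d. metric mu(X, Y) = MAD(X - Y)."""
    return mad([x - y for x, y in zip(xs, ys)])


def mix(a, xs, ys):
    """Scenario-wise convex combination a*X + (1 - a)*Y."""
    return [a * x + (1 - a) * y for x, y in zip(xs, ys)]


rng = random.Random(1)
X = [rng.gauss(0.0, 1.0) for _ in range(500)]
Y = [rng.gauss(0.0, 2.0) for _ in range(500)]
Z = [rng.gauss(0.0, 0.5) for _ in range(500)]

violations = 0
for k in range(11):
    a = k / 10
    # First inequality in (3): convexity in the first argument.
    first_ok = mu(mix(a, X, Y), Z) <= a * mu(X, Z) + (1 - a) * mu(Y, Z) + 1e-12
    # Second inequality in (3): convexity in the second argument.
    second_ok = mu(X, mix(a, Y, Z)) <= a * mu(X, Y) + (1 - a) * mu(X, Z) + 1e-12
    if not (first_ok and second_ok):
        violations += 1
print("violations:", violations)
```

The small tolerance absorbs floating-point rounding; the inequalities themselves hold exactly for this µ.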
At this point we have shown that the translation invariant r.d. metrics
are equivalent to the metrics spawned by the deviation measures and are
well-positioned for the benchmark tracking problem (1). Besides this class,
there are other examples of r.d. metrics suitable for (1). The next section is
devoted to them.
The generalized tracking-error problem (1) can also be re-stated as
$$\min_{w \in W} \mu(r_b, w'r). \qquad (4)$$
Whether we minimize $\mu(w'r, r_b)$ or $\mu(r_b, w'r)$ greatly depends on the interpretation of µ. This becomes particularly obvious if there is a deviation measure behind µ. In Proposition 2, we considered the functional D(X) = µ(X, 0), which turned out to be a deviation measure. If we switch the arguments of µ, we again obtain a deviation measure as a consequence of the translation invariance property,
$$\tilde{D}(X) = \mu(0, X) = \mu(-X, 0) = D(-X).$$
Thus, whether we choose $D(w'r - r_b)$ or $D(r_b - w'r)$ depends on the particular form of D. For example, suppose that D is the semi-standard deviation
$$\sigma_+(X) = \left(E(X - EX)_+^2\right)^{1/2},$$
where $X_+ = \max(0, X)$. Then $\mu(w'r, r_b) = \left(E(w'r^0 - r_b^0)_+^2\right)^{1/2}$, in which $w'r^0$ and $r_b^0$ denote centered returns. Minimizing $\left(E(w'r^0 - r_b^0)_+^2\right)^{1/2}$ means that we minimize the positive deviations of $w'r^0$ relative to $r_b^0$, which is highly undesirable because in this way we minimize our relative profit. The proper way to pose the problem is
$$\min_{w \in W} \left(E(r_b^0 - w'r^0)_+^2\right)^{1/2},$$
and the objective evidently contains $\mu(r_b, w'r)$. The conclusion is exactly the opposite if
$$D(X) = \sigma_-(X) = \left(E(X - EX)_-^2\right)^{1/2},$$
where $X_- = \max(0, -X)$. That is, in this case we should use $\mu(w'r, r_b)$.
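The role of the argument order can be checked with a small sketch, assuming equally likely hypothetical scenarios for $w'r$ and $r_b$. With $D = \sigma_+$, the objective $\mu(r_b, w'r) = \sigma_+(r_b - w'r)$ penalizes underperformance, and it coincides with $\sigma_-(w'r - r_b)$, i.e. the choice $\mu(w'r, r_b)$ under $D = \sigma_-$; the reversed order $\sigma_+(w'r - r_b)$ penalizes outperformance instead. Function names are illustrative.

```python
import math


def sigma_plus(xs):
    """sigma_+(X) = (E (X - EX)_+^2)^(1/2): upper semi-standard deviation."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum(max(x - m, 0.0) ** 2 for x in xs) / len(xs))


def sigma_minus(xs):
    """sigma_-(X) = (E (X - EX)_-^2)^(1/2) with X_- = max(0, -X)."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum(max(m - x, 0.0) ** 2 for x in xs) / len(xs))


def mu_from(D):
    """mu(X, Y) = D(X - Y) for paired, equally likely scenarios."""
    def mu(xs, ys):
        return D([x - y for x, y in zip(xs, ys)])
    return mu


portfolio = [0.03, -0.02, 0.01, -0.05, 0.04]  # hypothetical scenarios of w'r
benchmark = [0.01, 0.01, 0.00, -0.01, 0.02]   # hypothetical scenarios of r_b

mu_plus = mu_from(sigma_plus)
mu_minus = mu_from(sigma_minus)
penalize_shortfall = mu_plus(benchmark, portfolio)       # sigma_+(r_b - w'r)
same_objective = mu_minus(portfolio, benchmark)          # sigma_-(w'r - r_b)
penalize_outperformance = mu_plus(portfolio, benchmark)  # sigma_+(w'r - r_b)
print(penalize_shortfall, same_objective, penalize_outperformance)
```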
Apart from the semi-standard deviation, other examples of translation invariant r.d. metrics can be sought among functionals built on the value-at-risk (VaR) and the conditional value-at-risk (CVaR),
$$CVaR_\alpha(X) = \int_{-\infty}^{\infty} t\,dG_X(t),$$
where
$$G_X(t) = \begin{cases} 0, & t < VaR_\alpha(X), \\ \big(P(-X \le t) - (1 - \alpha)\big)/\alpha, & t \ge VaR_\alpha(X), \end{cases}$$
in which $VaR_\alpha(X) = -\inf\{x \in \mathbb{R} : P(X \le x) \ge \alpha\}$ and α is the tail probability. It can be shown that, when considered on $\mathcal{X}_0$, $CVaR_\alpha(X)$ is a deviation measure, see Rockafellar et al. (2006). Therefore the functional $CVaR_\alpha(X - Y)$, $X, Y \in \mathcal{X}_0$, is a translation invariant r.d. metric. The analogous functional $VaR_\alpha(X - Y)$ is not an r.d. metric in general: VaR is not subadditive, so D3, and with it the triangle inequality P3, can fail.
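For equally likely scenarios, $VaR_\alpha$ and $CVaR_\alpha$ as defined above reduce to a lower quantile and to the average loss over the worst α-fraction of scenarios (splitting one scenario's probability when α·n is not an integer). The sketch below assumes that setting; `cvar_deviation_metric` is a hypothetical helper that evaluates $CVaR_\alpha(X - Y)$ after centering, so that its arguments effectively live in $\mathcal{X}_0$.

```python
def var_alpha(xs, alpha):
    """VaR_alpha(X) = -inf{x : P(X <= x) >= alpha} for equally likely scenarios."""
    s = sorted(xs)
    n = len(s)
    for i, x in enumerate(s, start=1):
        if i / n >= alpha:  # empirical c.d.f. at x reaches alpha
            return -x
    return -s[-1]


def cvar_alpha(xs, alpha):
    """Mean of the loss -X over the worst alpha-fraction of equally likely scenarios."""
    s = sorted(xs)
    n = len(s)
    remaining = alpha  # tail probability still to be allocated
    total = 0.0
    for x in s:
        w = min(1.0 / n, remaining)
        total += w * (-x)
        remaining -= w
        if remaining <= 1e-15:
            break
    return total / alpha


def cvar_deviation_metric(xs, ys, alpha):
    """CVaR_alpha(X - Y) evaluated on the centered difference, i.e. within X0."""
    d = [x - y for x, y in zip(xs, ys)]
    m = sum(d) / len(d)
    return cvar_alpha([v - m for v in d], alpha)


sample = [-3.0, -1.0, 0.0, 1.0, 3.0]  # zero-mean, equally likely
print(var_alpha(sample, 0.4), cvar_alpha(sample, 0.4))  # 1.0 and about 2.0
```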
4 Relation to probability metrics: representation theorems
In this section we examine the relationship between r.d. metrics and probability metrics (p. metrics). We have already defined the term probability metric in Definition 1. The basic differences are that r.d. metrics may be asymmetric (P2 need not hold) and satisfy property P5. Therefore, if we consider a p. metric defined on $\mathcal{X}_0$, asymmetry remains the only difference. Any quasi-semi-metric generates a p. semi-metric via symmetrization. If µ is a quasi-metric, then $\bar\mu(X, Y) = (\mu(X, Y) + \mu(Y, X))/2$ is symmetric and also satisfies $\widetilde{P1}$ and P3. Hence $\bar\mu$ is a p. metric. As a result, any r.d. metric generates a p. metric via symmetrization. Due to this close connection, all classifications of p. metrics remain valid for r.d. metrics.
In the theory of p. metrics, we distinguish between primary, simple, and compound metrics (for more information about p. metrics, see Rachev (1991)). We obtain the same classes of r.d. metrics by defining primary, simple, and compound quasi-semi-metrics. In order to introduce primary p. quasi-semi-metrics, we need some additional notation.
Let h be a mapping defined on $\mathcal{X}$ with values in $\mathbb{R}^J$; that is, we associate a vector of numbers with a random variable. The vector of numbers could be interpreted as a set of characteristics of the random variable. An example of such a mapping is $X \mapsto (EX, \sigma(X))$, where the first element is the mathematical expectation and the second is the standard deviation. Furthermore, the mapping h induces a partition of $\mathcal{X}$ into equivalence classes. That is, two random variables X and Y are regarded as equivalent, X ∼ Y, if their corresponding characteristics agree,
$$X \sim Y \iff h(X) = h(Y).$$
Since the p. metric is defined on the space of pairs of r.v.s $\mathcal{LX}_2$, we have to translate the equivalence into the case of pairs of r.v.s. Two pairs $(X_1, Y_1)$ and $(X_2, Y_2)$ are said to be equivalent if there is equivalence on an element-by-element basis, i.e. $h(X_1) = h(X_2)$ and $h(Y_1) = h(Y_2)$.
Definition 5. Let µ be a probability quasi-semi-metric such that µ is constant on the equivalence classes induced by the mapping h:
$$(X_1, Y_1) \sim (X_2, Y_2) \implies \mu(X_1, Y_1) = \mu(X_2, Y_2).$$
Then µ is called a primary probability quasi-semi-metric. If the converse implication (⇐) also holds, then µ is said to be a primary quasi-metric.
Here is an illustration. Assume that h maps the r.v.s to their first absolute moments, h(X) = E|X|. Then $(X_1, Y_1) \sim (X_2, Y_2)$ means that $X_1$ and $X_2$ on one hand, and $Y_1$ and $Y_2$ on the other, have equal first absolute moments, i.e. $(E|X_1|, E|Y_1|) = (E|X_2|, E|Y_2|)$. In this situation, a primary quasi-semi-metric would measure the two distances, $\mu(X_1, Y_1)$ and $\mu(X_2, Y_2)$, as equal. The word quasi reminds one that $\mu(X_1, Y_1)$ may not equal $\mu(Y_1, X_1)$. Nevertheless $\mu(Y_1, X_1) = \mu(Y_2, X_2)$. Moreover, if $h(X_1) = h(Y_1)$, then necessarily $\mu(X_1, Y_1) = 0$, since $(X_1, Y_1) \sim (X_1, X_1)$ and $\mu(X_1, X_1) = 0$. The word semi signifies that $\mu(X_1, Y_1) = 0$ may not imply equality of the corresponding characteristics.
These considerations show that a primary quasi-semi-metric in $\mathcal{X}$ generates a quasi-semi-metric $\mu_1(h(X), h(Y)) = \mu(X, Y)$ in the space of the corresponding characteristics $h(\mathcal{X}) = \{h(X) : X \in \mathcal{X}\} \subseteq \mathbb{R}^J$ (see Section 6.2 for examples).
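The illustration with h(X) = E|X| can be made concrete. The functional below, max(h(X) − h(Y), 0), is a hypothetical example chosen for this sketch rather than one from the paper: it depends on X and Y only through h, satisfies P1 and the triangle inequality, is asymmetric, and vanishes whenever h(X) ≤ h(Y), so µ(X, Y) = 0 does not imply h(X) = h(Y).

```python
def h(xs):
    """Characteristic h(X) = E|X| (first absolute moment) from equally likely scenarios."""
    return sum(abs(x) for x in xs) / len(xs)


def mu_primary(xs, ys):
    """Hypothetical primary quasi-semi-metric: max(h(X) - h(Y), 0)."""
    return max(h(xs) - h(ys), 0.0)


X1 = [1.0, -1.0, 2.0, -2.0]  # E|X1| = 1.5
X2 = [1.5, -1.5, 1.5, -1.5]  # E|X2| = 1.5, different distribution from X1
Y1 = [0.5, -0.5, 0.5, -0.5]  # E|Y1| = 0.5
Y2 = [1.0, 0.0, 0.0, -1.0]   # E|Y2| = 0.5

print(mu_primary(X1, Y1), mu_primary(X2, Y2))  # equal, since (X1, Y1) ~ (X2, Y2)
print(mu_primary(Y1, X1), mu_primary(Y2, X2))  # 0.0 0.0: the asymmetry
```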
Definition 6. A probability quasi-semi-metric is said to be simple if for each $(X, Y) \in \mathcal{LX}_2$,
$$P(X \le t) = P(Y \le t),\ \forall t \in \mathbb{R} \implies \mu(X, Y) = 0.$$
If the converse implication (⇐) also holds, then µ is said to be a simple quasi-metric.
Definition 7. Any probability quasi-semi-metric in the sense of Definition 1 is compound.
In effect, µ(X, Y) = 0 implies that (i) $X \overset{as}{=} Y$ if µ is a compound quasi-metric, (ii) $F_X(t) = F_Y(t)$, ∀t ∈ ℝ, if µ is a simple quasi-metric, and (iii) h(X) = h(Y) if µ is a primary quasi-metric. In the discussion of P1, we noted that we would assume the strong property that µ(X, Y) = 0 implies $X \overset{as}{=} Y$. This may be too demanding. From a practical viewpoint, it may be more appropriate to consider an r.d. metric such that µ(X, Y) = 0 implies only equality in distribution, or equality of some characteristics of X and Y, without the requirement that X and Y coincide almost surely. For such cases, simple and primary r.d. metrics are better suited.
Under certain conditions, a p. metric can generate a r.d. metric.
Definition 8. A p. metric is called ideal of order s ∈ ℝ if for any r.v.s $X, Y, Z \in \mathcal{X}$ and any non-zero constant c the following two properties are satisfied:
(a) Regularity (strong or weak): µ(X + Z, Y + Z) ≤ µ(X, Y)
(b) Homogeneity: $\mu(cX, cY) \le |c|^s \mu(X, Y)$
If µ is a simple metric, then Z is assumed to be independent of X and Y ,
that is, weak regularity holds. In this case, µ is said to be weakly-perfect of
order s.
Due to (a) and (b), any ideal metric defined on X0 turns into a symmetric
r.d. metric.
We can obtain a quasi-metric from a p. metric by breaking the symmetry axiom while keeping the triangle inequality, so that P1 and P3 still hold but P2 does not. Let us take as an example the average compound metric
$$L_p(X, Y) = \left(E\,d(X, Y)^p\right)^{1/p},$$
where d(x, y) is a metric in ℝ and p ∈ [1, ∞). If d(x, y) = |x − y| is the Minkowski metric, then one way to break the symmetry in this case is to consider
$$L_p^*(X, Y) = \left(E(\max(X - Y, 0))^p\right)^{1/p}. \qquad (5)$$
A special limit case here is $L_\infty^*(X, Y) = \inf\{\varepsilon > 0 : P(\max(X - Y, 0) > \varepsilon) = 0\}$, which is the quasi-metric corresponding to the limit case of the Ky-Fan p. metric, $L_\infty(X, Y) = \inf\{\varepsilon > 0 : P(|X - Y| > \varepsilon) = 0\}$. Still, there is a deviation measure behind $L_p^*(X, Y)$ because it is translation invariant.
Proposition 4. The functional defined in equation (5) is a compound
quasi-semi-metric in X and a compound quasi-metric in X0 . It is translation
invariant and homogeneous of degree s = 1.
Proof. Let us first consider (5) in $\mathcal{X}$. P1 is trivial. The triangle inequality follows from the sub-additivity of the max function and Minkowski's inequality.
Next we show that (5) is a quasi-metric in $\mathcal{X}_0$. Suppose that $L_p^*(X, Y)$ is defined on $\mathcal{X}_0$ and, additionally, that $L_p^*(X, Y) = 0$. From the definition, we easily observe that the last assumption implies Y ≥ X almost surely. Strict inequality on a set of positive probability is easy to rule out, because then EY > EX, which is impossible by construction since EX = EY = 0. Therefore $Y \overset{as}{=} X$, and hence $L_p^*$ is a quasi-metric in $\mathcal{X}_0$, i.e. $\widetilde{P1}$ holds.
Both translation invariance and the homogeneity degree are obvious.
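A scenario-based sketch of (5), assuming a finite set of equally likely joint scenarios $(x_i, y_i)$, shows the properties stated in Proposition 4: translation invariance, homogeneity of degree one and asymmetry. It also forms the symmetrization $(\mu(X, Y) + \mu(Y, X))/2$ mentioned at the beginning of this section. The data are hypothetical zero-mean scenarios; p = 2 is used because for p = 1 and zero-mean arguments $L_1^*$ happens to be symmetric, since E(X − Y) = 0 forces E max(X − Y, 0) = E max(Y − X, 0).

```python
def l_star(xs, ys, p):
    """L*_p(X, Y) = (E max(X - Y, 0)^p)^(1/p) over equally likely joint scenarios."""
    if p < 1:
        raise ValueError("p must be at least 1")
    total = sum(max(x - y, 0.0) ** p for x, y in zip(xs, ys))
    return (total / len(xs)) ** (1.0 / p)


def symmetrized(xs, ys, p):
    """(L*_p(X, Y) + L*_p(Y, X)) / 2, a symmetric p. metric on zero-mean inputs."""
    return 0.5 * (l_star(xs, ys, p) + l_star(ys, xs, p))


X = [0.04, -0.01, -0.01, -0.02]  # zero-mean scenarios
Y = [0.01, 0.01, -0.01, -0.01]   # zero-mean scenarios
Z = [0.10, -0.20, 0.05, 0.05]

print(l_star(X, Y, 2), l_star(Y, X, 2))  # asymmetric
print(l_star([x + z for x, z in zip(X, Z)],
             [y + z for y, z in zip(Y, Z)], 2))  # equals l_star(X, Y, 2) up to rounding
print(l_star([3 * x for x in X], [3 * y for y in Y], 2))  # equals 3 * l_star(X, Y, 2) up to rounding
```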
As a second example, let us examine the Birnbaum-Orlicz compound metric
$$\Theta_p(X, Y) = \left(\int_{-\infty}^{\infty} (\tau(t; X, Y))^p\,dt\right)^{1/p},$$
where p ≥ 1 and τ(t; X, Y) = P(X ≤ t < Y) + P(Y ≤ t < X). In this case, it is easier to obtain a quasi-metric. Consider the functional
$$\Theta_p^*(X, Y) = \left(\int_{-\infty}^{\infty} (\tau^*(t; X, Y))^p\,dt\right)^{1/p}, \qquad (6)$$
where τ*(t; X, Y) = P(Y ≤ t < X) and p ≥ 1.
Proposition 5. The functional defined in equation (6) is a compound
quasi-semi-metric in X and a compound quasi-metric in X0 . It satisfies the
weak regularity property P4* and is homogeneous of degree s = 1/p.
Proof. In order to prove that (6) is a quasi-semi-metric in $\mathcal{X}$, we need to verify P1 and P3.
P1. It is trivial since P(X ≤ t < X) = 0 for all t ∈ ℝ.
P3. We start by decomposing the function P(Y ≤ t < X) for fixed t:
$$\begin{aligned}
P(Y \le t < X) &= P(\{Y \le t\} \cap \{X > t\}) \\
&= P(\{Y \le t\} \cap \{X > t\} \cap (\{Z \le t\} \cup \{Z > t\})) \\
&= P(\{Y \le t\} \cap \{X > t\} \cap \{Z \le t\}) + P(\{Y \le t\} \cap \{X > t\} \cap \{Z > t\}) \\
&\le P(\{X > t\} \cap \{Z \le t\}) + P(\{Y \le t\} \cap \{Z > t\}) \\
&= P(Z \le t < X) + P(Y \le t < Z).
\end{aligned}$$
The third equality holds because the events {Z ≤ t} and {Z > t} are disjoint. The inequality holds because dropping one of the three conditions in each intersection can only enlarge the event. In effect, by Minkowski's inequality,
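$$\begin{aligned}
\Theta_p^*(X, Y) &= \left(\int_{-\infty}^{\infty} \big(P(Y \le t < X)\big)^p\,dt\right)^{1/p} \\
&\le \left(\int_{-\infty}^{\infty} \big(P(Z \le t < X) + P(Y \le t < Z)\big)^p\,dt\right)^{1/p} \\
&\le \left(\int_{-\infty}^{\infty} \big(P(Z \le t < X)\big)^p\,dt\right)^{1/p} + \left(\int_{-\infty}^{\infty} \big(P(Y \le t < Z)\big)^p\,dt\right)^{1/p} \\
&= \Theta_p^*(X, Z) + \Theta_p^*(Z, Y),
\end{aligned}$$
and we obtain the triangle inequality.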
It is simple to verify that $\widetilde{P1}$ holds in $\mathcal{X}_0$. Similarly to Proposition 4, suppose that $\Theta_p^*$ is defined on $\mathcal{X}_0$. If $\Theta_p^*(X, Y) = 0$, then τ*(t; X, Y) = 0, ∀t ∈ ℝ. Since P(Y ≤ t < X) = 0 for all t implies Y ≥ X almost surely, by the same argument as in Proposition 4 we obtain that $\Theta_p^*$ is a quasi-metric in $\mathcal{X}_0$ and a quasi-semi-metric in $\mathcal{X}$. The weak regularity property P4* is checked by applying Young's convolution inequality. The homogeneity order is verified directly by a change of variables.
A special limit example of (6) is $\Theta_\infty^*(X, Y) = \sup_{t \in \mathbb{R}} P(Y \le t < X)$, which is an asymmetric version of the compound metric $\Theta_\infty(X, Y) = \sup_{t \in \mathbb{R}} \tau(t; X, Y)$, generating as a minimal metric the celebrated Kolmogorov metric in the space of distribution functions.
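For a finite set of equally likely joint scenarios, τ*(t; X, Y) = P(Y ≤ t < X) is a step function that changes only at scenario values, so (6) and $\Theta_\infty^*$ can be computed exactly. The sketch below assumes that setting (the function names are illustrative) and checks the asymmetry and the homogeneity degree 1/p from Proposition 5 on tiny examples.

```python
def tau_star(t, pairs):
    """tau*(t; X, Y) = P(Y <= t < X) for equally likely scenario pairs (x, y)."""
    return sum(1 for x, y in pairs if y <= t < x) / len(pairs)


def theta_star(xs, ys, p):
    """Exact (integral of tau*(t)^p dt)^(1/p); tau* is constant on [knot_k, knot_k+1)."""
    pairs = list(zip(xs, ys))
    knots = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(knots, knots[1:]):
        total += tau_star(left, pairs) ** p * (right - left)
    return total ** (1.0 / p)


def theta_star_inf(xs, ys):
    """sup over t of P(Y <= t < X); the supremum is attained at a knot."""
    pairs = list(zip(xs, ys))
    return max(tau_star(t, pairs) for t in set(xs) | set(ys))


X = [0.5, 1.5, -1.0, -1.0]
Y = [0.0, 0.0, 0.0, 0.0]
print(theta_star([1.0], [0.0], 1), theta_star([0.0], [1.0], 1))  # 1.0 0.0
print(theta_star([2 * x for x in X], [2 * y for y in Y], 2),
      2 ** 0.5 * theta_star(X, Y, 2))  # equal up to rounding
```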
One might be tempted to surmise that a structured approach towards the generation of classes of r.d. metrics is through asymmetrization of classes of ideal p. metrics, making them quasi-metrics, and using them on the subspace $\mathcal{X}_0$. The functionals in (5) and (6) support such a generalization. For instance, asymmetrization of (5) with the max function turns the p. metric into a p. quasi-semi-metric on $\mathcal{X}$, and considering it on the subspace $\mathcal{X}_0$ turns it into a quasi-metric. In spite of these two examples, in the general case such an approach would be incorrect because it may lead to a quasi-semi-metric even on $\mathcal{X}_0$. Let us look at Zolotarev's ideal metric ζ2 defined as
defined as
Z
∞
ζ2 (X, Y ) =
−∞
Z
x
Z
x
FX (t)dt −
−∞
−∞
FY (t)dt dx.
where FX (t) = P (X ≤ t) is the cumulative distribution function (c.d.f.) of
X. An asymmetrization, according to the above scheme, is
ζ2∗ (X, Y
Z
∞
)=
Z
x
x
FY (t)dt −
max
−∞
Z
−∞
FX (t)dt, 0 dx.
(7)
−∞
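Clearly, if $X \overset{d}{=} Y$, then $\zeta_2^*(X, Y) = 0$. The converse is not true: $\zeta_2^*(X, Y) = 0$ only implies
$$\int_{-\infty}^{x} F_Y(t)\,dt \le \int_{-\infty}^{x} F_X(t)\,dt, \quad \forall x \in \mathbb{R},$$
which does not force $X \overset{d}{=} Y$ even when $X, Y \in \mathcal{X}_0$. This demonstrates that $\zeta_2^*$ is a simple quasi-semi-metric.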
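The failure of the converse can be seen numerically. Take X ~ N(0, 2) and Y ~ N(0, 1), both in $\mathcal{X}_0$. Using the identity $\int_{-\infty}^{x} F(t)\,dt = E(x - X)_+$ and the closed form $E(x - X)_+ = \sigma(z\Phi(z) + \varphi(z))$, $z = x/\sigma$, for a centered normal, the sketch below approximates (7) with the trapezoidal rule on a truncated grid (the grid bounds and size are assumptions of this sketch). It returns $\zeta_2^*(X, Y) \approx 0$ although X and Y differ in distribution, while $\zeta_2^*(Y, X)$ is close to (Var X − Var Y)/2 = 0.5, a value that follows from integrating by parts when the means are equal.

```python
import math


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def integrated_cdf(x, sigma):
    """Integral of F over (-inf, x] for N(0, sigma^2), i.e. E(x - X)_+."""
    z = x / sigma
    return sigma * (z * normal_cdf(z) + normal_pdf(z))


def zeta2_star(sigma_x, sigma_y, lo=-12.0, hi=12.0, n=24000):
    """Trapezoidal approximation of (7) for X ~ N(0, sigma_x^2), Y ~ N(0, sigma_y^2)."""
    step = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * step
        g = max(integrated_cdf(x, sigma_y) - integrated_cdf(x, sigma_x), 0.0)
        total += g if 0 < i < n else 0.5 * g  # endpoint weights 1/2
    return total * step


sx, sy = math.sqrt(2.0), 1.0
print(zeta2_star(sx, sy))  # approximately 0
print(zeta2_star(sy, sx))  # approximately 0.5
```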
4.1 Minimal quasi-semi-metrics
The minimal quasi-metric is a type of simple quasi-metric obtained from a
compound quasi-metric in a special way. Thus certain properties valid for the
compound quasi-metric are inherited by the generated minimal quasi-metric.
We will assume that the following continuity property holds.
CP. Assume that the pairs $(X_n, Y_n) \in \mathcal{LX}_2$, n ∈ ℕ, converge in distribution to $(X, Y) \in \mathcal{LX}_2$ and that $\mu(X_n, Y_n) \to 0$ as n → ∞. Then µ(X, Y) = 0.
The property CP is not very restrictive; all examples of µ that we will consider
have it. We state it for technical reasons.
Lemma 1. Suppose that µ is a compound quasi-semi-metric. Then $\hat\mu$ defined as
$$\hat\mu(X, Y) = \inf\{\mu(\tilde X, \tilde Y) : \tilde X \overset{d}{=} X,\ \tilde Y \overset{d}{=} Y\}, \qquad (8)$$
where $\overset{d}{=}$ means equality in distribution, is a simple quasi-semi-metric. Moreover, if µ is a compound quasi-metric satisfying the CP condition, then $\hat\mu$ is a simple quasi-metric. Finally, if µ satisfies any of the inequalities
$$\mu(\alpha X + (1-\alpha)Y, Z) \le \alpha\mu(X, Z) + (1-\alpha)\mu(Y, Z),$$
$$\mu(X, \alpha Y + (1-\alpha)Z) \le \alpha\mu(X, Y) + (1-\alpha)\mu(X, Z),$$
then so does $\hat\mu$, on condition that the bivariate law in the convex combination, either (X, Y) or (Y, Z), is known.
Proof. We make use of the ideas behind the proof that the minimal functional (8) is a p. semi-metric if µ is a semi-metric, see Rachev (1991). Repeating the arguments there, we see that P1 ($\widetilde{P1}$) and P3 hold for $\hat\mu$.
The verification of convexity is very similar to checking P3. We prove that $\hat\mu$ is convex in the first argument; the same reasoning can be used to prove the other inequality. Let $X_1, X_2, Z \in \mathcal{X}_0$ and α ∈ [0, 1], and assume that we know the bivariate law of $(X_1, X_2)$; this way we know the r.v. $X_\alpha := \alpha X_1 + (1-\alpha)X_2$, $X_\alpha \in \mathcal{X}_0$. We also assume the technical condition that the underlying probability space $(\Omega, \mathcal{A}, P)$ is rich enough. Hence, for any ε > 0, we can choose bivariate laws $(\tilde X_1, \tilde Z)$, $(\tilde X_2, \tilde Z)$ and $(\tilde X_\alpha, \tilde Z)$ such that $\tilde X_1 \overset{d}{=} X_1$, $\tilde X_2 \overset{d}{=} X_2$, $\tilde Z \overset{d}{=} Z$, $\tilde X_\alpha \overset{as}{=} \alpha\tilde X_1 + (1-\alpha)\tilde X_2 \overset{d}{=} X_\alpha$, $\hat\mu(X_1, Z) + \varepsilon \ge \mu(\tilde X_1, \tilde Z)$ and $\hat\mu(X_2, Z) + \varepsilon \ge \mu(\tilde X_2, \tilde Z)$. By assumption µ is convex, and therefore
$$\mu(\tilde X_\alpha, \tilde Z) \le \alpha\mu(\tilde X_1, \tilde Z) + (1-\alpha)\mu(\tilde X_2, \tilde Z)$$
holds. Hence
$$\begin{aligned}
\hat\mu(X_\alpha, Z) &\le \mu(\tilde X_\alpha, \tilde Z) \le \alpha\mu(\tilde X_1, \tilde Z) + (1-\alpha)\mu(\tilde X_2, \tilde Z) \\
&\le \alpha(\hat\mu(X_1, Z) + \varepsilon) + (1-\alpha)(\hat\mu(X_2, Z) + \varepsilon) \\
&= \alpha\hat\mu(X_1, Z) + (1-\alpha)\hat\mu(X_2, Z) + \varepsilon.
\end{aligned}$$
Since we minimize over all possible ways to couple $X_\alpha$ and Z, we implicitly assume that the distribution of $X_\alpha$ is known and we hold it fixed in this calculation.
The chain of inequalities above is true for any ε > 0; therefore, letting ε → 0, we prove the convexity of $\hat\mu$.
The additional condition, that we have to know the bivariate law in the convex combination, is not restrictive. If we interpret $X_1$ as the return of a stock and $X_2$ as the return of another stock, then $X_\alpha := \alpha X_1 + (1-\alpha)X_2$, α ∈ [0, 1], is the return of the portfolio composed of the two stocks. If we do not know the bivariate law of $(X_1, X_2)$, then we cannot compute the distribution of the portfolio return.
The result contained in Lemma 1 will be used to derive classes of r.d.
metrics consistent with the notion of convergence in distribution. First we
have to check whether, if µ is an r.d. metric, the minimal quasi-metric is also an r.d. metric.
Corollary 1. If µ is a compound r.d. metric, then $\hat\mu$ defined in (8) is a simple r.d. metric.
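Proof. By assumption, µ satisfies $\widetilde{P1}$, P3, P4 or P4*, P5 and P6. From Lemma 1, we know that $\hat\mu$ satisfies $\widetilde{P1}$ and P3, provided that the continuity condition holds. It remains to check P4-P6. P4 or P4* follows from the argument which we used to prove convexity; P5 holds trivially since µ (and therefore $\hat\mu$) is defined in $\mathcal{X}_0$. P6 is easy to check: for a > 0,
$$\begin{aligned}
\hat\mu(aX, aY) &= \inf\{\mu(\tilde X, \tilde Y) : \tilde X \overset{d}{=} aX,\ \tilde Y \overset{d}{=} aY\} \\
&= \inf\{a^s \mu(\tilde X/a, \tilde Y/a) : \tilde X/a \overset{d}{=} X,\ \tilde Y/a \overset{d}{=} Y\} \\
&= a^s \inf\{\mu(\tilde X/a, \tilde Y/a) : \tilde X/a \overset{d}{=} X,\ \tilde Y/a \overset{d}{=} Y\} \\
&= a^s \hat\mu(X, Y).
\end{aligned}$$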
We observe the following relationships between the introduced classes of
r.d. metrics:
a) compound translation invariant r.d. metrics ⊂ compound convex r.d.
metrics ⊂ compound r.d. metrics ⊂ compound quasi-metrics ⊂ compound quasi-semi-metrics.
b) simple convex r.d. metrics ⊂ compound convex r.d. metrics
c) simple translation invariant r.d. metrics ⊄ compound translation invariant r.d. metrics
d) ideal p. metrics ⊂ r.d. metrics
e) primary quasi-semi-metrics ⊂ simple quasi-semi-metrics ⊂ compound
quasi-semi-metrics
Item c) appears because, in general, $\widetilde{P4}$ may fail to hold for the minimal metric. Item e) is due to the general relationship between the corresponding p. metrics.
Apart from providing an interesting link between the classes of compound
and simple quasi-semi-metrics, the minimal metrics can be used to construct
simple quasi-semi-metrics with suitable properties. For example, it is easier
to establish the regularity property, or the convexity property, for a compound metric using the method of one probability space. These properties
are inherited by the corresponding simple minimal metric. In this fashion,
taking advantage of Corollary 1, we can construct simple r.d. metrics.
Under some conditions, $\widehat{\mu}$ can be computed explicitly. We can obtain explicit representations through the Cambanis-Simons-Stout theorem, see Cambanis et al. (1976). The basic results are contained in the next theorem.
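Theorem 1. Given $X, Y \in \mathcal{X}$ with finite moments $\int_{\mathbb{R}} \varphi(x, a)\,dF_X(x) < \infty$ and $\int_{\mathbb{R}} \varphi(x, a)\,dF_Y(x) < \infty$, $a \in \mathbb{R}$, where $\varphi(x, y)$ is a quasi-antitone function, i.e.
$$\varphi(x, y) + \varphi(x', y') \le \varphi(x', y) + \varphi(x, y')$$
for any $x' > x$ and $y' > y$, then
$$\widehat{\mu}_\varphi(X, Y) = \int_0^1 \varphi\big(F_X^{-1}(t), F_Y^{-1}(t)\big)\,dt,$$
where $F_X^{-1}(t) = \inf\{x : F_X(x) \ge t\}$ is the generalized inverse of the c.d.f. $F_X(x)$; also, $\widehat{\mu}_\varphi(X, Y) = \mu_\varphi(F_X^{-1}(U), F_Y^{-1}(U))$, where $U$ is a uniformly distributed r.v. on $(0, 1)$.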
By virtue of Corollary 1, $\widehat{\mu}_\varphi$ is a r.d. metric if $\mu_\varphi$ is a r.d. metric, and clearly it depends only on the distribution functions of $X$ and $Y$. The function $\varphi$ should be such that $\varphi(x, x) = 0$, but generally it may not be symmetric, $\varphi(x, y) \ne \varphi(y, x)$. Examples of $\varphi$ include $f(x - y)$, where $f$ is a non-negative convex function on $\mathbb{R}$. In particular, one might choose
$$\varphi(x, y) = H_1(\max(x - y, 0)) + H_2(\max(y - x, 0)),$$
where $H_1, H_2 : [0, \infty) \to [0, \infty)$ are convex, non-decreasing functions. If $H_1(t) = t$ and $H_2(t) = 0$, so that $\varphi^*(x, y) = \max(x - y, 0)$, then
$$\widehat{\mu}_{\varphi^*}(X, Y) = \int_0^1 \max\big(F_X^{-1}(t) - F_Y^{-1}(t), 0\big)\,dt \qquad (9)$$
We can see that (9) is the minimal quasi-semi-metric of (5). If (9) is defined on the space of zero-mean random variables, $X, Y \in \mathcal{X}_0$, then $\widehat{\mu}_{\varphi^*}(X, Y)$ is a quasi-metric and, therefore, according to Corollary 1, (9) is a r.d. metric, see Proposition 4. Without this restriction, if $X, Y \in \mathcal{X}$, then $\widehat{\mu}_{\varphi^*}(X, Y)$ is a quasi-semi-metric.
Similarly, we obtain the minimal metrics of $L^*_p$ defined in (5),
$$l^*_p(X, Y) = \widehat{L}^*_p(X, Y) = \left(\int_0^1 \big(\max(F_X^{-1}(t) - F_Y^{-1}(t), 0)\big)^p\,dt\right)^{1/p}.$$
The functional in (9) is just $l^*_1(X, Y)$. Furthermore, Proposition 4 shows that $L^*_p$ is a translation invariant r.d. metric and, therefore, it is convex. As a result, according to Lemma 1, $l^*_p$ is a simple convex r.d. metric.
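For equally weighted samples of the same size, the integral in Theorem 1 can be evaluated exactly, because both generalized inverses are step functions on the common grid $((k-1)/N, k/N]$. The following Python sketch is only an illustration under that assumption (the function names and the sample numbers are hypothetical, not from the paper): it computes the compound $L^*_p$ under the observed pairing and its minimal counterpart $l^*_p$ by pairing order statistics.

```python
# Sketch: compound L*_p versus its minimal counterpart l*_p on scenario data.
# Assumption (not from the paper): X and Y are given as N equally weighted,
# jointly observed scenarios (x_k, y_k).


def compound_L_star(xs, ys, p=1.0):
    """L*_p(X, Y) = (E[max(X - Y, 0)^p])^(1/p) under the observed pairing."""
    n = len(xs)
    return (sum(max(x - y, 0.0) ** p for x, y in zip(xs, ys)) / n) ** (1.0 / p)


def minimal_l_star(xs, ys, p=1.0):
    """l*_p(X, Y): the same formula applied to separately sorted samples.

    For two equally weighted samples of size N, F_X^{-1}(t) and F_Y^{-1}(t)
    are constant on each interval ((k-1)/N, k/N], so the integral over (0, 1)
    reduces to an average over pairs of order statistics (x_(k), y_(k)).
    """
    return compound_L_star(sorted(xs), sorted(ys), p)


xs = [0.02, -0.01, 0.03, -0.04, 0.01]   # hypothetical returns of X
ys = [0.01, 0.02, -0.02, 0.00, 0.03]    # hypothetical returns of Y
print(compound_L_star(xs, ys), minimal_l_star(xs, ys))
```

With these numbers the compound value is positive while the minimal value is zero: after sorting, every order statistic of X is no larger than the corresponding order statistic of Y, so the deviation measured by $L^*_1$ comes from the pairing of the scenarios, not from the marginal distributions.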
Explicit forms can also be computed for the family $\Theta^*_p$ defined in (6) through the Fréchet-Hoeffding inequality.
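Lemma 2. Suppose that $X, Y \in \mathcal{X}$ and $H : [0, \infty] \to [0, \infty]$ is non-decreasing. If
$$\Theta^*_H(X, Y) = \int_{-\infty}^{\infty} H\big(P(Y \le t < X)\big)\,dt,$$
then
$$\widehat{\Theta}^*_H(X, Y) = \int_{-\infty}^{\infty} H\big(\max(F_Y(t) - F_X(t), 0)\big)\,dt.$$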
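Proof. The demonstration is straightforward. For any pair with the given marginals,
$$
\begin{aligned}
\Theta^*_H(X, Y) &= \int_{-\infty}^{\infty} H\big(P(Y \le t < X)\big)\,dt \\
&= \int_{-\infty}^{\infty} H\big(P(Y \le t) - P(Y \le t, X \le t)\big)\,dt \\
&\ge \int_{-\infty}^{\infty} H\big(F_Y(t) - \min(F_X(t), F_Y(t))\big)\,dt \\
&= \int_{-\infty}^{\infty} H\big(\max(F_Y(t) - F_X(t), 0)\big)\,dt.
\end{aligned}
$$
The inequality is an application of the celebrated Fréchet-Hoeffding upper bound $\min(F_X(x), F_Y(y)) \ge P(X \le x, Y \le y)$, combined with the monotonicity of $H$. The bound is attained, simultaneously for all $t$, by the comonotone pair $\widetilde{X} = F_X^{-1}(U)$, $\widetilde{Y} = F_Y^{-1}(U)$ with $U$ uniform on $(0, 1)$, for which $P(\widetilde{X} \le t, \widetilde{Y} \le t) = \min(F_X(t), F_Y(t))$; hence the infimum defining the minimal functional equals the right-hand side. See Rachev (1991), pp. 153–154, for more general results.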
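Corollary 2. Choosing $H(t) = t^p$, $p \ge 1$, we obtain
$$\theta^*_p(X, Y) = \widehat{\Theta}^*_p(X, Y) = \left(\int_{-\infty}^{\infty} \big(\max(F_Y(t) - F_X(t), 0)\big)^p\,dt\right)^{1/p}$$
and
$$\theta^*_\infty(X, Y) = \widehat{\Theta}^*_\infty(X, Y) = \sup_{t \in \mathbb{R}} \big[\max(F_Y(t) - F_X(t), 0)\big].$$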
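Corollary 2 can be evaluated exactly on data. A minimal Python sketch, assuming equally weighted samples (the helper names are illustrative, not from the paper): the empirical c.d.f.s are step functions on the pooled sample, so $\theta^*_p$ reduces to a finite sum and $\theta^*_\infty$ to a maximum over pooled points. By Fubini's theorem, $\int_{-\infty}^{\infty} \mathbf{1}\{Y \le t < X\}\,dt = \max(X - Y, 0)$ gives $\Theta^*_1 = L^*_1$, so $\theta^*_1$ should coincide with $l^*_1$ computed from order statistics.

```python
from bisect import bisect_right


def ecdf_at(sorted_sample, t):
    """Empirical c.d.f. F(t) = #{x <= t} / N of an equally weighted sample."""
    return bisect_right(sorted_sample, t) / len(sorted_sample)


def theta_star(xs, ys, p=1.0):
    """theta*_p(X, Y) = (integral of max(F_Y(t) - F_X(t), 0)^p dt)^(1/p).

    Both empirical c.d.f.s are constant on [g_i, g_{i+1}) between consecutive
    points g_i of the pooled sample, and they agree (both 0 or both 1) outside
    the pooled range, so the integral is an exact finite sum.
    """
    sx, sy = sorted(xs), sorted(ys)
    grid = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        gap = max(ecdf_at(sy, left) - ecdf_at(sx, left), 0.0)
        total += gap ** p * (right - left)
    return total ** (1.0 / p)


def theta_star_inf(xs, ys):
    """theta*_inf(X, Y) = sup_t max(F_Y(t) - F_X(t), 0).

    The difference of the two right-continuous step functions is constant
    between pooled sample points and zero outside their range, so the
    supremum is a maximum over the pooled points (or 0).
    """
    sx, sy = sorted(xs), sorted(ys)
    return max(max(ecdf_at(sy, t) - ecdf_at(sx, t), 0.0)
               for t in set(xs) | set(ys))


xs, ys = [1, 2, 3], [0, 2, 5]
print(theta_star(xs, ys), theta_star(ys, xs), theta_star_inf(xs, ys))
```

For this pair the three printed values should be approximately 1/3, 2/3 and 1/3; the first agrees with the order-statistics average $\frac{1}{3}\sum_k \max(x_{(k)} - y_{(k)}, 0) = 1/3$.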
With respect to the homogeneity property, the examples (5), (6) and their minimal counterparts show a wide range of degrees. The family $L^*_p$ and the minimal metrics $l^*_p$ it generates are homogeneous of degree one. In contrast, the families $\Theta^*_p$ and $\theta^*_p$ are homogeneous of degree $s = 1/p$: for $a > 0$, $F_{aX}(t) = F_X(t/a)$, and the substitution $t = au$ gives $\theta^*_p(aX, aY) = a^{1/p}\theta^*_p(X, Y)$. In the limit, $\Theta^*_\infty$ and $\theta^*_\infty$ are both homogeneous of degree zero. The functional $\theta^*_\infty$ turns into the Kolmogorov p. metric
$$\rho(X, Y) = \sup_{t \in \mathbb{R}} |F_X(t) - F_Y(t)|$$
once the max function is replaced with the absolute value. The representatives with $s = 1$ are $\Theta^*_1$ and $\theta^*_1$. In the same fashion, the functional $\theta^*_1$ turns into the Kantorovich p. metric
$$\kappa(X, Y) = \int_{-\infty}^{\infty} |F_X(t) - F_Y(t)|\,dt.$$
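Since $|u| = \max(u, 0) + \max(-u, 0)$, the Kantorovich metric splits into the two one-sided parts, $\kappa(X, Y) = \theta^*_1(X, Y) + \theta^*_1(Y, X)$, and likewise $\rho(X, Y) = \max(\theta^*_\infty(X, Y), \theta^*_\infty(Y, X))$. A short Python sketch for equally weighted samples makes the first identity concrete; the helper names and data are illustrative assumptions.

```python
from bisect import bisect_right


def _pooled_ecdfs(xs, ys):
    """Pooled grid and the empirical c.d.f.s of both samples evaluated on it."""
    sx, sy = sorted(xs), sorted(ys)
    grid = sorted(set(xs) | set(ys))
    fx = [bisect_right(sx, t) / len(sx) for t in grid]
    fy = [bisect_right(sy, t) / len(sy) for t in grid]
    return grid, fx, fy


def integrate_steps(grid, values):
    """Integral of a step function equal to values[i] on [grid[i], grid[i+1])."""
    return sum(v * (b - a) for v, a, b in zip(values, grid, grid[1:]))


def theta1_star(xs, ys):
    """One-sided part: integral of max(F_Y - F_X, 0)."""
    grid, fx, fy = _pooled_ecdfs(xs, ys)
    return integrate_steps(grid, [max(b - a, 0.0) for a, b in zip(fx, fy)])


def kantorovich(xs, ys):
    """kappa(X, Y): integral of |F_X - F_Y|."""
    grid, fx, fy = _pooled_ecdfs(xs, ys)
    return integrate_steps(grid, [abs(a - b) for a, b in zip(fx, fy)])


def kolmogorov(xs, ys):
    """rho(X, Y): sup of |F_X - F_Y|, attained at a pooled sample point."""
    _, fx, fy = _pooled_ecdfs(xs, ys)
    return max(abs(a - b) for a, b in zip(fx, fy))


xs, ys = [1, 2, 3], [0, 2, 5]
print(kantorovich(xs, ys), theta1_star(xs, ys) + theta1_star(ys, xs), kolmogorov(xs, ys))
```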
5 Dual representations
At the end of Section 4, we gave an example illustrating that asymmetrization may sometimes produce a quasi-semi-metric. A natural question arises: is it wrong to consider problem (1) with a quasi-semi-metric instead of a quasi-metric in the objective? A quasi-metric guarantees that if there is a feasible portfolio $w$ such that $\mu(w'r, r_b) = 0$, then it is a solution to the problem and either $w'r \overset{a.s.}{=} r_b$, or $w'r \overset{d}{=} r_b$, or $h(w'r) = h(r_b)$, depending on the type of the quasi-metric (compound, simple, or primary; see Section 4 for more details), where $h(w'r)$ denotes a countable set of portfolio-return characteristics. This property was stated as a basic rationale behind the suggestion to use a quasi-metric. After all, if the feasible set contains portfolios with positive expected excess return, then either $w'r \overset{a.s.}{=} r_b$ or $w'r \overset{d}{=} r_b$ means that we have beaten the benchmark.
What changes in this picture if we replace the r.d. metric by a r.d. semi-metric? Understandably, in contrast to the case of a r.d. metric, we can no longer rely on any type of equality between $w'r$ and $r_b$. Nevertheless, we obtain a notion of dominance if $\mu(w'r, r_b) = 0$, which we will call the implied pre-order of $\mu$, and this is sufficient to justify the application of a r.d. semi-metric. For instance, at the end of Section 4 we considered $\zeta^*_2$, which implies second-order stochastic dominance (SSD) regardless of whether we consider $\mathcal{X}$ or $\mathcal{X}_0$. Other examples of implied pre-orders can be constructed by looking at $l^*_p$ and $\theta^*_p$ when defined on $\mathcal{X}$: they all imply first-order stochastic dominance (FSD). The corresponding compound quasi-metrics $L^*_p$ and $\Theta^*_p$ imply the same pre-order because $L^*_p(X, Y) = 0$ and $\Theta^*_p(X, Y) = 0$ imply $X \le Y$ almost surely, which in turn implies $F_Y(t) \le F_X(t)$ for all $t$ and, therefore, FSD. Naturally, there are many quasi-semi-metrics implying one and the same pre-order.
Definition 9. We say that the binary relation $\preceq_\mu$ is implied by the quasi-semi-metric $\mu$ if $X \preceq_\mu Y$ is equivalent to $\mu(X, Y) = 0$, that is, $X \preceq_\mu Y$ if and only if $\mu(X, Y) = 0$.
It is easy to see that the binary relation $\preceq_\mu$ so defined is indeed a pre-order, i.e. it is reflexive ($X \preceq_\mu X$) and transitive ($X \preceq_\mu Y$ and $Y \preceq_\mu Z \implies X \preceq_\mu Z$). Reflexivity follows from P1, $\mu(X, X) = 0$. Transitivity is a consequence of the triangle inequality, $0 \le \mu(X, Z) \le \mu(X, Y) + \mu(Y, Z) = 0$.
We can associate an equivalence relation $\sim_\mu$ with a given pre-order: $X \sim_\mu Y \iff X \preceq_\mu Y$ and $Y \preceq_\mu X$. For example, if $\mu$ is symmetric, then $\preceq_\mu$ is itself an equivalence relation. This is because, firstly, $\mu(X, Y) = 0 \implies X \preceq_\mu Y$ from the definition above and, secondly, $\mu(Y, X) = \mu(X, Y) = 0 \implies Y \preceq_\mu X$ from the symmetry and the definition above. Combining both, we get $X \sim_\mu Y$ if $\mu(X, Y) = 0$. Besides semi-metrics, quasi-metrics also imply an equivalence relation. In general, the necessary and sufficient condition for $X \sim_\mu Y$ is $\mu(X, Y) = \mu(Y, X) = 0$. For a simple quasi-metric, $\sim_\mu$ is equality in distribution, and for a compound quasi-metric it is equality in the almost sure sense.
Suppose that we would like to solve the benchmark-tracking problem (1) with a given quasi-semi-metric $\mu$. A feasible portfolio such that $\mu(w'r, r_b) = 0$ is truly a solution and, as we have mentioned, the portfolio $w$ dominates the benchmark $r_b$ in the sense suggested by $\mu$. Intuitively, the functional $\mu$ metrizes the distance to the desirable set of all portfolios that are no worse than the benchmark, where "worse" is understood in the $\mu$-sense. A logical next step in this analysis is to ask whether it is possible to characterize the set of these portfolios in terms of a global preference of a class of investors; that is, whether $\mu(w'r, r_b) = 0$ implies that all investors in a given class prefer $w$ to the benchmark. If a representation of this sort exists, we call it a dual representation, or dual form, of $\mu$. (Ortobelli et al. (2006) develop a detailed theory of consistency between probability functionals and preference orders.)
To begin with, suppose that there is a set of investors, identified by their utility functions $u \in \mathcal{U}$, who choose between two portfolios with returns $X$ and $Y$, respectively. We assume the utility functions are defined on the return rather than on the wealth. A given investor prefers $Y$ to $X$, $X \preceq_u Y$, if $Eu(X) \le Eu(Y)$, $u \in \mathcal{U}$. Define the set of all pairs $(X, Y)$ such that $Y$ is preferred to $X$ by a fixed investor,
$$gr(u) := \{(X, Y) : Eu(X) \le Eu(Y)\}.$$
The portfolio with return $Y$ is preferred by all investors if the pair $(X, Y)$ belongs to
$$gr(\mathcal{U}) = \bigcap_{u \in \mathcal{U}} gr(u) = \{(X, Y) : Eu(X) \le Eu(Y) \text{ for all } u \in \mathcal{U}\}.$$
All investors are indifferent between $X$ and $Y$ if $(X, Y) \in gr(\mathcal{U})$ and $(Y, X) \in gr(\mathcal{U})$, and the two portfolios are indistinguishable in this sense. The set $gr(\mathcal{U})$ is non-empty because the pair $(X, X)$ satisfies all conditions.
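We define the following functional
$$\nu(X, Y) = \begin{cases} 0, & (X, Y) \in gr(\mathcal{U}) \\ \sup_{u \in \mathcal{U}^b} [Eu(X) - Eu(Y)], & (X, Y) \notin gr(\mathcal{U}) \end{cases} = \sup_{u \in \mathcal{U}^b} \big[\max(Eu(X) - Eu(Y), 0)\big] \qquad (10)$$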
where $\mathcal{U}^b \subseteq \mathcal{U}$ is such that if $\nu(X, Y) = 0$, then $(X, Y) \in gr(\mathcal{U})$. Therefore $X$ and $Y$ are indistinguishable if $\nu(X, Y) = 0$ and $\nu(Y, X) = 0$, i.e. this is the implied $\nu$-equivalence. If there is at least one investor who does not prefer $Y$ to $X$, then $\nu(X, Y) > 0$. The reason for introducing the subclass $\mathcal{U}^b$ is to avoid the explosion of $\nu(X, Y)$. For example, if $\mathcal{U}$ is a closed, convex cone or a closed, convex set, then $\mathcal{U}^b$ might be the smallest set of functions generating $\mathcal{U}$ via limits of convex combinations, and in this case $\mathcal{U}^b$ might be regarded as a set of "corner" elements of $\mathcal{U}$. Also, $\mathcal{U}^b$ should not contain "equivalent" functions as far as the preference relation (with $X$ and $Y$ fixed) is concerned, i.e. $u$ and $v$ are considered equivalent if $X \preceq_u Y$ if and only if $X \preceq_v Y$. A simple example is the family $\{u(x) : u(x) = a v(x), a > 0\}$, every member of which is obviously equivalent to $v(x)$. Thus there is some minimal set for which the quasi-semi-metric $\nu(X, Y)$ is consistent with the set $gr(\mathcal{U})$: if $\nu(X, Y) = 0$, then $(X, Y) \in gr(\mathcal{U})$. For any other set $\widetilde{\mathcal{U}}^b \supseteq \mathcal{U}^b$, the corresponding metric satisfies $\nu_{\widetilde{\mathcal{U}}^b}(X, Y) \ge \nu_{\mathcal{U}^b}(X, Y)$, where the index denotes the set with respect to which the supremum in (10) is calculated.
Proposition 6. The functional $\nu(X, Y)$ defined in (10) is a simple quasi-semi-metric in $\mathcal{X}$.
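Proof. If $X \overset{d}{=} Y$, then $\nu(X, Y) = 0$, and $\nu(X, Y) \ge 0$ is trivial. The triangle inequality is simple to show using the representation with the max function: since $\max(a - c, 0) \le \max(a - b, 0) + \max(b - c, 0)$ for all real $a, b, c$, setting $a = Eu(X)$, $b = Eu(Y)$, $c = Eu(Z)$ and taking the supremum over $u \in \mathcal{U}^b$ gives $\nu(X, Z) \le \nu(X, Y) + \nu(Y, Z)$.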
The natural symmetrization of $\nu(X, Y)$ is defined as
$$\overline{\nu}(X, Y) = \sup_{u \in \mathcal{U}^b} |Eu(X) - Eu(Y)|. \qquad (11)$$
The functional (11) is obviously symmetric by definition, but this is still not sufficient for $\overline{\nu}(X, Y)$ to be a metric in the space of distribution functions. There should be an additional condition on the class $\mathcal{U}^b$: it should be "rich enough" for $\overline{\nu}(X, Y)$ to be a simple metric. We will compute $\overline{\nu}(X, Y)$ for some classes $\mathcal{U}^b$ in Section 6.2.
The above proposition states that $\nu(X, Y)$ is a simple quasi-semi-metric. Properties P4 and P6 cannot be verified unless we assume a particular class $\mathcal{U}^b$. In addition, for some $\mathcal{U}^b$ the functional $\nu(X, Y)$ may turn into a semi-metric, as it will satisfy the symmetry property.
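When $\mathcal{U}^b$ is finite and the returns are given as equally weighted samples, the supremum in (10) becomes a maximum, and both $\nu$ and its symmetrization (11) are straightforward to evaluate. The Python sketch below assumes exactly that; the generator set `U_b` is a hypothetical choice made only for illustration.

```python
from typing import Callable, Sequence

Utility = Callable[[float], float]


def expected_utility(u: Utility, sample: Sequence[float]) -> float:
    """Eu(X) for an equally weighted sample of returns."""
    return sum(u(x) for x in sample) / len(sample)


def nu(xs: Sequence[float], ys: Sequence[float],
       generators: Sequence[Utility]) -> float:
    """Dual form (10) with a finite U^b: max over u of max(Eu(X) - Eu(Y), 0)."""
    return max(max(expected_utility(u, xs) - expected_utility(u, ys), 0.0)
               for u in generators)


def nu_bar(xs: Sequence[float], ys: Sequence[float],
           generators: Sequence[Utility]) -> float:
    """Natural symmetrization (11): max over u of |Eu(X) - Eu(Y)|."""
    return max(abs(expected_utility(u, xs) - expected_utility(u, ys))
               for u in generators)


# Hypothetical U^b: the identity and two kinked, concave utilities.
U_b = [lambda x: x, lambda x: min(x, 0.0), lambda x: min(x, 1.0)]

xs, ys, zs = [1, 2, 3], [0, 2, 5], [-1, 1, 2]
print(nu(xs, ys, U_b), nu(ys, xs, U_b), nu_bar(xs, ys, U_b))
```

The max-with-zero form is what makes the triangle inequality in Proposition 6 hold term by term; $\overline{\nu}$ is simply the larger of $\nu(X, Y)$ and $\nu(Y, X)$.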
In the dual form that we conjecture here, the random variables might belong to $\mathcal{X}$ or $\mathcal{X}_0$. For instance, in the Rothschild-Stiglitz preference order, the relation in the space of distribution functions is $EX = EY$ and
$$\int_{-\infty}^{x} F_X(t)\,dt \le \int_{-\infty}^{x} F_Y(t)\,dt, \quad \forall x \in \mathbb{R},$$
if $X$ is preferred to $Y$. Therefore we can consider the centered variables $X$ and $Y$ in $\mathcal{X}_0$ and keep only the integral inequality.
6 Applications
In this section, we give a number of applications of the developed theory.
6.1 Compound vs simple r.d. metrics
Let us return to the minimal tracking error problem in order to illustrate the difference between compound and simple r.d. metrics. The optimization problem can be restated in terms of the $L_p$ metric
$$L_p(X, Y) = \big(E|X - Y|^p\big)^{1/p}, \quad p \ge 1,$$
since, by definition, $\sigma(X - Y) = L_2(X, Y)$, where $L_2$ is defined in $\mathcal{X}_0$. Therefore we can consider the equivalent optimization problem
$$\min_{X \in \mathcal{X}_0} L_2(X, Y) \qquad (12)$$
where $\mathcal{X}_0$ contains the centered r.v.s from $\mathcal{X}$. Let us impose the additional assumption that the feasible set in (12) consists of all random variables in $\mathcal{X}_0$ independent of $Y$; that is, in (12) we fix the joint dependence of $(X, Y)$ and vary only the distribution of $X$. We know that
$$
\begin{aligned}
(L_2(X, Y))^2 &= E(X - Y)^2 \\
&= EX^2 - 2E(XY) + EY^2 \\
&= EX^2 + EY^2, \quad X, Y \in \mathcal{X}_0,
\end{aligned}
$$
because $E(XY) = EX\,EY = 0$ due to the independence assumption. As a result, the problem
$$\min_{X \in \mathcal{X}_0} EX^2$$
is equivalent to (12) because $EY^2$ does not depend on $X$. The obvious solution is $X^* \overset{a.s.}{=} 0$, which is the only constant in $\mathcal{X}_0$.
If the elements of $\mathcal{X}_0$ are interpreted as the centered returns of some feasible portfolios, then $X^*$ means that we have to invest everything in the risk-free asset. Clearly, this is not in line with the intuition that the solution should be "similar" to the benchmark with centered return $Y$.
Now let us replace $L_2$ by the minimal metric $l_2(X, Y) = \widehat{L}_2(X, Y)$. The explicit representation, due to Theorem 1, is
$$l_2(X, Y) = \left(\int_0^1 \big(F_X^{-1}(t) - F_Y^{-1}(t)\big)^2\,dt\right)^{1/2}$$
and the problem becomes
$$\min_{X \in \mathcal{X}_0} l_2(X, Y). \qquad (13)$$
If $X^*$ is an independent copy of $Y$, then $l_2(X^*, Y) = 0$ and it follows that $X^*$ is a solution to (13). This is in agreement with what we expected.
However simplistic this example may be, it is a warning that improper
dependence assumptions may lead to degenerate results.
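The effect can be reproduced with simulated scenarios. In this Python sketch (an illustration, not the paper's experiment), a randomly shuffled copy of a centered, simulated benchmark sample stands in for an independent copy of $Y$: it has exactly the same empirical distribution but a random pairing with $Y$. The Gaussian benchmark, its scale and the seed are arbitrary assumptions.

```python
import random
import statistics


def L2(xs, ys):
    """Compound metric L_2(X, Y) = (E(X - Y)^2)^(1/2) under the given pairing."""
    return (sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


def l2_minimal(xs, ys):
    """Minimal metric l_2: the same formula on separately sorted samples."""
    return L2(sorted(xs), sorted(ys))


rng = random.Random(0)                       # fixed seed for reproducibility
y = [rng.gauss(0.0, 0.01) for _ in range(1000)]
y_mean = statistics.fmean(y)
y = [v - y_mean for v in y]                  # centered benchmark scenarios

x_zero = [0.0] * len(y)                      # the degenerate solution X* = 0
x_copy = y[:]                                # same marginal distribution as Y ...
rng.shuffle(x_copy)                          # ... paired at random with Y

for name, x in [("X* = 0", x_zero), ("shuffled copy", x_copy)]:
    print(f"{name:14s} L2 = {L2(x, y):.5f}  l2 = {l2_minimal(x, y):.5f}")
```

The compound $L_2$ ranks $X^* = 0$ ahead of the shuffled copy (roughly $\sigma$ versus $\sqrt{2}\,\sigma$), whereas the minimal $l_2$ is exactly zero for the shuffled copy, which has the same distribution as $Y$.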
6.2 Simple examples of dual representations
In this sub-section we give examples of some quasi-semi-metrics computed
through the suggested dual form in (10).
6.2.1 Quadratic utility functions, $X, Y \in \mathcal{X}_0$
Let us consider the class of quadratic utility functions $\mathcal{U}_a^+ = \{u(x) : u(x) = ax^2 + bx,\ a, b \ge 0\}$. The plus sign in the index emphasizes the assumption that $a \ge 0$. We will compute $\nu(X, Y)$ for $X, Y \in \mathcal{X}_0$. If we set $\mathcal{U}^b = \mathcal{U}_a^+$, we obtain
$$\nu_a^+(X, Y) := \sup_{u \in \mathcal{U}_a^+} \big(\max(a(EX^2 - EY^2), 0)\big) = \begin{cases} 0, & EX^2 \le EY^2 \\ \infty, & EX^2 > EY^2. \end{cases}$$
It can easily be seen that if $EY^2 \ge EX^2$, then $Eu(Y) \ge Eu(X)$ for all $u \in \mathcal{U}_a^+$ because $EX = EY = 0$ due to the assumption $X, Y \in \mathcal{X}_0$. Therefore we can choose only one representative for the set $\mathcal{U}^b$, for example $u(x) = x^2$, and we obtain
$$\nu_a^+(X, Y) = \max(EX^2 - EY^2, 0).$$
Indeed, in this special case $\nu_a^+(X, Y)$ is not a quasi-metric in the space of distribution functions, and this is because the class $\mathcal{U}_a^+$ is not rich enough. It is a primary quasi-semi-metric and implies the following pre-order: $Y$ is preferred to $X$ by all investors, $X \preceq_{\nu_a^+} Y$, if and only if $\nu_a^+(X, Y) = 0$, which is equivalent to $EX^2 \le EY^2$. All investors prefer the r.v. with the larger second moment because all utility functions are convex due to the imposed condition $a \ge 0$.
Now consider $\mathcal{U}_a^- = \{u(x) : u(x) = ax^2 + bx,\ b \ge 0,\ a < 0\}$. Then $\nu_a^-(X, Y) = \sup_{u \in \mathcal{U}_a^-} \big(\max(|a|(EY^2 - EX^2), 0)\big)$, and the same reasoning as above, with the single representative $u(x) = -x^2$, leads to the conclusion that
$$\nu_a^-(X, Y) = \max(EY^2 - EX^2, 0).$$
Therefore in this case $Y$ is preferred to $X$ by all investors, $X \preceq_{\nu_a^-} Y$, if and only if $EY^2 \le EX^2$; that is, all investors prefer the random variable with the smaller second moment, which is understandable as their utility functions are concave and the investors are risk-averse. These very simple results are in line with intuition and classical theory. The construction of the quasi-semi-metric is more or less trivial.
We have noted that the set $gr(\mathcal{U})$ may be more or less "poor", depending on the class $\mathcal{U}$, which governs the properties of $\nu(X, Y)$. For instance, if $\mathcal{U} = \mathcal{U}_a^+ \cup \mathcal{U}_a^-$, then we get
$$
\begin{aligned}
\nu(X, Y) &= \max(\nu_a^+(X, Y), \nu_a^-(X, Y)) \\
&= \max(EX^2 - EY^2, EY^2 - EX^2) \\
&= |EX^2 - EY^2|
\end{aligned}
$$
and the set $gr(\mathcal{U})$ contains only those pairs $(X, Y)$ whose second moments coincide, $EX^2 = EY^2$, i.e. $\nu(X, Y)$ is a primary p. metric. In this case the symmetry of $\nu(X, Y)$ appears due to the symmetrization of the class $\mathcal{U}^b$.
It is easy to check that all examples in this sub-section are weakly regular. Moreover, P6 holds with $s = 2$ for all of them and, therefore, $\nu_a^+$, $\nu_a^-$ and $\nu$ are r.d. metrics.
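For centered samples, the three functionals of this sub-section depend only on second moments. A minimal Python sketch, assuming equally weighted samples centered by subtracting the sample mean (all names and numbers are illustrative):

```python
def centre(sample):
    """Map a sample into X0 by subtracting its mean."""
    m = sum(sample) / len(sample)
    return [x - m for x in sample]


def second_moment(sample):
    """EX^2 for an equally weighted sample."""
    return sum(x * x for x in sample) / len(sample)


def nu_a_plus(xs, ys):
    """Representative u(x) = x^2 (convex utilities): max(EX^2 - EY^2, 0)."""
    return max(second_moment(xs) - second_moment(ys), 0.0)


def nu_a_minus(xs, ys):
    """Representative u(x) = -x^2 (concave utilities): max(EY^2 - EX^2, 0)."""
    return max(second_moment(ys) - second_moment(xs), 0.0)


def nu_quadratic(xs, ys):
    """Supremum over both classes: |EX^2 - EY^2|."""
    return max(nu_a_plus(xs, ys), nu_a_minus(xs, ys))


X = centre([1.0, 2.0, 3.0])   # [-1, 0, 1], so EX^2 = 2/3
Y = centre([0.0, 2.0, 5.0])   # EY^2 = 38/9
print(nu_a_plus(X, Y), nu_a_minus(X, Y), nu_quadratic(X, Y))
```

Here $EX^2 < EY^2$, so $\nu_a^+(X, Y) = 0$ (investors with convex utilities prefer $Y$) while $\nu_a^-(X, Y) = 32/9 > 0$. Doubling both samples multiplies each value by 4, consistent with P6 for $s = 2$.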
6.2.2 Quadratic utility functions, $X, Y \in \mathcal{X}$
All results in the previous section can be obtained by a more formal treatment. It is easy to notice that the set $\mathcal{U}_a^+$ is generated by the functions $u_1(x) = x^2$ and $u_2(x) = x$, having removed the equivalent functions (equivalent up to a positive multiplier). The function $u_2$ can be disregarded because all considerations were in $\mathcal{X}_0$. Extending the reasoning to $\mathcal{X}$, we have to bear in mind both $u_1$ and $u_2$. Computing the supremum over only the generating set of two functions, we get
$$\nu_a^+(X, Y) = \max(EX^2 - EY^2, EX - EY, 0).$$
The same reasoning, when applied to $\mathcal{U}_a^-$, leads to
$$\nu_a^-(X, Y) = \max(EY^2 - EX^2, EX - EY, 0)$$
since in this case $\mathcal{U}^b = \{-x^2, x\}$. Both $\nu_a^+(X, Y)$ and $\nu_a^-(X, Y)$ are primary quasi-semi-metrics. Combining them, that is, computing the supremum over $\{x^2, -x^2, x\}$, we obtain
$$\nu(X, Y) = \max(|EX^2 - EY^2|, EX - EY).$$
This is not yet a metric; rather, it is a quasi-metric because we have required $b \ge 0$. Extending the calculation over all quadratic functions, we obtain
$$\nu(X, Y) = \max(|EX^2 - EY^2|, |EX - EY|),$$
which is the Minkowski (max-norm) metric in $\mathbb{R}^2$ where the axes are the first and the second moment, respectively. It is in this case that the functional turns into a primary p. metric. Note that all examples in these sub-sections are well defined if the considered random variables have a finite second moment.
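Outside $\mathcal{X}_0$ the first moment enters through the generator $u_2(x) = x$. The Python sketch below evaluates the generator-set formulas of this sub-section on equally weighted samples (function names and data are illustrative assumptions). The degenerate pair at the end has equal second moments but different means, which exposes the asymmetry of the quasi-metric $\max(|EX^2 - EY^2|, EX - EY)$.

```python
def moments(sample):
    """First and second moments (EX, EX^2) of an equally weighted sample."""
    n = len(sample)
    return sum(sample) / n, sum(x * x for x in sample) / n


def nu_a_plus(xs, ys):
    """Generators {x^2, x}: max(EX^2 - EY^2, EX - EY, 0)."""
    (m1x, m2x), (m1y, m2y) = moments(xs), moments(ys)
    return max(m2x - m2y, m1x - m1y, 0.0)


def nu_a_minus(xs, ys):
    """Generators {-x^2, x}: max(EY^2 - EX^2, EX - EY, 0)."""
    (m1x, m2x), (m1y, m2y) = moments(xs), moments(ys)
    return max(m2y - m2x, m1x - m1y, 0.0)


def nu_three(xs, ys):
    """Generators {x^2, -x^2, x}: max(|EX^2 - EY^2|, EX - EY), a quasi-metric."""
    (m1x, m2x), (m1y, m2y) = moments(xs), moments(ys)
    return max(abs(m2x - m2y), m1x - m1y)


def nu_all_quadratic(xs, ys):
    """Generators {x^2, -x^2, x, -x}: max(|EX^2 - EY^2|, |EX - EY|)."""
    (m1x, m2x), (m1y, m2y) = moments(xs), moments(ys)
    return max(abs(m2x - m2y), abs(m1x - m1y))


X, Y = [1, 1], [-1, -1]       # equal second moments, different means
print(nu_three(X, Y), nu_three(Y, X), nu_all_quadratic(X, Y))
```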
6.2.3 The class of non-satiable investors
The utility functions of the class of non-satiable investors are non-decreasing. The minimal set of functions $\mathcal{U}^b$ generating the class of non-decreasing functions is $\mathcal{U}^b = \{u(x) : u(x) = I_{\{x \ge a\}}, a \in \mathbb{R}\}$, where $I_{\{x \ge a\}}$ is the indicator function of the set $\{x \ge a\}$ with $a$ fixed. For more details, see Brumelle and Vickson (1975). An application of the definition in (10) yields
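$$
\begin{aligned}
\nu(X, Y) &:= \sup_{u \in \mathcal{U}^b} \big(\max(Eu(X) - Eu(Y), 0)\big) \\
&= \sup_{a \in \mathbb{R}} \max\left(\int_{\mathbb{R}} I_{\{x \ge a\}}\,dF_X(x) - \int_{\mathbb{R}} I_{\{x \ge a\}}\,dF_Y(x), 0\right) \\
&= \sup_{a \in \mathbb{R}} \max\big(P(X \ge a) - P(Y \ge a), 0\big) \\
&= \sup_{a \in \mathbb{R}} \max\big(P(Y < a) - P(X < a), 0\big) \qquad (14)
\end{aligned}
$$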
This functional is a quasi-semi-metric, and we easily notice that $\nu(X, Y) = \theta^*_\infty(X, Y)$ when $X$ and $Y$ have continuous c.d.f.s. The functional $\theta^*_\infty(X, Y)$ is discussed in Corollary 2. The same reasoning applied to the natural symmetrization (11) leads to the celebrated Kolmogorov metric
$$\overline{\nu}(X, Y) = \rho(X, Y) = \sup_{a} |F_X(a) - F_Y(a)|.$$
Therefore, as we have already noted in Section 4.1, $\nu(X, Y) = \theta^*_\infty(X, Y)$ can be regarded as an asymmetric version of the Kolmogorov metric. Thus we have obtained two representations of $\theta^*_\infty$: one as a minimal metric of $\Theta^*_\infty$ and another through the dual form in (14).
Repeating the arguments in Proposition 5 and Lemma 2 for the functional
$$\Theta^\star_p(X, Y) = \left(\int_{-\infty}^{\infty} \big(P(Y < t \le X)\big)^p\,dt\right)^{1/p},$$
we can see that $\Theta^\star_p$ satisfies the same properties as $\Theta^*_p$, and the two coincide when $X$ and $Y$ have continuous c.d.f.s. Moreover, $\theta^\star_\infty(X, Y) := \widehat{\Theta}^\star_\infty(X, Y) = \nu(X, Y)$, and thus $\nu(X, Y)$ is weakly regular and positive homogeneous of degree zero. Therefore $\nu(X, Y)$ as given in (14) is a r.d. metric.
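On equally weighted samples, the dual form (14) and its symmetrization are maxima over the pooled sample points. This Python sketch is illustrative (function names and data are assumptions); `kolmogorov` takes the larger of the two one-sided values, which equals $\sup_a |P(X < a) - P(Y < a)|$ and hence the Kolmogorov distance.

```python
from bisect import bisect_left


def prob_below(sorted_sample, a):
    """P(X < a) for an equally weighted sample."""
    return bisect_left(sorted_sample, a) / len(sorted_sample)


def nu_nonsatiable(xs, ys):
    """Dual form (14): sup_a max(P(Y < a) - P(X < a), 0).

    a -> P(Y < a) - P(X < a) is a left-continuous step function, constant on
    each interval (g_{i-1}, g_i] between pooled sample points and zero outside
    their range, so the supremum is a maximum over those points (or 0).
    """
    sx, sy = sorted(xs), sorted(ys)
    return max(max(prob_below(sy, a) - prob_below(sx, a), 0.0)
               for a in set(xs) | set(ys))


def kolmogorov(xs, ys):
    """Symmetrization (11) over the same generators: the Kolmogorov metric."""
    return max(nu_nonsatiable(xs, ys), nu_nonsatiable(ys, xs))


xs, ys = [1, 2, 3], [0, 2, 5]
print(nu_nonsatiable(xs, ys), nu_nonsatiable(ys, xs), kolmogorov(xs, ys))
```

Shifting a sample to the right, for example `[x + 1 for x in xs]`, gives $\nu(X, X + 1) = 0$: every non-satiable investor prefers the shifted returns.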
6.2.4 The class of non-satiable, risk-averse investors
The utility functions of the class of all non-satiable, risk-averse investors are non-decreasing and concave. In this case, the minimal set of functions $\mathcal{U}^b$ generating them is $\mathcal{U}^b = \{u(x) : u(x) = \min(x, a), a \in \mathbb{R}\}$; for more details, see Brumelle and Vickson (1975). Applying the definition, we get
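$$
\begin{aligned}
\nu(X, Y) &:= \sup_{u \in \mathcal{U}^b} \big(\max(Eu(X) - Eu(Y), 0)\big) \\
&= \sup_{a \in \mathbb{R}} \max\left(\int_{\mathbb{R}} \min(x, a)\,dF_X(x) - \int_{\mathbb{R}} \min(x, a)\,dF_Y(x), 0\right) \\
&= \sup_{a \in \mathbb{R}} \max\left(\int_{-\infty}^{a} F_Y(x)\,dx - \int_{-\infty}^{a} F_X(x)\,dx, 0\right) \qquad (15)
\end{aligned}
$$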
The last equality holds because
$$E\min(X, a) = \int_{-\infty}^{a} x\,dF_X(x) + a(1 - F_X(a)) = a - \int_{-\infty}^{a} F_X(x)\,dx$$
after integration by parts, assuming that $X$ has a finite first absolute moment. This technical condition is very natural from a practical viewpoint: if the random variables have $E|X| = \infty$, then the fundamental notion of expected return breaks down.
The functional in (15) is a quasi-semi-metric in X0 and is closely related
to Zolotarev’s p. metric.
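A minimal Python sketch of (15) for equally weighted samples (names and data are illustrative assumptions). It evaluates the middle line of (15) through $E\min(X, a)$ and, as a cross-check, the last line through the identity $\int_{-\infty}^{a} F_X(x)\,dx = E\max(a - X, 0)$, which is the integration-by-parts step above written differently.

```python
def expected_min(sample, a):
    """E min(X, a) for an equally weighted sample."""
    return sum(min(x, a) for x in sample) / len(sample)


def nu_risk_averse(xs, ys):
    """Dual form (15): sup_a max(E min(X, a) - E min(Y, a), 0).

    a -> E min(X, a) - E min(Y, a) is piecewise linear with kinks only at
    pooled sample points; it is 0 below both samples and EX - EY above both,
    so the supremum is attained at a pooled sample point (or is 0).
    """
    return max(max(expected_min(xs, a) - expected_min(ys, a), 0.0)
               for a in set(xs) | set(ys))


def nu_via_integrated_cdfs(xs, ys):
    """Last line of (15): sup_a max(int^a F_Y - int^a F_X, 0), using
    int_{-inf}^a F(x) dx = E max(a - X, 0)."""
    def integrated_cdf(sample, a):
        return sum(max(a - x, 0) for x in sample) / len(sample)
    return max(max(integrated_cdf(ys, a) - integrated_cdf(xs, a), 0.0)
               for a in set(xs) | set(ys))


X, Y = [1, 2, 3], [2, 2, 2]   # Y pays the mean of X for sure
print(nu_risk_averse(X, Y), nu_risk_averse(Y, X), nu_via_integrated_cdfs(Y, X))
```

Since $Y$ pays $EX$ for sure, every non-satiable, risk-averse investor prefers it: $\nu(X, Y) = 0$, while $\nu(Y, X) = 1/3 > 0$, and both lines of (15) give the same value.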
6.3 Solving the optimization problems in practice
Suppose that we have scenarios $r_k \in \mathbb{R}^n$, $k = 1, \dots, N$, for the stock returns in the portfolio and also scenarios $r^b_k$, $k = 1, \dots, N$, for the benchmark returns. Also suppose that $r_k$ and $r^b_k$ are modeled jointly, that is, they are in one and the same state of the world. They could also be historical observations: we observe the benchmark and the stock returns in a given period and collect the observations.
Problem (1) has the following form when $\mu(X, Y) = L^*_1(X, Y) = E\max(X - Y, 0)$:
$$\min_{w \in W} E\max(r_b - w'r, 0).$$
This is a convex optimization problem if $W$ is convex because $L^*_1$ is translation invariant. The objective function can be approximated with the available scenarios, and the new problem is
$$\min_{w \in W} \frac{1}{N} \sum_{k=1}^{N} \max(r^b_k - w'r_k, 0) \qquad (16)$$
and this problem can be replaced by an equivalent linear problem by the standard approach,
$$
\begin{aligned}
\min_{w \in W,\, d}\ & \frac{1}{N} \sum_{k=1}^{N} d_k \\
\text{s.t.}\ & r^b_k - w'r_k \le d_k, \quad k = 1, \dots, N \\
& d_k \ge 0, \quad k = 1, \dots, N.
\end{aligned}
$$
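The Python standard library has no LP solver, so the sketch below does not solve the linear program. It evaluates objective (16) and the LP slacks for a fixed $w$ and, for a hypothetical two-asset, long-only, fully invested set $W$, minimizes (16) by a simple grid search. The scenario numbers and the grid are made up for illustration; in practice the LP above would be passed to a dedicated solver.

```python
def shortfall_objective(weights, scenarios, bench):
    """Objective (16): (1/N) * sum_k max(r_k^b - w'r_k, 0) over joint scenarios."""
    total = 0.0
    for r_k, rb_k in zip(scenarios, bench):
        port_k = sum(w * r for w, r in zip(weights, r_k))
        total += max(rb_k - port_k, 0.0)
    return total / len(bench)


def lp_slacks(weights, scenarios, bench):
    """For fixed w, the smallest feasible d_k in the LP are max(r_k^b - w'r_k, 0),
    so the LP and (16) have the same optimal value."""
    return [max(rb_k - sum(w * r for w, r in zip(weights, r_k)), 0.0)
            for r_k, rb_k in zip(scenarios, bench)]


# Hypothetical data: N = 4 joint scenarios for two assets and the benchmark.
scenarios = [(0.02, 0.01), (-0.01, 0.00), (0.03, 0.01), (-0.02, 0.01)]
bench = [0.015, -0.005, 0.02, -0.005]

# Hypothetical feasible set W: w = (t, 1 - t) with t on a grid in [0, 1].
grid = [i / 100 for i in range(101)]
best_t = min(grid, key=lambda t: shortfall_objective((t, 1 - t), scenarios, bench))
print(best_t, shortfall_objective((best_t, 1 - best_t), scenarios, bench))
```

For these made-up scenarios the equally weighted portfolio never falls short of the benchmark, so the grid search returns $t = 0.5$ with an objective of (numerically) zero.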
Now let us state problem (1) when the underlying r.d. metric is the corresponding minimal metric $l^*_1(X, Y)$ given in equation (9):
$$\min_{w \in W} \int_0^1 \max\big(F^{-1}_{r_b}(t) - F^{-1}_{w'r}(t), 0\big)\,dt.$$
This problem is convex if $W$ is convex because of the properties of $l^*_1(X, Y)$ (see Lemma 1), and it can also be approximated with the available scenarios by considering the empirical versions of the generalized inverses of the c.d.f.s. Then the integral is easily calculated: the integrands are step functions, so it turns into a sum. The simplified problem is
$$\min_{w \in W} \frac{1}{N} \sum_{k=1}^{N} \max\big(r^b_{(k)} - (w'r)_{(k)}, 0\big) \qquad (17)$$
where $(w'r)_{(1)} \le (w'r)_{(2)} \le \dots \le (w'r)_{(N)}$ and, similarly, $r^b_{(1)} \le r^b_{(2)} \le \dots \le r^b_{(N)}$ are the ordered observations of the portfolio returns and the benchmark returns.
The difference between problems (16) and (17) is that in the second problem the observations are sorted. This is not surprising because the minimal r.d. metric takes into account only the distance between the distribution functions. The dependence between $w'r$ and $r_b$ is destroyed by the sorting. The max function in the objective can be linearized but, due to the sorting, problem (17) is more difficult than (16).
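The effect of sorting can be seen directly on scenario data. In this illustrative Python sketch the portfolio returns $w'r_k$ for a fixed $w$ are taken as given (hypothetical numbers, as is the seed): re-pairing the scenarios changes objective (16) but leaves (17) unchanged, and (17) is never larger than (16), as expected from a minimal metric.

```python
import random


def compound_objective(port, bench):
    """Objective of (16): pairs (w'r_k, r_k^b) kept state by state."""
    return sum(max(b - p, 0.0) for p, b in zip(port, bench)) / len(bench)


def minimal_objective(port, bench):
    """Objective of (17): each sample sorted separately (order statistics)."""
    return compound_objective(sorted(port), sorted(bench))


port = [0.01, -0.02, 0.03]     # hypothetical w'r_k for a fixed w
bench = [0.02, 0.00, -0.01]    # hypothetical r_k^b
print(compound_objective(port, bench), minimal_objective(port, bench))

# Sorting discards the joint dependence: re-pairing the scenarios may change
# (16) but leaves (17) unchanged.
rng = random.Random(1)
shuffled = port[:]
rng.shuffle(shuffled)
print(compound_objective(shuffled, bench), minimal_objective(shuffled, bench))
```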
For specific choices of r.d. metrics, particular linearized problems can be
constructed, as in the case of the CVaR (see Rockafellar and Uryasev (2002)).
For other examples of linearized problems, see Ortobelli et al. (2006).
7 Conclusion
This paper discusses the connections between probability metrics theory and benchmark tracking-error problems. We define a new class of functionals metrizing the relative deviation of a portfolio from a benchmark. First, we observe that the class of deviation measures introduced by Rockafellar et al. (2006) is generated by the class of translation invariant r.d. metrics. Second, we divide all r.d. metrics into three categories (primary, simple and compound functionals) and introduce minimal r.d. metrics. The three classes of r.d. metrics assign different properties to the optimal portfolio problem, and the minimal functional can be used in a constructive way to obtain simple r.d. metrics. The classification and the methods are inspired by the theory of probability metrics. Furthermore, we show that, under certain conditions, the minimal quasi-semi-metric admits an integral representation. Finally, we analyze the implied pre-orders of quasi-semi-metrics, introduce a dual form consistent with a global preference relation and describe possible applications to portfolio selection problems.
Even though in the paper we consider a static problem, the generality of the suggested approach allows for extensions to a dynamic setting by studying quasi-semi-metrics not in the space of random variables but in the space of random processes.
References
Artzner, P., F. Delbaen, J.-M. Eber and D. Heath (1998), ‘Coherent measures
of risk’, Math. Fin. 6, 203–228.
Brumelle, D. L. and R. G. Vickson (1975), ‘A unified approach to stochastic
dominance’, in Stochastic Optimization Models in Finance, Ziemba and
Vickson (eds) pp. 101–113.
Cambanis, S., G. Simons and W. Stout (1976), ‘Inequalities for Ek(X, Y) when the marginals are fixed’, Z. Wahrsch. Verw. Geb. 36, 285–294.
Föllmer, H. and A. Schied (2002), ‘Convex measures of risk and trading constraints’, Finance and Stochastics 6, 429–447.
Frittelli, M. and E. Rosazza Gianin (2002), ‘Putting order in risk measures’,
Journal of Banking and Finance 26, 1473–1486.
Ortobelli, L. S., S. Rachev, H. Shalit and F. Fabozzi (2006), ‘Risk probability functionals and probability metrics applied to portfolio theory’, Working paper, Department of Probability and Applied Statistics, University of California, Santa Barbara, USA.
Rachev, S. T. (1991), Probability Metrics and the Stability of Stochastic Models, Wiley, Chichester, U.K.
Rockafellar, R. T. and S. Uryasev (2002), ‘Conditional value-at-risk for general loss distributions’, Journal of Banking and Finance 26(7), 1443–1471.
Rockafellar, R. T., S. Uryasev and M. Zabarankin (2006), ‘Generalized deviations in risk analysis’, Finance and Stochastics 10, 51–74.
Szegö, G. (2004), Risk measures for the 21st century, Wiley & Sons, Chichester.