Summary of the papers on ”Increasing risk”
by Rothschild and Stiglitz
Seminar work
Pavol Majher
1169390
Economic Literature Seminar
Lecturer: prof. Manfred Nermuth
November 2011
1 Introduction
Comparing the variability (or “riskiness”) of different random variables has long been of particular interest to economists. Over the past decades, several approaches to this problem have been developed. One of the most important contributions in this field is the series of articles by Rothschild and Stiglitz (1970, 1971, 1972), which has become widely recognized over time.
The aim of this seminar work is to give an overview of the theory and results presented in these papers. Our structure follows the layout of the original articles. We start by introducing four concepts for comparing variability and provide a deeper theoretical background for one of them. We then define three partial orderings that correspond to these approaches to risk comparison. As the main result of Rothschild and Stiglitz (1970), we present the proof of their mutual equivalence. Several remarks are devoted to the comparison with mean-variance analysis and to the references mentioned in Rothschild and Stiglitz (1972).
In the second part of our work, we present the second paper, Rothschild and Stiglitz (1971), which focuses on economic applications of the derived framework. We review economic examples such as savings under uncertainty, a portfolio problem and a firm’s production problem. In each case, the focus is on the impact that a higher degree of uncertainty has on the decision-making process.
2 Theoretical Background
2.1 Different Concepts of Risk Comparison
We start with an informal introduction of the different approaches to risk comparison, which are formalized later in section 3.1. To set up criteria for deciding when a random variable Y is more “variable” than another random variable X, Rothschild and Stiglitz (1970) list four possible answers:
1. Y is equal to X plus noise
It is reasonable that a random variable created from the original one by adding uncorrelated noise should be riskier than the original. To illustrate this concept, let X be a lottery ticket that pays $a_i$ with probability $p_i$ (such that $\sum_i p_i = 1$). Then Y can be considered a lottery ticket that pays $b_i$ with probability $p_i$, where $b_i$ is either $a_i$ or a lottery ticket with expected value $a_i$.
2. Every risk averter prefers X to Y
A risk averter is defined as an agent with a concave utility function. Thus X can be viewed as less risky than Y if EU (X) ≥ EU (Y ) for every concave U (given that X and Y have the same mean).
3. X has less weight in the tails than Y
For random variables X and Y with density functions f and g, it seems adequate to regard X as less variable if g is obtained from f by shifting some probability weight from the center towards the tails in such a way that the mean remains the same.
4. Y has a greater variance than X
Comparing the variances of two random variables is a commonly used tool to compare their riskiness.
As shown later (in particular in part 3.2), the first three concepts are mutually equivalent definitions of greater riskiness, while the last one provides a quite different approach. More on this difference is presented in part 3.3.
Before moving forward, we introduce the notation used in the original paper. From now on, X and Y denote random variables with cumulative distribution functions (cdf’s) F and G and (in case they exist) densities f and g. In the original paper the results are stated only for cdf’s whose points of increase lie in a bounded interval, conveniently represented by the interval [0, 1]. As the authors mention, extending the results to cdf’s defined on the whole real line requires solving a number of rather difficult convergence problems, which are moreover of little economic interest. In addition, it has been shown (e.g. in Mayer (1966) or Strassen (1965)) that the results would hold only in restricted form if generalized to the real line (for more detail see section 3.4).
2.2 Mean Preserving Spread
Most of the presented concepts can be formalized quite intuitively; the exception is the third approach, which compares the weight in the tails. The following part is therefore devoted to a geometrically motivated definition of this approach to risk comparison that is sufficiently general and analytically convenient.
We start with the definition of mean preserving spreads for continuous as well as discrete random variables.
Definition 1. Mean Preserving Spreads: Densities
We call a step function s(x) a mean preserving spread (MPS) if it is defined in the following way:
$$
s(x) = \begin{cases}
\alpha \ge 0 & \text{for } x \in (a, a+t), \\
-\alpha \le 0 & \text{for } x \in (a+d, a+d+t), \\
-\beta \le 0 & \text{for } x \in (b, b+t), \\
\beta \ge 0 & \text{for } x \in (b+e, b+e+t), \\
0 & \text{otherwise,}
\end{cases}
$$
where
$$0 \le a \le a+t \le a+d \le a+d+t \le b \le b+t \le b+e \le b+e+t \le 1$$
and
$$\beta e = \alpha d.$$
Note that an MPS is constructed such that $\int_0^1 s(x)\,dx = 0$ and also $\int_0^1 x\,s(x)\,dx = 0$.
Therefore, if we construct a function g = f + s from a density function f and g(x) ≥ 0 for all x, then g is also a density function with the same mean as f. Furthermore, we say that density g differs from density f by a single MPS if the difference g − f is an MPS.
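The two integral properties of an MPS can be checked on a concrete step function. The following is a minimal sketch, not part of the original papers: the helper names `mps_pieces` and `moments` and the chosen parameter values are illustrative assumptions. Because s(x) is piecewise constant, both integrals can be computed exactly with rational arithmetic.

```python
from fractions import Fraction as Fr

def mps_pieces(alpha, a, d, b, e, t):
    """Return the four (low, high, height) steps of s(x); beta follows from beta*e = alpha*d."""
    beta = alpha * d / e
    return [
        (a, a + t, alpha),
        (a + d, a + d + t, -alpha),
        (b, b + t, -beta),
        (b + e, b + e + t, beta),
    ]

def moments(pieces):
    """Exact integrals of s(x) and of x*s(x) over [0, 1] for a piecewise-constant s."""
    mass = sum(h * (hi - lo) for lo, hi, h in pieces)
    first = sum(h * (hi**2 - lo**2) / 2 for lo, hi, h in pieces)
    return mass, first

# Illustrative parameters (assumed), chosen to respect the ordering constraint of Definition 1.
pieces = mps_pieces(alpha=Fr(1), a=Fr(1, 10), d=Fr(2, 10), b=Fr(5, 10), e=Fr(1, 10), t=Fr(1, 10))
mass, first = moments(pieces)
print(mass, first)  # both integrals vanish
```

The condition βe = αd is exactly what makes the first moment vanish; changing `beta` in the sketch to any other value leaves the total mass at zero but shifts the mean.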
We can formalize a similar concept for discrete random variables.
Definition 2. Mean Preserving Spreads: Discrete Distributions
Let the discrete r.v.’s X and Y be described in the following way:
$$\Pr(X = a_i) = f_i \quad\text{and}\quad \Pr(Y = a_i) = g_i,$$
where $(a_i)$ is an increasing sequence of real numbers between 0 and 1 and $\sum_i f_i = \sum_i g_i = 1$.
Moreover, let $f_i = g_i$ for all i except four, say $i_1 < i_2 < i_3 < i_4$. Then we say that Y differs from X by a single MPS if (denoting $\hat a_k = a_{i_k}$, $\hat f_k = f_{i_k}$ and $\hat g_k = g_{i_k}$)
$$\hat g_1 - \hat f_1 = \hat f_2 - \hat g_2 \ge 0, \qquad \hat f_3 - \hat g_3 = \hat g_4 - \hat f_4 \ge 0 \qquad\text{and}\qquad \sum_{k=1}^{4} \hat a_k (\hat g_k - \hat f_k) = 0.$$
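Definition 2 can be illustrated by applying a single discrete MPS to a small distribution. This is a sketch under stated assumptions: the function name `apply_single_mps`, the support points and the probabilities are hypothetical choices made only for illustration. The weight moved on the right-hand side is derived from the mean-preservation condition rather than supplied by the caller.

```python
from fractions import Fraction as Fr

def apply_single_mps(a, f, idx, gamma1):
    """Shift probability at four support points idx = (i1, i2, i3, i4) as in Definition 2.

    gamma1 is the weight moved from a[i2] to a[i1]; the weight gamma4 moved
    from a[i3] to a[i4] is then fixed by the mean-preservation condition.
    """
    i1, i2, i3, i4 = idx
    gamma4 = gamma1 * (a[i2] - a[i1]) / (a[i4] - a[i3])
    g = list(f)
    g[i1] += gamma1
    g[i2] -= gamma1
    g[i3] -= gamma4
    g[i4] += gamma4
    if min(g) < 0:
        raise ValueError("spread too large: a probability would become negative")
    return g

def mean(a, p):
    return sum(x * w for x, w in zip(a, p))

def variance(a, p):
    m = mean(a, p)
    return sum(w * (x - m) ** 2 for x, w in zip(a, p))

a = [Fr(0), Fr(1, 4), Fr(3, 4), Fr(1)]
f = [Fr(1, 10), Fr(4, 10), Fr(4, 10), Fr(1, 10)]
g = apply_single_mps(a, f, (0, 1, 2, 3), Fr(1, 10))

print(g)                               # probabilities of Y
print(mean(a, f), mean(a, g))          # the mean is unchanged
print(variance(a, f), variance(a, g))  # the variance increases
```

In this example the variance rises while the mean stays at 1/2, which matches the intuition of moving weight from the center to the tails.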
2.3 The Integral Conditions
We now use the notion of an MPS to introduce the integral conditions, which formalize the approach of comparing the weight in the tails of distributions. Consider two densities f and g that differ by a single MPS s as in definition 1. The difference S = G − F of the corresponding cdf’s can then be expressed as the integral $S(x) = \int_0^x s(u)\,du$.
It’s easy to see that S(0) = S(1) = 0. Moreover, definition 1 implies that
$$\exists z \in [0,1]: \quad S(x) \ge 0 \ \text{if } x \le z, \qquad S(x) \le 0 \ \text{if } x > z. \tag{1}$$
Finally, let’s denote $T(y) = \int_0^y S(x)\,dx$. We obtain
$$T(1) = \int_0^1 S(x)\,dx = \bigl[x S(x)\bigr]_0^1 - \int_0^1 x\,s(x)\,dx = 0 \tag{2}$$
and consequently, using (1) and (2),
$$T(y) \ge 0, \qquad y \in [0, 1). \tag{3}$$
The conditions (2) and (3) are from now on referred to as the integral conditions. Note
that along with (1) they also hold for S = G−F , where G and F are discrete distributions
differing by a single MPS.
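For discrete distributions on a common finite support, the integral conditions can be checked mechanically. The sketch below is an illustration only; the function names and the example distributions are assumptions. It relies on the fact that S = G − F is a step function, so T is piecewise linear and it suffices to evaluate T at the support points and at y = 1.

```python
from fractions import Fraction as Fr

def T_at_knots(a, f, g):
    """T(y) = integral_0^y (G - F) evaluated at each support point and at y = 1."""
    knots = list(a) + [Fr(1)]
    S, T, out = Fr(0), Fr(0), []
    for i, ai in enumerate(a):
        out.append(T)                  # T(a_i): the jump at a_i has not been integrated yet
        S += g[i] - f[i]               # value of S on [a_i, a_{i+1})
        T += S * (knots[i + 1] - ai)   # add the area under S up to the next knot
    out.append(T)                      # T(1)
    return out

def satisfies_integral_conditions(a, f, g):
    """Conditions (2) and (3): T(1) = 0 and T(y) >= 0 on [0, 1)."""
    Ts = T_at_knots(a, f, g)
    return Ts[-1] == 0 and all(t >= 0 for t in Ts)

a = [Fr(0), Fr(1, 4), Fr(3, 4), Fr(1)]
f = [Fr(1, 10), Fr(4, 10), Fr(4, 10), Fr(1, 10)]
g = [Fr(2, 10), Fr(3, 10), Fr(3, 10), Fr(2, 10)]   # differs from f by a single MPS

print(T_at_knots(a, f, g))
print(satisfies_integral_conditions(a, f, g))   # G is riskier than F
print(satisfies_integral_conditions(a, g, f))   # the reverse comparison fails
```

Checking only the knots is a design choice that is valid for step-function cdf's; for general cdf's a numerical integration over a fine grid would be needed instead.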
In order to use the concept of an MPS as a foundation for a definition of greater variability, we have to ask whether G could have been obtained from F by a sequence of MPS’s, where F and G denote the cdf’s of the compared random variables X and Y. Using two theoretical statements, we show that a convenient criterion for this comparison is contained in the integral conditions (2) and (3).
First, if G is obtained from F by a sequence of MPS’s, then G − F clearly satisfies (2) and (3). The proof is omitted as it is trivial.
Theorem 1. Assume that
(a) there is a sequence of cdf’s $F_n$ converging (weakly) to G ($F_n \to G$),
(b) $F_n$ differs from $F_{n-1}$ by a single MPS denoted $S_n$ (i.e. $F_n = F_{n-1} + S_n = F_0 + \sum_{i=1}^{n} S_i$, where $F_0 = F$).
Then $G = F + \sum_{i=1}^{\infty} S_i = F + S$ and S satisfies the integral conditions (2) and (3).
We now state the non-trivial result (in a sense the converse of Theorem 1) that if G − F satisfies the integral conditions, then G can be approximated by F combined with a sequence of MPS’s.
Theorem 2. Assume that G − F satisfies the integral conditions (2) and (3).
Then there exist sequences $F_n$ and $G_n$ such that $F_n \to F$, $G_n \to G$ and, for each n, $G_n$ can be obtained from $F_n$ by a finite number of MPS’s.
This theorem follows from two partial results: the first lemma proves it for step functions with a finite number of steps, and the second one shows that arbitrary cdf’s F and G can be approximated to any desired degree by step functions that also satisfy the integral conditions. For lack of space, the explicit proofs are not provided in this work; nevertheless, the basic idea of each is presented. For the complete proofs see the original paper (Rothschild and Stiglitz, 1970, p. 232).
Lemma 1. Assume that cdf ’s F and G have a finite number of increase points and
moreover S = G − F satisfies the integral conditions (2) and (3).
Then there exists a sequence of cdf ’s F0 , . . . , Fn , where Fi differs from Fi−1 by a single
MPS for all i = 1, . . . , n, such that F0 = F and Fn = G.
In the proof, the function S is decomposed step by step into individual MPS’s, which are in turn used to construct the sequence F1 , . . . , Fn from the function F . Since S is by assumption a step function with a finite number of steps, this process ends after a finite number of iterations.
Before presenting the next lemma, note that we measure the distance between two arbitrary functions $f_1$ and $f_2$ by
$$\|f_1 - f_2\| = \int_0^1 |f_1(x) - f_2(x)|\,dx.$$
Lemma 2. Denote $T(y) = \int_0^y (G(x) - F(x))\,dx$ for cdf’s F and G. Furthermore assume that T(y) fulfills (2) and (3).
Then for each n there exist cdf’s $F_n$ and $G_n$ of discrete random variables with a finite number of increase points, which satisfy
$$\|F_n - F\| + \|G_n - G\| \le \frac{4}{n},$$
and moreover $T_n(y) = \int_0^y (G_n(x) - F_n(x))\,dx$ meets the integral conditions.
The first part of the proof constructs $F_n$ and $G_n$ for fixed n. We divide the interval [0, 1] into n subintervals $I_1, \dots, I_n$ of equal length. We then show that if $F_n(x)$ is any step function that is constant on each of these intervals and satisfies $F_n(x) \in F(I_i)$ for $x \in I_i$, then $\|F_n - F\| \le 2/n$. Applying the same approach to G yields the first part of the lemma.
As we see, the inequality in the lemma is satisfied by a broad class of step functions $F_n$ and $G_n$. In the second part of the proof we therefore define one particular pair of functions $F_n$ and $G_n$ and prove that the corresponding $T_n$ (defined as above) meets the integral conditions. The values $f_i$ and $g_i$ of the functions $F_n$ and $G_n$ on the interval $I_i$ are chosen such that $f_i \in F(I_i)$, $g_i \in G(I_i)$ and $(g_i - f_i)/n = \int_{I_i} (G(x) - F(x))\,dx$.
In conclusion, the presented results show that the statement that the random variable Y has “more weight in the tails” than X can be represented analytically by the integral conditions (2) and (3) imposed on the difference of the distribution functions.
3 Partial Orderings of Distribution Functions
This section summarizes the main theoretical results of the original work by Rothschild and Stiglitz (1970). After formally defining three different approaches to risk comparison, we present and prove their mutual equivalence. We conclude this part with remarks on the difference from mean-variance analysis and an overview of the literature.
3.1 Definition of Partial Orderings
The definition of greater uncertainty is built on the concept of a partial ordering on a set of distribution functions. We therefore start with a definition of this relation.
Definition 3. Partial Ordering
We call a binary relation ≤p on a set a partial ordering if it is reflexive, transitive and antisymmetric (meaning that X ≤p Y and Y ≤p X imply X = Y ).
In the previous section, we formalized the comparison of the “weight in the tails” of distribution functions through the concept of an MPS. We now formally define the corresponding approach to risk comparison.
Definition 4. Define $F \le_I G$ iff G − F, or more precisely $T(y) = \int_0^y (G(x) - F(x))\,dx$, satisfies conditions (2) and (3).
To justify this definition, we have to prove the following:
Lemma 3. Relation ≤I is a partial ordering.
The fact that it is transitive and reflexive is evident. To prove antisymmetry, assume F ≤I G and G ≤I F and construct S1 = G − F and S2 = F − G. Clearly S1 (x) + S2 (x) = 0, which implies T1 (y) + T2 (y) = 0, and since the integral conditions require T1 (y) ≥ 0 and T2 (y) ≥ 0, we get T1 (y) = T2 (y) = 0 for all y. Because Ti is the integral of Si , this implies Si = 0 (i = 1, 2) almost everywhere (up to a set of measure zero), and hence F = G.
The second definition corresponds to the statement that less risky random variables are
preferred by every risk averter.
Definition 5. Define $F \le_u G$ if and only if
$$\int_0^1 U(x)\,dF(x) \ge \int_0^1 U(x)\,dG(x)$$
for every bounded concave function U.
In this case again the properties of transitivity and reflexivity are apparent. As to antisymmetry, it is a consequence of Theorem 3 below.
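Definition 5 quantifies over all bounded concave functions, so it cannot be verified by enumeration. The sketch below only illustrates the inequality for a handful of concave functions; the two distributions and the list of utilities are assumptions chosen for illustration. A failure for any single concave U would be enough to reject F ≤u G, whereas passing these checks is merely consistent with it.

```python
import math

def expected_utility(U, a, p):
    """E[U(X)] for a discrete distribution with support a and probabilities p."""
    return sum(w * U(x) for x, w in zip(a, p))

a = [0.0, 0.25, 0.75, 1.0]
f = [0.1, 0.4, 0.4, 0.1]
g = [0.2, 0.3, 0.3, 0.2]   # g - f is a single mean preserving spread

# A few concave utility functions on [0, 1] (illustrative selection).
concave_utilities = {
    "sqrt(x)": math.sqrt,
    "min(x, 0.5)": lambda x: min(x, 0.5),
    "-(x - 0.5)^2": lambda x: -(x - 0.5) ** 2,
    "x": lambda x: x,
}

for name, U in concave_utilities.items():
    eu_f = expected_utility(U, a, f)
    eu_g = expected_utility(U, a, g)
    print(f"{name:>14}: EU under F = {eu_f:.4f}, under G = {eu_g:.4f}")
```

For the linear utility both expectations coincide, which reflects the equal means; for the strictly concave functions the less risky distribution F yields the higher expected utility.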
Finally, let’s formalize the notion that adding the noise to the distribution increases the
riskiness of the given random variable.
Definition 6. Define F ≤a G iff there exists a joint distribution function H(x, z) of the
random variables X and Z defined on [0, 1] × [−1, 1] such that if
J(y) = Pr(X + Z ≤ y),
then
F (x) = H(x, 1)
G(y) = J(y)
and
E(Z|X = x) = 0.
Notice that the equivalent definition in terms of the random variables X and Y is: X ≤a Y iff Y =d X + Z (equality in distribution, not Y = X + Z) for some random variable Z such that E(Z|X) = 0.
An important characterization can be given for discrete distributions X and Y with a finite number of points. It can be shown that its formal structure is the same as that of the theoretical frameworks for the inequality of income distributions and the informativeness of information structures.
Assume that distributions of X and Y are determined by the concentration points ai and
probabilities fi , gi (i = 1, . . . , n) such that
Pr(X = ai ) = fi
and
Pr(Y = ai ) = gi .
Now let’s define a random variable Z, which conditionally depends on X in the following
way
cij = Pr(Z = aj − ai |X = ai ),
i, j = 1, . . . , n.
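As a result, X ≤a Y iff there exist numbers $c_{ij} \ge 0$ such that
$$\sum_{j=1}^{n} c_{ij} = 1, \qquad i = 1, \dots, n, \tag{4}$$
$$\sum_{j=1}^{n} c_{ij}(a_j - a_i) = 0, \qquad i = 1, \dots, n, \tag{5}$$
$$g_j = \sum_{i=1}^{n} f_i c_{ij}, \qquad j = 1, \dots, n. \tag{6}$$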
Comparing this statement with the previous definition, we see that expression (4) ensures that Z is a random variable, condition (5) corresponds to E(Z|X = x) = 0 and equation (6) corresponds to the statement Y =d X + Z.
We conclude this characterization with the matrix form of conditions (4), (5) and (6), where e stands for the all-ones vector (e = (1, . . . , 1)′ ), a for the column vector of the points ai , f and g for the row vectors of probabilities and C for the matrix (cij ):
$$Ce = e, \qquad Ca = a, \qquad g = fC. \tag{7}$$
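The matrix conditions are easy to verify for a given candidate matrix. The following sketch is illustrative only: the function name `satisfies_matrix_conditions` and the three-point example (a sure payoff of 1/2 replaced by a fair coin flip between 0 and 1) are assumptions, not taken from the original paper.

```python
from fractions import Fraction as Fr

def satisfies_matrix_conditions(C, a, f, g):
    """Check C >= 0, Ce = e, Ca = a and g = fC, with f and g treated as row vectors."""
    n = len(a)
    nonneg = all(c >= 0 for row in C for c in row)
    rows_sum_to_one = all(sum(row) == 1 for row in C)                      # Ce = e
    mean_zero_noise = all(sum(C[i][j] * a[j] for j in range(n)) == a[i]    # Ca = a
                          for i in range(n))
    maps_f_to_g = all(sum(f[i] * C[i][j] for i in range(n)) == g[j]        # g = fC
                      for j in range(n))
    return nonneg and rows_sum_to_one and mean_zero_noise and maps_f_to_g

h = Fr(1, 2)
a = [Fr(0), h, Fr(1)]
f = [Fr(0), Fr(1), Fr(0)]          # X = 1/2 with certainty
g = [h, Fr(0), h]                  # Y = 0 or 1 with equal probability
C = [[Fr(1), Fr(0), Fr(0)],
     [h,     Fr(0), h],            # from 1/2, Z = -1/2 or +1/2 with equal probability
     [Fr(0), Fr(0), Fr(1)]]

print(satisfies_matrix_conditions(C, a, f, g))
```

Each row of C is the conditional distribution of X + Z given X = a_i, so the matrix can be read as a Markov kernel that leaves the conditional mean unchanged.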
The matrix form of the above equations is very convenient, as it allows us to easily prove the property of transitivity for discrete distributions with a finite number of points.
Lemma 4. If random variables X 1 , X 2 and X 3 are concentrated at a finite number of points, then X 1 ≤a X 2 ≤a X 3 implies X 1 ≤a X 3 .
The proof simply uses the fact that if matrices C1 and C2 satisfy conditions (7), then so does the matrix C ∗ = C1 C2 .
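The argument behind Lemma 4 can be written out in one line per condition. The notation below is an assumption made for this sketch: all three variables are taken to live on the same points a, $f^k$ denotes the row vector of probabilities of $X^k$, $C_1$ links $X^1$ to $X^2$ and $C_2$ links $X^2$ to $X^3$.
$$
\begin{aligned}
C^* e &= C_1 (C_2 e) = C_1 e = e, \\
C^* a &= C_1 (C_2 a) = C_1 a = a, \\
f^1 C^* &= (f^1 C_1)\, C_2 = f^2 C_2 = f^3 .
\end{aligned}
$$
Since a product of matrices with nonnegative entries again has nonnegative entries, $C^*$ defines valid conditional probabilities, which gives $X^1 \le_a X^3$.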
3.2 Equivalence Theorem
We now proceed to the main theoretical finding of Rothschild and Stiglitz (1970), which establishes the equivalence of the different approaches to risk comparison.
Theorem 3. The following statements are mutually equivalent:
(A) F ≤I G,
(B) F ≤u G,
(C) F ≤a G.
As this theorem is essential for the original paper, we present its proof in our work, though slightly modified. To obtain the desired equivalence, we divide it into a sequence of implications and prove each of them individually.
(C) ⇒ (B)
We assume that F ≤a G, i.e. Y =d X + Z and E(Z|X = x) = 0 for some random variable Z. Let U be an arbitrary concave function. For fixed X, we take expectations with respect to Z (denoted $E_X$) and use Jensen’s inequality to obtain
$$E_X U(X + Z) \le U(E_X(X + Z)) = U(X).$$
Taking expectations with respect to X, we finally get
$$E\,E_X U(X + Z) = E\,U(Y) \le E\,U(X).$$
(B) ⇒ (A)
As F ≤u G, definition 5 gives $\int_0^1 U(x)\,dS(x) \le 0$ for every concave U, where S = G − F. Since x and −x are both concave, we get $\int_0^1 x\,dS(x) \le 0$ and also $\int_0^1 -x\,dS(x) \le 0$, together implying $\int_0^1 x\,dS(x) = 0$. By integration by parts,
$$0 = \int_0^1 x\,dS(x) = \bigl[x S(x)\bigr]_0^1 - \int_0^1 S(x)\,dx = -\int_0^1 S(x)\,dx = -T(1),$$
which yields the integral condition (2). Next, consider the function $b_y(x) = \max(y - x, 0)$ for fixed y. Since $-b_y(x)$ is concave, we obtain
$$0 \le \int_0^y (y - x)\,dS(x) = y S(y) - \int_0^y x\,dS(x) = y S(y) - \bigl[x S(x)\bigr]_0^y + \int_0^y S(x)\,dx = T(y),$$
which is the integral condition (3).
(A) ⇒ (C)
Let’s first consider cdf’s F and G of discrete random variables X and Y, defined as follows
$$\Pr(X = a_i) = f_i \quad\text{and}\quad \Pr(Y = a_i) = g_i,$$
which differ by a single MPS. Using definition 2, consider the four points with different probability weights, $\hat a_1 < \hat a_2 < \hat a_3 < \hat a_4$. Denoting $\gamma_k = \hat g_k - \hat f_k$, we obtain
$$\gamma_1 = -\gamma_2 \ge 0, \qquad \gamma_4 = -\gamma_3 \ge 0 \qquad\text{and}\qquad \sum_{k=1}^{4} \hat a_k \gamma_k = 0.$$
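Let’s define the matrix C in the following way:
$$
C = \begin{pmatrix}
1 & 0 & 0 & 0 \\
\dfrac{\gamma_1(\hat a_4 - \hat a_2)}{\hat f_2(\hat a_4 - \hat a_1)} & \dfrac{\hat g_2}{\hat f_2} & 0 & \dfrac{\gamma_1(\hat a_2 - \hat a_1)}{\hat f_2(\hat a_4 - \hat a_1)} \\
\dfrac{\gamma_4(\hat a_4 - \hat a_3)}{\hat f_3(\hat a_4 - \hat a_1)} & 0 & \dfrac{\hat g_3}{\hat f_3} & \dfrac{\gamma_4(\hat a_3 - \hat a_1)}{\hat f_3(\hat a_4 - \hat a_1)} \\
0 & 0 & 0 & 1
\end{pmatrix}.
$$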
By the characterization of ≤a for discrete random variables following definition 6, it suffices to show that the elements $c_{ij}$ of the matrix C satisfy conditions (4), (5) and (6). It’s easy to verify that conditions (4) and (5) are met, so that Z defined by
$$c_{ij} = \Pr(Z = \hat a_j - \hat a_i \mid X = \hat a_i)$$
is a random variable and satisfies E(Z|X) = 0. To show that Y =d X + Z (condition (6)), we follow the approach of the original proof and define a discrete variable $Y^1 = X + Z$. The fact that E(Z) = 0 implies $E(Y^1) = E(X) = E(Y)$. Moreover, $Y^1$ may differ from Y only in the probability weights at the points $\hat a_1, \hat a_2, \hat a_3$ and $\hat a_4$. However, by the definition of Z and $Y^1$ we obtain
$$\Pr(Y^1 = \hat a_2) = \Pr(X = \hat a_2)\,\Pr(Z = 0 \mid X = \hat a_2) = \hat f_2\,\frac{\hat g_2}{\hat f_2} = \Pr(Y = \hat a_2)$$
and similarly $\Pr(Y^1 = \hat a_3) = \Pr(Y = \hat a_3)$. Therefore these random variables can differ only in the probabilities of the points $\hat a_1$ and $\hat a_4$. But since $\hat a_1 < \hat a_4$, unequal probabilities at these points would yield unequal means, which contradicts $E(Y^1) = E(Y)$. Thus Y =d $Y^1$.
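The construction of C for a single MPS can be checked numerically. The sketch below restricts attention to the four affected points, so C is a 4×4 matrix; the function name `mps_matrix` and the example distributions are illustrative assumptions. It requires strictly positive probabilities at the two inner points, since they appear in denominators.

```python
from fractions import Fraction as Fr

def mps_matrix(a, f, g):
    """4x4 matrix C for support points a[0] < ... < a[3] where g - f is a single MPS."""
    gamma1, gamma4 = g[0] - f[0], g[3] - f[3]   # both nonnegative by Definition 2
    span = a[3] - a[0]
    return [
        [Fr(1), Fr(0), Fr(0), Fr(0)],
        [gamma1 * (a[3] - a[1]) / (f[1] * span), g[1] / f[1], Fr(0),
         gamma1 * (a[1] - a[0]) / (f[1] * span)],
        [gamma4 * (a[3] - a[2]) / (f[2] * span), Fr(0), g[2] / f[2],
         gamma4 * (a[2] - a[0]) / (f[2] * span)],
        [Fr(0), Fr(0), Fr(0), Fr(1)],
    ]

a = [Fr(0), Fr(1, 4), Fr(3, 4), Fr(1)]
f = [Fr(1, 10), Fr(4, 10), Fr(4, 10), Fr(1, 10)]
g = [Fr(2, 10), Fr(3, 10), Fr(3, 10), Fr(2, 10)]
C = mps_matrix(a, f, g)

row_sums = [sum(row) for row in C]                                          # condition (4)
cond_means = [sum(C[i][j] * a[j] for j in range(4)) for i in range(4)]      # condition (5)
image = [sum(f[i] * C[i][j] for i in range(4)) for j in range(4)]           # condition (6)

print(row_sums)
print(cond_means == a)
print(image == g)
```

In this example the mass at the inner points stays put with probability 3/4 and is otherwise split between the two outer points in proportions that keep the conditional mean fixed.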
We use results already derived in the paper to extend this conclusion to all cdf’s. First, by lemmas 1 and 4 the implication also holds for all discrete distributions with a finite number of points. Finally, theorem 2 extends the result to all cdf’s.
Assuming F ≤I G, theorem 2 guarantees the existence of sequences {Fn } and {Gn } of discrete distributions with a finite number of increase points such that Fn → F , Gn → G and Fn ≤I Gn , which implies Fn ≤a Gn by the first part of the proof. Let Hn (x, z) denote the joint distribution function of the random variables Xn and Zn such that, with Jn (y) = Pr(Xn + Zn ≤ y),
$$F_n(x) = H_n(x, 1), \qquad J_n(y) = G_n(y) \qquad\text{and}\qquad E(Z_n \mid X_n) = 0.$$
The last condition can be represented as
$$\int_0^1 \int_{-1}^{1} u(x)\,z\,dH_n(x, z) = 0 \tag{8}$$
for all continuous functions u(x) on [0, 1]. Denote the left-hand side of (8) by $M_n$. Since the sequence $\{H_n\}$ is stochastically bounded, it has a subsequence $\{H_{n'}\}$ such that $H_{n'} \to H(x, z)$, where H(x, z) is the joint distribution function of some random variables X and Z. It’s easy to see that $H_{n'}(x, 1) \to F(x)$ and $J_{n'} \to G$. Moreover, $M_{n'} \to \int_0^1\int_{-1}^{1} u(x)\,z\,dH(x, z)$, which implies (since $M_{n'} \equiv 0$) that $\int_0^1\int_{-1}^{1} u(x)\,z\,dH(x, z) = 0$ and hence E(Z|X) = 0, thus completing the proof.
3.3 Remarks on Mean-Variance Analysis
This part contains several remarks on the approach that compares risk through the variances of the random variables. In section 2.1 we introduced four different concepts of risk comparison; however, the equivalence proven above holds only for three of them and excludes the mean-variance analysis described by the ordering $\le_v$ ($X \le_v Y$ if E(X) = E(Y ) and $E(X^2) \le E(Y^2)$). One reason for this is that the relations ≤I , ≤u and ≤a were characterized as partial orderings, while mean-variance analysis yields a complete ordering of distributions with the same mean.
This characteristic is considered a disadvantage rather than an advantage, as there are examples of random variables X1 and X2 with the same mean such that $E(X_1^2) < E(X_2^2)$ and $E(U(X_1)) < E(U(X_2))$ for some nonquadratic concave function U. In fact, it can be shown that a function U is quadratic (and convex) if and only if X ≥v Y implies E(U (X)) ≥ E(U (Y )). On the other hand, the partiality of the orderings ≤I , ≤u and ≤a can be demonstrated e.g. by a case where $T(y) = \int_0^y (F(x) - G(x))\,dx$ changes sign. In such a case, the distributions F and G cannot be ordered.
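A small example makes the conflict between the variance ordering and expected utility concrete. The two distributions and the kinked utility function below are assumptions constructed for this illustration; they do not come from the original papers. X1 has the smaller variance but a worse lower tail, which a concave utility that is flat above 2/5 penalizes.

```python
from fractions import Fraction as Fr

def moment(dist, fn):
    """E[fn(X)] for a discrete distribution given as {value: probability}."""
    return sum(p * fn(x) for x, p in dist.items())

X1 = {Fr(1, 10): Fr(1, 5), Fr(3, 5): Fr(4, 5)}
X2 = {Fr(2, 5): Fr(5, 6), Fr(1): Fr(1, 6)}

def U_kinked(x):
    """Concave, nonquadratic utility: linear up to 2/5, flat afterwards."""
    return min(x, Fr(2, 5))

def U_quadratic(x):
    """Concave quadratic utility, for comparison."""
    return -(x - Fr(1, 2)) ** 2

mean1, mean2 = moment(X1, lambda x: x), moment(X2, lambda x: x)
second1, second2 = moment(X1, lambda x: x * x), moment(X2, lambda x: x * x)

print(mean1, mean2)                                        # equal means
print(second1, second2)                                    # X1 has the smaller second moment
print(moment(X1, U_kinked), moment(X2, U_kinked))          # yet this risk averter prefers X2
print(moment(X1, U_quadratic), moment(X2, U_quadratic))    # a quadratic risk averter prefers X1
```

Since one concave utility ranks X2 above X1 and another ranks X1 above X2, the two distributions are not comparable under ≤u, even though the variance criterion ranks them without hesitation.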
Regarding mean-variance analysis, Rothschild and Stiglitz note Tobin’s conjecture that this approach may be appropriate for a restricted class of distributions. The authors agree, but they object (see Rothschild and Stiglitz, 1970, chap. IV) that these restrictions are far too severe, allowing only for changes in distribution from F to G such that F (x) = G(ax + b) for some a > 0 and b (compare Tobin (1965)).
3.4 Previous Literature on the Topic
We conclude this part with a remark on the earlier literature on the covered topics, as presented in Rothschild and Stiglitz (1972). As the authors report, although they had considered their result to be entirely new, various sources proved them wrong.
The presented results on the equivalence of the risk-comparison approaches, in particular theorem 3 as the main result of the paper, had already been known, especially to mathematical statisticians. For some time they had held an important place in a branch of statistical theory called “the comparison of experiments”.
As for examples of such works, these findings are presented e.g. in the book by Blackwell and Girshick (Blackwell and Girshick, 1954, chap. 12). More general and more modern methods can be found in chapter 11 of Mayer (1966) and in Strassen (1965). These references show that the equivalence between the ≤a and ≤u orderings holds for distributions defined over more general spaces than the interval [0, 1], such as compact subsets of Rn . Unfortunately, the ordering ≤I does not seem to allow such a generalization.
4 Economic Applications
In the final part of this seminar work, we review the results presented in Rothschild and Stiglitz (1971), which provide examples of economic applications of the findings derived in Rothschild and Stiglitz (1970). Following the authors, two approaches to investigating the effect of risk on economic decisions are reviewed here: the effects of increasing risk and the choice of a probability distribution.
The first part offers an alternative to mean-variance analysis for studying the economic effects of increasing risk. To provide a general framework, assume that an individual chooses the level of some control parameter α to maximize expected utility $\int U(\theta, \alpha)\,dF(\theta)$, where θ is a random variable. The optimality condition for α is
$$\frac{\partial \int U(\theta, \alpha)\,dF(\theta)}{\partial \alpha} = E\,U_\alpha(\theta, \alpha) = 0. \tag{9}$$
Assume further that α∗ is the unique solution of (9) and that $U_\alpha$ is decreasing in α in the neighbourhood of α∗ . If $U_\alpha(\theta, \alpha)$ is a concave function of θ, our definition of risk comparison (in particular definition 5, which concerns the behavior of all risk averters described by concave utility functions) implies that an increase in riskiness lowers $E\,U_\alpha(\theta, \alpha^*)$ below zero and thus decreases α∗ . Similarly, if $U_\alpha(\theta, \alpha)$ is convex in θ, α∗ increases when uncertainty becomes greater.
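The comparative-statics argument can be spelled out in two steps. The notation $\alpha^*_F$ and $\alpha^*_G$ for the optima under F and G is introduced here only for this sketch, and the argument assumes that the expected marginal utility under G is decreasing in α between the two optima. Let $F \le_u G$ and let $U_\alpha(\theta, \alpha)$ be concave in θ. Applying definition 5 to the concave function $\theta \mapsto U_\alpha(\theta, \alpha^*_F)$ gives
$$\int U_\alpha(\theta, \alpha^*_F)\,dG(\theta) \;\le\; \int U_\alpha(\theta, \alpha^*_F)\,dF(\theta) = 0 .$$
Since $\alpha \mapsto \int U_\alpha(\theta, \alpha)\,dG(\theta)$ is decreasing and vanishes at $\alpha^*_G$, a nonpositive value at $\alpha^*_F$ implies
$$\alpha^*_G \le \alpha^*_F .$$
If $U_\alpha$ is convex in θ, the first inequality is reversed and so is the conclusion.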
In what follows, we apply this idea and derive conditions for the convexity or concavity of the relevant functions. As a general conclusion, we show that mean-variance analysis yields results that are misleadingly general, in contrast to the approach established by theorem 3. Moreover, we show that the Arrow-Pratt concepts of relative and absolute risk aversion provide a convenient way to examine conditions for the convexity or concavity of a given function.
After introducing the main ideas, we present several examples of their application in well-known economic models. In part 4.1 we address savings under uncertainty. Part 4.2 is devoted to a portfolio problem, with several remarks on the more general combined portfolio-savings problem. The last subsection 4.3 deals with a firm’s production problem.
To be precise, Rothschild and Stiglitz (1971) contains two more examples of economic applications, which deal with a multi-stage planning problem in an economy and the choice of the output level of a competitive firm. Although they are quite interesting, we do not present them in detail for lack of space.
Finally, in part 4.4 we apply the equivalence of the three alternative approaches from Rothschild and Stiglitz (1970) (reviewed in part 3.2) to prove some general theorems on the choice of probability distributions.
4.1 Savings and Uncertainty
In the first example we analyze the effect of increased risk in the rate of return on savings. An individual wishes to allocate a given wealth W0 between consumption today and tomorrow. Wealth not consumed today is invested and yields the random return e per dollar invested. The expected two-period utility is
$$E[U(C_1) + (1-\delta)U(C_2)] = U((1-s)W_0) + (1-\delta)\,E\,U(sW_0 e), \tag{10}$$
with savings rate s and pure rate of time discount δ. We assume that the individual is a risk averter, with a utility function satisfying U ′ > 0 and U ′′ < 0. Setting the derivative of (10) with respect to s equal to zero, we obtain the necessary and (by risk aversion) also sufficient condition for utility maximization:
$$U'((1-s)W_0) = (1-\delta)\,E[U'(sW_0 e)\,e]. \tag{11}$$
Intuitively, increased uncertainty about the return on savings could affect savings in two ways: they could either drop because “a bird in the hand is worth two in the bush”, or grow since a risk averter saves more when facing increased uncertainty. Formally, whether greater risk increases or decreases the optimal level of savings s∗ depends on the convexity or concavity of $eU'(sW_0 e)$ in e. As a result, under increasing risk the level of s∗ grows if
$$2U''(C) + U'''(C)\,C > 0 \tag{12}$$
and drops if the converse inequality holds. Note that the condition U ′′′ (C) ≤ 0 suffices for increasing risk to decrease savings.
Applying the Arrow-Pratt concept, we can reformulate these results using the coefficient of relative risk aversion (R = −CU ′′ /U ′ ). It can be observed that R′ has the same sign as −(U ′′′ C + U ′′ (1 + R)); thus inequality (12) holds if R is nonincreasing and greater than one. On the other hand, R nondecreasing and less than one yields the opposite inequality.
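The role of relative risk aversion can be illustrated with constant relative risk aversion, U(C) = C^(1−a)/(1−a), for which R = a. This is a sketch under that assumption: the closed form for s∗ used below is derived here from the first order condition for this utility family, (s/(1−s))^a = (1−δ)E[e^(1−a)], and the function names and return distributions are hypothetical.

```python
def crra_savings_rate(a, delta, returns):
    """Optimal savings rate s* for U(C) = C**(1 - a) / (1 - a).

    returns maps each realisation of e to its probability.
    Solves (s / (1 - s))**a = (1 - delta) * E[e**(1 - a)] for s.
    """
    k = (1 - delta) * sum(p * e ** (1 - a) for e, p in returns.items())
    root = k ** (1 / a)
    return root / (1 + root)

def condition_12(a, C):
    """Left-hand side of (12), 2U'' + U''' C, for CRRA utility."""
    U2 = -a * C ** (-a - 1)
    U3 = a * (a + 1) * C ** (-a - 2)
    return 2 * U2 + U3 * C

safe = {1.0: 1.0}                # certain return e = 1
risky = {0.5: 0.5, 1.5: 0.5}     # same mean, a mean preserving spread of `safe`

for a in (0.5, 2.0):
    s_safe = crra_savings_rate(a, 0.05, safe)
    s_risky = crra_savings_rate(a, 0.05, risky)
    print(a, condition_12(a, 1.0) > 0, round(s_safe, 4), round(s_risky, 4))
```

With a = 2 condition (12) holds and the riskier return raises s∗, while with a = 0.5 the condition fails and s∗ falls, in line with the statement about R being greater or less than one.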
Let’s conclude this example with a comment on the application of mean-variance analysis. As presented in part 3.3, this approach is equivalent to the assumption that U is quadratic. However, if $U(C) = aC - \tfrac12 bC^2$, then we can express the RHS of (11) as
$$(1-\delta)\bigl(aE(e) - bsW_0E(e^2)\bigr),$$
which decreases as $E(e^2)$ grows. Consequently, s has to drop in order to satisfy equality (11). This approach thus yields a conclusion compatible only with the first argument (growing risk decreases savings) while omitting the second one (savings increase as variability rises).
4.2 Portfolio Problem and Combined Portfolio-Savings Problem
Let’s now address the portfolio problem. Assume that an investor wishes to divide his portfolio between money with a zero rate of return and a risky asset with a random rate of return e. If W0 denotes his initial wealth and α the share of this wealth invested in the risky asset, the terminal wealth is W (α) = W0 (αe + 1). Again our objective is to maximize the expected utility of terminal wealth EU (W (α)), with the utility function U satisfying the “risk averter” conditions formulated in the previous problem (i.e. U ′ > 0, U ′′ < 0).
Let F denote the distribution function of e. Then the optimal α has to satisfy the first order condition
$$H(\alpha) = W_0\,E[U'e] = W_0 \int U'(W(\alpha))\,e\,dF(e) = 0.$$
Given the assumptions on the utility function, this condition is necessary as well as sufficient (since H ′ (α) < 0). Consider again a change in the variability of e. Our question is how the optimal level of α reacts to such a change.
Using mean-variance analysis and a quadratic utility function $U(W) = aW - \tfrac12 bW^2$, we obtain $\alpha = \dfrac{(a - bW_0)\,E(e)}{bW_0\,E(e^2)}$. Thus if e becomes riskier (i.e. $E(e^2)$ increases with E(e) remaining constant), the optimal level of α, if positive, has to fall. However, this result need not hold in general, although the mean-variance approach misleadingly presents it as such. This can be seen using the approach established by theorem 3.
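The quadratic-utility formula can be evaluated for two return distributions with the same mean. This is an illustrative sketch: the parameter values a = 2, b = 1, W0 = 1 and the two-point distributions are assumptions, chosen so that marginal utility stays positive at the optimum. With (a − bW0)E(e) > 0 the formula makes α inversely proportional to E(e²).

```python
from fractions import Fraction as Fr

def quadratic_alpha(a, b, W0, returns):
    """alpha = (a - b W0) E(e) / (b W0 E(e^2)) for U(W) = a W - b W^2 / 2."""
    Ee = sum(p * e for e, p in returns.items())
    Ee2 = sum(p * e * e for e, p in returns.items())
    return (a - b * W0) * Ee / (b * W0 * Ee2)

def foc(alpha, a, b, W0, returns):
    """E[U'(W(alpha)) e] with W(alpha) = W0 (alpha e + 1); zero at the optimum."""
    return sum(p * (a - b * W0 * (alpha * e + 1)) * e for e, p in returns.items())

half = Fr(1, 2)
F = {Fr(-1, 5): half, Fr(2, 5): half}   # E(e) = 1/10, E(e^2) = 1/10
G = {Fr(-2, 5): half, Fr(3, 5): half}   # E(e) = 1/10, E(e^2) = 13/50, riskier than F

alpha_F = quadratic_alpha(Fr(2), Fr(1), Fr(1), F)
alpha_G = quadratic_alpha(Fr(2), Fr(1), Fr(1), G)
print(alpha_F, alpha_G)
print(foc(alpha_F, Fr(2), Fr(1), Fr(1), F), foc(alpha_G, Fr(2), Fr(1), Fr(1), G))
```

The share of the risky asset drops from 1 to 5/13 when F is replaced by G, and both values satisfy the first order condition exactly.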
Suppose that the distribution of e changes from F to a more variable G, with the new optimal allocation parameter α̃ satisfying $\int U'(W(\tilde\alpha))\,e\,dG(e) = 0$. Define the function S = G − F ; then $\tilde\alpha \gtreqless \alpha$ if $\int U'(W(\alpha))\,e\,dS(e) \gtreqless 0$. Denote V (e) = U ′ (W (α)) and further assume that F and G have their points of increase confined to the interval (a, b). We see that the condition
$$\int_a^b V(e)\,e\,dS(e) \le 0 \tag{13}$$
holding for all positive and decreasing V and all S satisfying the integral conditions (2) and (3) implies that an increase in variability decreases the demand for risky assets of all risk-averse individuals. Moreover, using (3) and the second mean value theorem of integral calculus, we obtain a sufficient condition for (13) in the form
$$\forall c \in (a, b): \quad h(c) = \int_a^c e\,dS(e) \le 0.$$
Furthermore, it can be shown that this is also a necessary condition. Otherwise there would be a c such that h(c) > 0, while (13) requires $\int_a^b V(e)\,e\,dS(e) \le 0$ for all positive and decreasing V. Now consider
$$V(e) = \begin{cases} V_1 & \text{for } a \le e < c, \\ V_2 & \text{for } c \le e \le b, \end{cases}$$
where V1 > V2 > 0. Then, since $h(b) = \int_a^b e\,dS(e) = 0$, we get $\int_a^b V(e)\,e\,dS(e) = (V_1 - V_2)\,h(c) > 0$, a contradiction.
Concerning the statement that increasing variability decreases the demand for risky assets, the authors assert that one can exhibit increasing concave utility functions that always satisfy it, and that one can prove that utility functions of this type do not have the property that increasing risk always increases α.
Regarding the application of the Arrow-Pratt concept of risk aversion, let’s first denote Z(e) = eU ′ (W (α)). The previous results can be interpreted as saying that concavity of Z(e) implies α̃ ≤ α. Using the relative and absolute risk aversion coefficients R = −U ′′ W/U ′ and A = −U ′′ /U ′ , we can express Z ′′ (e) in the following form:
$$Z''(e) = \bigl[(1 - R + AW_0)\,U'' + (W_0A' - R')\,U'\bigr]\,W_0\,\alpha.$$
Thus nondecreasing relative risk aversion less than or equal to one together with nonincreasing absolute risk aversion are sufficient conditions for an increase in risk to decrease the share of the risky asset.
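The expression for Z′′(e) can be derived directly from the definitions of A and R; the following worked steps are a sketch added for verification, with all derivatives of U evaluated at $W = W_0(1 + \alpha e)$, so that $dW/de = \alpha W_0$ and $\alpha W_0 e = W - W_0$. Differentiating $Z(e) = e\,U'(W)$ twice gives
$$Z''(e) = \alpha W_0 \bigl[\,2U'' + (W - W_0)\,U'''\,\bigr].$$
From $A = -U''/U'$ we get $A' = -U'''/U' + A^2$, hence $U''' = (A^2 - A')\,U'$, and from $R = AW$ we get $R' = A + WA'$. Using $A\,U' = -U''$,
$$(W - W_0)\,U''' = -R\,U'' - (R' - A)\,U' + AW_0\,U'' + W_0A'\,U' = (-1 - R + AW_0)\,U'' + (W_0A' - R')\,U',$$
so that
$$Z''(e) = \alpha W_0 \bigl[(1 - R + AW_0)\,U'' + (W_0A' - R')\,U'\bigr].$$
For α > 0, the first bracketed term is nonpositive when R ≤ 1 (because U′′ < 0) and the second when A′ ≤ 0 and R′ ≥ 0, which yields concavity of Z.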
We conclude this example with the portfolio-savings problem. In the model we consider an individual who maximizes the expected value of the discounted utility of consumption
\[
E \sum_{t=0}^{\infty} (1 - \delta)^t \, U(C_t),
\]
where $\delta$ represents the discount rate and $C_t$ denotes consumption at time $t$, subject to the stochastic constraints
\[
W_{t+1} = (W_t - C_t)\, r_t ,
\]
where $W_t$ stands for the wealth at time $t$ and $r_t - 1$ is the stochastic rate of return, which combines the returns of two assets according to
\[
r_t = \alpha r_{1t} + (1 - \alpha) r_{2t} .
\]
The parameter $\alpha$ is the fraction invested in the first asset, and $r_{1t}$ and $r_{2t}$ represent the returns of asset 1 and asset 2, respectively.
We again ask what effect an increase in the risk of one of the assets’ returns will have on portfolio allocation and savings. Although it seems reasonable that an increase in the variance of one asset’s return decreases the proportion invested in this asset, it can be shown that under special conditions an increase in variability could have the opposite effect.
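We can also analyze the effect on the savings rate. Considering the CRRA utility function $U(C) = C^{1-a}/(1-a)$ (for $a > 0$, $a \neq 1$), we obtain that if $a < 1$, then an increase in the variability of $r$ decreases the savings rate, while $a > 1$ yields the opposite result. The reason is that the optimal savings rate is increasing in $E[r^{1-a}]$, and $r^{1-a}$ is concave in $r$ for $a < 1$ and convex for $a > 1$.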
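The direction of the effect on savings can be illustrated numerically. The sketch below assumes the standard closed form for the optimal savings rate under CRRA utility with independent and identically distributed returns, s = [(1 − δ) E(r^(1−a))]^(1/a). This formula is not stated in the text above and is taken here as an assumption, as are the two return distributions (a sure return of 1.05 and a mean-preserving spread of it).

```python
def savings_rate(returns, probs, a, delta):
    """Assumed closed form s = [(1 - delta) * E(r**(1 - a))] ** (1 / a)."""
    moment = sum(p * r ** (1 - a) for r, p in zip(returns, probs))
    return ((1 - delta) * moment) ** (1 / a)

delta = 0.05
safe = ([1.05], [1.0])                # degenerate return with mean 1.05
risky = ([0.85, 1.25], [0.5, 0.5])    # same mean, more variable

for a in (0.5, 2.0):
    s_safe = savings_rate(safe[0], safe[1], a, delta)
    s_risky = savings_rate(risky[0], risky[1], a, delta)
    print(f"a = {a}: savings rate {s_safe:.4f} -> {s_risky:.4f}")
```

Under this assumed formula, the more variable return lowers the savings rate for a = 0.5 and raises it for a = 2, because r^(1−a) is concave in the first case and convex in the second.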
4.3
Firm’s Production Problem
As the last example in this part on the effects of increased risk, we review a production problem. Let’s consider a firm with uncertain output $Q$ over the next period. The goal of the firm is to minimize the expected cost of production. Assume further a two-factor concave production function $P(K, L)$, which represents the production process, i.e. $Q = P(K, L)$. $K$ represents capital, which cannot be varied in the short run, and $L$ stands for labor, which, on the contrary, is variable.
The expected costs of production are given by the expression
\[
E[rK + wL(K, Q)] = rK + wE[L(K, Q)], \tag{14}
\]
where $r$ is the cost of capital, $w$ is the cost of labor, and $L(K, Q)$ stands for the amount of labor required to produce $Q$ with capital $K$. Our question is what happens to the expected costs as the variability of $Q$ increases. To answer it we use the fact that $L(K, Q)$ is convex in $Q$ for any given level of $K$ (this is implied by the concavity of $P$). Therefore, using our approach given by definition 5, we obtain that higher variability of $Q$ always results in higher expected costs.
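A small numerical illustration of (14) follows. It is only a sketch under an assumed production function P(K, L) = sqrt(K L), so that the labor requirement is L(K, Q) = Q²/K, which is convex in Q. The prices, the capital stock and the two output distributions are hypothetical.

```python
def labor_required(K, Q):
    """Labor needed to produce Q with capital K when P(K, L) = sqrt(K * L)."""
    return Q ** 2 / K

def expected_cost(K, outcomes, probs, r, w):
    """Expected cost r*K + w*E[L(K, Q)] as in (14), for a discrete output distribution."""
    expected_labor = sum(p * labor_required(K, q) for q, p in zip(outcomes, probs))
    return r * K + w * expected_labor

# Output is either certain (Q = 10) or a mean-preserving spread of it (5 or 15).
certain = expected_cost(4.0, [10.0], [1.0], r=1.0, w=2.0)
spread = expected_cost(4.0, [5.0, 15.0], [0.5, 0.5], r=1.0, w=2.0)
print(certain, spread)  # 54.0 66.5
```

Mean output is the same in both cases, yet the expected cost is higher under the more variable distribution, as the convexity argument predicts.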
A subsequent problem, which is more difficult to answer, concerns the reaction of the optimal level of $K$ to an increase in the variability of $Q$. The authors point out that the answer is related to the elasticity of substitution between $K$ and $L$. Let’s start with a derivation of the first order conditions from (14):
\[
\frac{r}{w} = -E\left[\frac{\partial L(K, Q)}{\partial K}\right],
\]
which can be interpreted in such a way that the factor-price ratio must equal the mean value (or the average) of the marginal rate of substitution $-\partial L/\partial K$.
We conclude with two examples of particular production functions. First we consider the production function with a constant elasticity of substitution
\[
Q(K, L) = \bigl(\delta K^{\rho} + (1 - \delta) L^{\rho}\bigr)^{1/\rho} .
\]
It can be shown that the condition $\rho \le 0$ (or equivalently an elasticity of substitution less than or equal to one) implies convexity of the marginal rate of substitution with respect to $Q$, meaning that an increase in the variability of $Q$ raises the optimal level of $K$.
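The convexity claim can be checked numerically for one parameter choice. The sketch below inverts the CES function for labor and evaluates the marginal rate of substitution −∂L/∂K = δ/(1 − δ) · (L/K)^(1−ρ) along the isoquant. The parameter values (ρ = −1, δ = 0.5, K = 4) and the output levels are hypothetical and only illustrate the case ρ ≤ 0.

```python
def labor_ces(K, Q, delta, rho):
    """Labor L solving Q = (delta*K**rho + (1-delta)*L**rho)**(1/rho), for rho != 0."""
    return ((Q ** rho - delta * K ** rho) / (1 - delta)) ** (1 / rho)

def mrs(K, Q, delta, rho):
    """Marginal rate of substitution -dL/dK on the isoquant for output Q."""
    L = labor_ces(K, Q, delta, rho)
    return delta / (1 - delta) * (L / K) ** (1 - rho)

K, delta, rho = 4.0, 0.5, -1.0
mrs_at_mean = mrs(K, 3.0, delta, rho)                                     # Q = 3 for sure
mean_mrs_spread = 0.5 * mrs(K, 2.0, delta, rho) + 0.5 * mrs(K, 4.0, delta, rho)
print(mrs_at_mean, mean_mrs_spread)
```

For these values the average marginal rate of substitution under the spread (Q equal to 2 or 4) exceeds its value at the mean Q = 3, which is what convexity in Q requires.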
As a second example, we look at the production function with infinite elasticity of substitution
\[
Q(K, L) = bK + aL .
\]
Let $G(Q)$ denote the distribution function of $Q$. Then it can be shown (for details see Rothschild and Stiglitz, 1971, p. 79) that the behavior of $K$ in response to an increase in the variability of $Q$ depends on the term $G^{-1}\bigl(1 - \tfrac{ar}{wb}\bigr)$. To be more specific, the optimal level of $K$ increases if $G^{-1}\bigl(1 - \tfrac{ar}{wb}\bigr)$ rises, or (equivalently) if the probability that $Q$ exceeds $bK$ increases.
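To illustrate how the quantile term drives the result, the following sketch assumes that Q is normally distributed and that the optimal capital satisfies bK = G⁻¹(1 − ar/(wb)). Both the normality and this explicit formula for K are assumptions made here for illustration, and the parameter values are hypothetical.

```python
from statistics import NormalDist

def optimal_capital(dist, a, b, r, w):
    """Capital level with b*K equal to the (1 - a*r/(w*b)) quantile of output."""
    p = 1 - a * r / (w * b)
    return dist.inv_cdf(p) / b

low_var = NormalDist(mu=100.0, sigma=10.0)
high_var = NormalDist(mu=100.0, sigma=20.0)

# Cheap capital (quantile above the median): more variability raises K.
print(optimal_capital(low_var, a=1, b=1, r=1, w=4),
      optimal_capital(high_var, a=1, b=1, r=1, w=4))

# Expensive capital (quantile below the median): more variability lowers K.
print(optimal_capital(low_var, a=1, b=1, r=3, w=4),
      optimal_capital(high_var, a=1, b=1, r=3, w=4))
```

In this example the same increase in variability moves K in opposite directions depending on whether the relevant quantile lies above or below the median, so the sign of the effect cannot be read off from the variance alone.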
4.4
Choosing a Probability Distribution
Finally, we present several examples in which our theoretical results (the definition of variability and the basic theorem on the equivalence) are applied to prove general theorems dealing with the choice of a probability distribution from a set of possible probability distributions.
We start with the diversification theorem. Consider an individual who can allocate his given initial wealth $W_0$ between two securities. Their values next period, $e_1$ and $e_2$ (per dollar invested), are assumed to be independent and identically distributed. The investor chooses $b$ to maximize the expected utility
\[
EU(W) = EU\bigl((b e_1 + (1 - b) e_2) W_0\bigr),
\]
where $U$ is a concave function. The diversification theorem states that the optimal allocation is $b = \tfrac{1}{2}$, independently of the utility function.
To prove this statement, let’s define $y_b = (b e_1 + (1 - b) e_2) W_0$. Note that we can write $y_b = y_{1/2} + (b - 1/2)(e_1 - e_2) W_0$ and, furthermore, $E(e_1 - e_2 \mid y_{1/2}) = 0$ by symmetry. By definition 6 this gives $y_{1/2} \le_a y_b$. Using theorem 3 we obtain $y_{1/2} \le_u y_b$; therefore all individuals with concave utility functions prefer $y_{1/2}$ to $y_b$.
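The diversification theorem can be illustrated by exact enumeration. The sketch below assumes a hypothetical two-point distribution for each security (0.5 or 1.5 with equal probability), W0 = 1, and two concave utility functions chosen only as examples.

```python
import math
from itertools import product

def expected_utility(b, utility, outcomes=(0.5, 1.5), w0=1.0):
    """Exact E U((b*e1 + (1-b)*e2) * W0) for i.i.d. e1, e2 uniform on `outcomes`."""
    pairs = list(product(outcomes, repeat=2))
    return sum(utility((b * e1 + (1 - b) * e2) * w0) for e1, e2 in pairs) / len(pairs)

grid = [i / 20 for i in range(21)]  # b = 0.00, 0.05, ..., 1.00
for name, utility in (("log", math.log), ("sqrt", math.sqrt)):
    best_b = max(grid, key=lambda b: expected_utility(b, utility))
    print(name, best_b)
```

For both utility functions the maximizing share on the grid is b = 0.5, consistent with the statement that the optimum does not depend on the particular concave utility.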
The second and final example deals with the Rao-Blackwell theorem. Assume a probability distribution depending on an unknown parameter $\theta$ and consider a sample of random variables $x = (x_1, \dots, x_n)$ generated from this distribution. Furthermore, suppose that the criterion for the choice of an estimator $d(x)$ of the parameter $\theta$ is the minimization of the expected value of a convex loss function $L(d(x))$. The Rao-Blackwell theorem states that for any estimator $d(x)$ and any convex $L$, the existence of a sufficient statistic $T$ for $\theta$ implies the existence of an estimator $d^*$ at least as good as $d(x)$ in the sense that $EL(d^*(x)) \le EL(d(x))$.
To prove this, let’s define $d^*(x) = E(d(x) \mid T)$ for every value of $T$. It’s easy to see that we have to prove $d^*(x) \le_u d(x)$, which is, by theorem 3, equivalent to $d^*(x) \le_a d(x)$. Consider the random variable $z$ defined by the equation $d(x) = d^*(x) + z$. By the definition of $d^*$ it holds that $E(z \mid T) = 0$ and hence $E(z \mid d^*) = 0$. Thus we may conclude (by definition 6) that $d^*(x) \le_a d(x)$.
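A concrete instance may help. The sketch below assumes a Bernoulli sample with success probability p, the crude estimator d(x) = x1, the sufficient statistic T = x1 + ... + xn, and squared-error loss. All of these choices are illustrative and not taken from the original papers. The expected losses are computed exactly by enumerating all samples.

```python
from itertools import product

def risks(p, n):
    """Exact expected squared-error loss of d(x) = x[0] and of d*(x) = E(d(x) | T)."""
    samples = list(product((0, 1), repeat=n))
    prob = {x: p ** sum(x) * (1 - p) ** (n - sum(x)) for x in samples}

    # Conditional expectation of d(x) given T = t, computed group by group.
    d_star = {}
    for t in range(n + 1):
        group = [x for x in samples if sum(x) == t]
        weight = sum(prob[x] for x in group)
        d_star[t] = sum(prob[x] * x[0] for x in group) / weight

    risk_d = sum(prob[x] * (x[0] - p) ** 2 for x in samples)
    risk_d_star = sum(prob[x] * (d_star[sum(x)] - p) ** 2 for x in samples)
    return risk_d, risk_d_star

risk_d, risk_d_star = risks(p=0.3, n=3)
print(risk_d, risk_d_star)
```

Here d* turns out to be the sample mean T/n, and its expected loss p(1 − p)/n is smaller than the loss p(1 − p) of the original estimator.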
5
Conclusion
The aim of this seminar work is to provide a summary of the interesting series of papers by Rothschild and Stiglitz (1970, 1971, 1972). We start with the section on theoretical background, introducing four different approaches generally used to compare the variability of random variables. In particular, we explain in depth and formalize the concept of comparing “the weight in the tails” of the random variables’ distributions.
In the second part, we focus on the equivalence theorem as the main result of this series of articles. After introducing partial orderings and defining three variability-comparison approaches within this framework, we provide the theorem itself along with the complete proof. The contribution of these theoretical findings is that they establish the equivalence of different perspectives on the comparison of risk, thus providing a basis for a convenient definition of greater variability.
The final section is devoted to examples of economic problems as potential applications of the derived results. We present simple models of savings, portfolio allocation and the firm’s choice of production level. In each of these examples, we address the question of the impact of increased variability on the optimal levels of the variables in the model. We show that conclusions drawn from mean-variance analysis can often be misleading, because they suggest a single result and omit the possibility of other outcomes. Finally, we apply the results to prove some general theorems dealing with the choice of a probability distribution.
To conclude, the original works are among the most essential regarding the economic problem of risk assessment and comparison. Although, as the authors themselves admit, the presented theoretical findings had been known before and therefore aren’t completely new, they provide a comprehensive theoretical background and thorough insight into the given issues.
References
Blackwell, D. and M. A. Girshick (1954). Theory of Games and Statistical Decisions.
Wiley, New York.
Meyer, P. A. (1966). Probability and Potentials. Blaisdell, Waltham, MA.
Rothschild, M. and J. E. Stiglitz (1970). Increasing Risk: I. A Definition. Journal of
Economic Theory 2 (3), 225–243.
Rothschild, M. and J. E. Stiglitz (1971). Increasing Risk II: Its Economic Consequences.
Journal of Economic Theory 3 (1), 66–84.
Rothschild, M. and J. E. Stiglitz (1972). Addendum to “Increasing Risk: I. A Definition”.
Journal of Economic Theory 5 (2), 306–306.
Strassen, V. (1965). The Existence of Probability Measures with Given Marginals. The
Annals of Mathematical Statistics 36 (2), 423–439.
Tobin, J. (1965). The Theory of Portfolio Selection. In F. Hahn and F. Brechling (Eds.),
The Theory of Interest Rates. MacMillan, London.