
Summary of the papers on "Increasing risk" by Rothschild and Stiglitz

Seminar work
Pavol Majher, 1169390
Economic Literature Seminar
Lecturer: prof. Manfred Nermuth
November 2011

1 Introduction

Comparing the variability (or "riskiness") of different random variables has long been of particular interest to economists, and over the past decades several approaches to this problem have been developed. Among the most essential works in this field is the series of articles by Rothschild and Stiglitz (1970, 1971, 1972), which has become widely recognized over time. The aim of this seminar work is to give an overview of the theory and results presented in these papers.

Our structure follows the layout of the original articles. We start by introducing four concepts for comparing variability and provide a deeper theoretical background for one of them. We then define three partial orderings that correspond to these approaches to risk comparison and, as the main result of Rothschild and Stiglitz (1970), present the proof of their mutual equivalence. We also add several remarks on the comparison with mean-variance analysis and on the references mentioned in Rothschild and Stiglitz (1972).

In the second part of our work, we present the second paper, Rothschild and Stiglitz (1971), which focuses on economic applications of the derived framework. We review economic examples such as savings under uncertainty, a portfolio problem and a firm's production problem. In each case, the focus is on the impact that a higher degree of uncertainty has on the decision-making process.

2 Theoretical Background

2.1 Different Concepts of Risk Comparison

We start with an informal introduction of the different approaches to risk comparison, which are formalized later in section 3.1.
To decide when a random variable $Y$ is more "variable" than another random variable $X$, Rothschild and Stiglitz (1970) list four possible answers:

1. $Y$ is equal to $X$ plus noise. It is reasonable that a random variable created from the original one by adding uncorrelated noise should be riskier than the original. To illustrate this concept, take $X$ to be a lottery ticket that pays $a_i$ with probability $p_i$ (such that $\sum_i p_i = 1$). Then $Y$ can be regarded as a lottery ticket that pays $b_i$ with probability $p_i$, where $b_i$ is either $a_i$ or a lottery ticket with expected value $a_i$.

2. Every risk averter prefers $X$ to $Y$. A risk averter is defined as an individual with a concave utility function $U$. Thus $X$ can be viewed as less risky than $Y$ if $EU(X) \ge EU(Y)$ for every such $U$ (given that $X$ and $Y$ have the same mean).

3. $X$ has less weight in the tails than $Y$. For random variables $X$ and $Y$ with density functions $f$ and $g$, it seems adequate to regard $X$ as less variable if $g$ is obtained from $f$ by shifting some probability weight from the center towards the tails in such a way that the mean remains the same.

4. $Y$ has a greater variance than $X$. Comparing the variances of two random variables is a commonly used tool for comparing their riskiness.

As shown later (see section 3.2), the first three concepts are mutually equivalent definitions of greater riskiness, while the last one provides a rather different approach. More on this difference is presented in section 3.3.

Before moving forward, we introduce the notation used in the original paper. From now on, $X$ and $Y$ denote random variables with cumulative distribution functions (cdfs) $F$ and $G$ and (in case they exist) densities $f$ and $g$. In the original paper the results are stated only for cdfs whose points of increase lie in a bounded interval, which is conveniently normalized to $[0, 1]$.
As the authors mention, extending the results to cdfs defined on the whole real line requires solving several rather difficult convergence problems, which are moreover of little economic interest. Furthermore, it has been shown (e.g. in Mayer (1966) or Strassen (1965)) that the results would be restricted if generalized to the real line (for more detail see section 3.4).

2.2 Mean Preserving Spread

Most of the presented concepts can be formalized quite intuitively; the exception is the third approach, which compares the weight in the tails. The following part therefore gives a geometrically motivated definition of this approach to risk comparison that is sufficiently general and analytically convenient. We start with the definition of mean preserving spreads for continuous as well as discrete random variables.

Definition 1. Mean Preserving Spreads: Densities
A mean preserving spread (MPS) is a step function $s(x)$ defined in the following way:

\[
s(x) =
\begin{cases}
\alpha \ge 0 & x \in (a, a + t), \\
-\alpha \le 0 & x \in (a + d, a + d + t), \\
-\beta \le 0 & x \in (b, b + t), \\
\beta \ge 0 & x \in (b + e, b + e + t), \\
0 & \text{otherwise,}
\end{cases}
\]

where

\[
0 \le a \le a + t \le a + d \le a + d + t \le b \le b + t \le b + e \le b + e + t \le 1
\quad\text{and}\quad \beta e = \alpha d.
\]

Note that an MPS is constructed such that $\int_0^1 s(x)\,dx = 0$ and also $\int_0^1 x\,s(x)\,dx = 0$. Therefore, if we construct a function $g = f + s$ from a density function $f$ and $g(x) \ge 0$ for all $x$, then $g$ is also a density function with the same mean as $f$. Furthermore, we say that a density $g$ differs from a density $f$ by a single MPS if the difference $g - f$ is an MPS.

A similar concept can be formalized for discrete random variables.

Definition 2. Mean Preserving Spreads: Discrete Distributions
Let the discrete random variables $X$ and $Y$ be described in the following way: $\Pr(X = a_i) = f_i$ and $\Pr(Y = a_i) = g_i$, where $a_i$ is an increasing sequence of real numbers between 0 and 1 and $\sum_i f_i = \sum_i g_i = 1$.
Moreover, let $f_i = g_i$ for all $i$ except four, say $i_1 < i_2 < i_3 < i_4$. Then we say that $Y$ differs from $X$ by a single MPS if (denoting $\hat a_k = a_{i_k}$, $\hat f_k = f_{i_k}$ and $\hat g_k = g_{i_k}$)

\[
\hat g_1 - \hat f_1 = \hat f_2 - \hat g_2 \ge 0, \qquad
\hat f_3 - \hat g_3 = \hat g_4 - \hat f_4 \ge 0
\qquad\text{and}\qquad
\sum_{k=1}^{4} \hat a_k (\hat g_k - \hat f_k) = 0.
\]

2.3 The Integral Conditions

We now use the notion of an MPS to introduce the integral conditions, which formalize the approach comparing the weight in the tails of distributions. Consider two densities $f$ and $g$ that differ by a single MPS $s$ as defined in Definition 1. The difference $S = G - F$ of the corresponding cdfs can then be expressed as the integral $S(x) = \int_0^x s(u)\,du$. It is easy to see that $S(0) = S(1) = 0$. Moreover, Definition 1 implies that

\[
\exists z \in [0, 1]: \quad S(x)
\begin{cases}
\ge 0 & \text{if } x \le z, \\
\le 0 & \text{if } x > z.
\end{cases}
\tag{1}
\]

Finally, denote $T(y) = \int_0^y S(x)\,dx$. Integrating by parts, we obtain

\[
T(1) = \int_0^1 S(x)\,dx = \big[x S(x)\big]_0^1 - \int_0^1 x\,s(x)\,dx = 0
\tag{2}
\]

and consequently, using (1) and (2),

\[
T(y) \ge 0, \qquad y \in [0, 1).
\tag{3}
\]

Conditions (2) and (3) are from now on referred to as the integral conditions. Note that, along with (1), they also hold for $S = G - F$ where $G$ and $F$ are discrete distributions differing by a single MPS.

In order to use the concept of an MPS as a foundation for a definition of greater variability, we have to ask whether $G$ can be obtained from $F$ by a sequence of MPSs, where $F$ and $G$ denote the cdfs of the compared random variables $X$ and $Y$. Two theoretical statements show that a convenient criterion for this comparison is contained in the integral conditions (2) and (3). First, if $G$ is obtained from $F$ by a sequence of MPSs, then $G - F$ satisfies (2) and (3). The proof is omitted as it is straightforward.

Theorem 1. Assume that
(a) there is a sequence of cdfs $F_n$ converging (weakly) to $G$ ($F_n \to G$),
(b) $F_n$ differs from $F_{n-1}$ by a single MPS denoted $S_n$ (i.e. $F_n = F_{n-1} + S_n = F_0 + \sum_{i=1}^{n} S_i$).
Then, with $F_0 = F$, $G = F + \sum_{i=1}^{\infty} S_i = F + S$ and $S$ satisfies the integral conditions (2) and (3).

We now state the non-trivial result (in a sense the converse of Theorem 1) that the integral conditions satisfied by $G - F$ imply the existence of an approximation of $G$ formed by $F$ and a sequence of MPSs.

Theorem 2. Assume that $G - F$ satisfies the integral conditions (2) and (3). Then there exist sequences $F_n$ and $G_n$ such that $F_n \to F$, $G_n \to G$ and, for each $n$, $G_n$ can be obtained from $F_n$ by a finite number of MPSs.

This theorem follows from two partial results: the first lemma proves it for step functions with a finite number of steps, and the second one shows that arbitrary cdfs $F$ and $G$ can be approximated to any desired degree by step functions that moreover satisfy the integral conditions. For lack of space, the explicit proofs are not provided in this work; nevertheless, the basic idea of each of them is presented. For the complete proofs see the original paper (Rothschild and Stiglitz, 1970, p. 232).

Lemma 1. Assume that the cdfs $F$ and $G$ have a finite number of points of increase and that $S = G - F$ satisfies the integral conditions (2) and (3). Then there exists a sequence of cdfs $F_0, \dots, F_n$, where $F_i$ differs from $F_{i-1}$ by a single MPS for all $i = 1, \dots, n$, such that $F_0 = F$ and $F_n = G$.

In the proof, we "decompose" the function $S$ step by step into particular MPSs, which are in turn used to gradually construct the sequence $F_1, \dots, F_n$ from the function $F$. As $S$ is by assumption a step function with a finite number of steps, this process finishes after a finite number of iterations.

Before presenting the next lemma, note that we use the following metric for arbitrary functions $f_1$ and $f_2$:

\[
\|f_1 - f_2\| = \int_0^1 |f_1(x) - f_2(x)|\,dx.
\]

Lemma 2. Denote $T(y) = \int_0^y (G(x) - F(x))\,dx$ for cdfs $F$ and $G$, and assume that $T(y)$ fulfills (2) and (3).
Then for each $n$ there exist cdfs $F_n$ and $G_n$ of discrete random variables with a finite number of points of increase, which satisfy

\[
\|F_n - F\| + \|G_n - G\| \le \frac{4}{n}
\]

and for which $T_n(y) = \int_0^y (G_n(x) - F_n(x))\,dx$ meets the integral conditions.

The first part of the proof consists of the construction of $F_n$ and $G_n$ for fixed $n$. We divide the interval $[0, 1]$ into $n$ subintervals $I_1, \dots, I_n$ of equal length. We then show that if $F_n(x)$ is any step function that is constant on each of these intervals and satisfies $F_n(x) \in F(I_i)$ for $x \in I_i$, then

\[
\|F_n - F\| \le \frac{2}{n}.
\]

Using a similar approach for $G$, we obtain the first part of the lemma. The inequality in the lemma is thus satisfied by a broad class of step functions $F_n$ and $G_n$. In the second part of the proof we therefore define one particular pair of functions $F_n$ and $G_n$ and prove that the corresponding $T_n$ (defined as above) meets the integral conditions. The values $f_i$ and $g_i$ of the functions $F_n$ and $G_n$ on the interval $I_i$ are chosen such that $f_i \in F(I_i)$, $g_i \in G(I_i)$ and $(g_i - f_i)/n = \int_{I_i} (G(x) - F(x))\,dx$.

In conclusion, the presented results show that the statement that the random variable $Y$ has "more weight in the tails" than $X$ can be represented analytically by the integral conditions (2) and (3) satisfied by the difference of the distribution functions.

3 Partial Orderings of Distribution Functions

We move on to the next section, which summarizes the main theoretical results of the original work by Rothschild and Stiglitz (1970). After the formal definition of three different approaches to risk comparison, we present and prove their mutual equivalence. We conclude this part with remarks on the difference from mean-variance analysis and with an overview of the literature.

3.1 Definition of Partial Orderings

The foundation for the definition of greater uncertainty is the concept of a partial ordering on a set of distribution functions.
Therefore we start with a definition of this relation.

Definition 3. Partial Ordering
A relation $\le_p$ on a set is a partial ordering if it is binary, reflexive, transitive and antisymmetric (meaning that $X \le_p Y$ and $Y \le_p X$ imply $X = Y$).

In the previous section, we formalized the comparison of the "weight in the tails" of distribution functions by the concept of an MPS. We now formally define the corresponding approach to risk comparison.

Definition 4. Define $F \le_I G$ iff $G - F$, or more precisely $T(y) = \int_0^y (G(x) - F(x))\,dx$, satisfies conditions (2) and (3).

To justify this definition, we have to prove the following:

Lemma 3. The relation $\le_I$ is a partial ordering.

The fact that it is transitive and reflexive is evident. In order to prove antisymmetry, let us construct $S_1 = G - F$ and $S_2 = F - G$. Clearly $S_1(x) + S_2(x) = 0$, which implies $T_1(y) + T_2(y) = 0$; since both $T_1$ and $T_2$ are nonnegative by the integral conditions, $T_1(y) = T_2(y) = 0$. However, this implies $S_i = 0$ ($i = 1, 2$) almost everywhere (up to a set of measure zero), since $T_i$ is the integral of $S_i$ and vanishes identically.

The second definition corresponds to the statement that less risky random variables are preferred by every risk averter.

Definition 5. Define $F \le_u G$ if and only if

\[
\int_0^1 U(x)\,dF(x) \ge \int_0^1 U(x)\,dG(x)
\]

for every bounded concave function $U$.

In this case again the properties of transitivity and reflexivity are apparent. Antisymmetry is a consequence of Theorem 3 below.

Finally, let us formalize the notion that adding noise to a distribution increases the riskiness of the given random variable.

Definition 6. Define $F \le_a G$ iff there exists a joint distribution function $H(x, z)$ of the random variables $X$ and $Z$ defined on $[0, 1] \times [-1, 1]$ such that, if $J(y) = \Pr(X + Z \le y)$, then

\[
F(x) = H(x, 1), \qquad G(y) = J(y) \qquad\text{and}\qquad E(Z \mid X = x) = 0.
\]
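Definitions 5 and 6 can be illustrated on a small discrete example. The following Python sketch is only an illustration under stated assumptions: the distributions of X and of the noise Z, the two concave utility functions and the helper names `add_noise` and `expect` are hypothetical choices of ours and do not come from the original papers. The noise is drawn independently of X with zero mean, which is a special case of E(Z | X = x) = 0.

```python
from fractions import Fraction as Fr
import math

# Hypothetical example: X takes the values 1/4 and 3/4 with equal probability.
x_dist = {Fr(1, 4): Fr(1, 2), Fr(3, 4): Fr(1, 2)}
# Hypothetical noise: Z = -1/4 or +1/4 with equal probability, independent of X,
# so that E(Z | X = x) = 0 for every x.
noise = {Fr(-1, 4): Fr(1, 2), Fr(1, 4): Fr(1, 2)}


def add_noise(x_dist, noise):
    """Distribution of Y = X + Z when Z is drawn from `noise` independently of X."""
    y_dist = {}
    for x, px in x_dist.items():
        for z, pz in noise.items():
            y_dist[x + z] = y_dist.get(x + z, Fr(0)) + px * pz
    return y_dist


def expect(dist, func=lambda v: v):
    """Expected value of func(V) for a discrete distribution {value: probability}."""
    return sum(p * func(v) for v, p in dist.items())


y_dist = add_noise(x_dist, noise)          # Y has the same law as X + Z

mean_x, mean_y = expect(x_dist), expect(y_dist)

# Two concave utility functions on [0, 1], chosen only for illustration.
eu_sqrt_x = expect(x_dist, math.sqrt)
eu_sqrt_y = expect(y_dist, math.sqrt)
eu_quad_x = expect(x_dist, lambda v: -v * v)
eu_quad_y = expect(y_dist, lambda v: -v * v)

print("means:", mean_x, mean_y)
print("E sqrt:", eu_sqrt_x, eu_sqrt_y)
print("E -v^2:", eu_quad_x, eu_quad_y)
```

In this example the means coincide while both concave utilities rank X above Y, as Definition 5 requires for a pair ordered in the sense of Definition 6. Two utility functions of course do not prove the ordering; they only show the direction of the inequality.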
Notice that the equivalent definition for the random variables $X$ and $Y$ is: $X \le_a Y$ iff $Y \stackrel{d}{=} X + Z$ (equality in distribution, not $Y = X + Z$) for some random variable $Z$ such that $E(Z \mid X) = 0$.

An important characterization can be given for discrete distributions $X$ and $Y$ with a finite number of points. Its formal structure is the same as that found in the theory of inequality of income distributions and in the theory of the informativeness of information structures. Assume that the distributions of $X$ and $Y$ are determined by the concentration points $a_i$ and probabilities $f_i$, $g_i$ ($i = 1, \dots, n$) such that $\Pr(X = a_i) = f_i$ and $\Pr(Y = a_i) = g_i$. Now define a random variable $Z$ that depends on $X$ through the conditional probabilities

\[
c_{ij} = \Pr(Z = a_j - a_i \mid X = a_i), \qquad i, j = 1, \dots, n.
\]

As a result, $X \le_a Y$ iff there exist such $c_{ij} \ge 0$ satisfying

\[
\sum_{j=1}^{n} c_{ij} = 1, \qquad i = 1, \dots, n, \tag{4}
\]
\[
\sum_{j=1}^{n} c_{ij}(a_j - a_i) = 0, \qquad i = 1, \dots, n, \tag{5}
\]
\[
g_j = \sum_{i=1}^{n} f_i c_{ij}, \qquad j = 1, \dots, n. \tag{6}
\]

Comparing this statement with the previous definition, we see that (4) ensures that $Z$ is a random variable, condition (5) corresponds to $E(Z \mid X = x) = 0$ and equation (6) corresponds to the statement $Y \stackrel{d}{=} X + Z$. We conclude this characterization with the matrix form of conditions (4), (5) and (6), where $e$ stands for the all-ones vector ($e = (1, \dots, 1)'$):

\[
Ce = e, \qquad Ca = a, \qquad g = fC. \tag{7}
\]

The matrix form of the above equations is very convenient, as it allows us to prove easily the property of transitivity for discrete distributions with a finite number of points.

Lemma 4. If the random variables $X^1$, $X^2$ and $X^3$ are concentrated at a finite number of points, then $X^1 \le_a X^2 \le_a X^3$ implies $X^1 \le_a X^3$.

The proof simply uses the fact that if matrices $C_1$ and $C_2$ satisfy conditions (7), then the conditions also hold for the matrix $C^* = C_1 C_2$.

3.2 Equivalence Theorem

We now proceed to the main theoretical finding of Rothschild and Stiglitz (1970), which establishes the equivalence of the different approaches to risk comparison.
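The matrix conditions (7) from section 3.1 and the product argument used for Lemma 4 can be checked on a small example. The following Python sketch is an illustration only: the support points, the probability vectors and the matrices `C1` and `C2` are hypothetical values chosen by us, and the helper names are not from the original paper. Besides (4) to (6) it also checks that all entries are nonnegative, since they are conditional probabilities.

```python
from fractions import Fraction as Fr


def identity(n):
    """n x n identity matrix with exact rational entries."""
    return [[Fr(int(i == j)) for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def row_times_matrix(f, C):
    """Row vector f times matrix C, i.e. the vector g = fC of condition (6)."""
    return [sum(f[i] * C[i][j] for i in range(len(f))) for j in range(len(C[0]))]


def satisfies_conditions(C, a, f, g):
    """Check Ce = e, Ca = a, g = fC and nonnegativity of the entries of C."""
    n = len(a)
    nonnegative = all(c >= 0 for row in C for c in row)
    rows_sum_to_one = all(sum(row) == 1 for row in C)                    # (4)
    mean_preserved = all(sum(C[i][j] * a[j] for j in range(n)) == a[i]
                         for i in range(n))                              # (5)
    mixes_f_into_g = row_times_matrix(f, C) == list(g)                   # (6)
    return nonnegative and rows_sum_to_one and mean_preserved and mixes_f_into_g


# Hypothetical support and distributions on [0, 1].
a = [Fr(0), Fr(1, 4), Fr(1, 2), Fr(3, 4), Fr(1)]
f1 = [Fr(0), Fr(0), Fr(1), Fr(0), Fr(0)]                  # all mass at 1/2

# C1 spreads the mass at 1/2 evenly to 1/4 and 3/4.
C1 = identity(5)
C1[2] = [Fr(0), Fr(1, 2), Fr(0), Fr(1, 2), Fr(0)]
f2 = row_times_matrix(f1, C1)

# C2 spreads 1/4 to {0, 1/2} and 3/4 to {1/2, 1}.
C2 = identity(5)
C2[1] = [Fr(1, 2), Fr(0), Fr(1, 2), Fr(0), Fr(0)]
C2[3] = [Fr(0), Fr(0), Fr(1, 2), Fr(0), Fr(1, 2)]
f3 = row_times_matrix(f2, C2)

C_star = matmul(C1, C2)                                   # product used in Lemma 4

print(satisfies_conditions(C1, a, f1, f2))
print(satisfies_conditions(C2, a, f2, f3))
print(satisfies_conditions(C_star, a, f1, f3))
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the equalities in (4) to (6) can be tested without rounding tolerance.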
Theorem 3. The following statements are mutually equivalent:
(A) $F \le_I G$,
(B) $F \le_u G$,
(C) $F \le_a G$.

As this theorem is essential for the original paper, we present its proof in our work, though slightly modified. To obtain the desired equivalence, we divide it into a sequence of implications and prove each of them individually.

(C) ⇒ (B)
We assume that $F \le_a G$, i.e. $Y \stackrel{d}{=} X + Z$ and $E(Z \mid X = x) = 0$ for some random variable $Z$. Let $U$ be an arbitrary concave function. For $X$ fixed, we take expectations with respect to $Z$ (denoted $E_X$) and use Jensen's inequality to obtain

\[
E_X U(X + Z) \le U(E_X(X + Z)) = U(X).
\]

Taking expectations with respect to $X$, we finally get

\[
E\,E_X U(X + Z) = EU(Y) \le EU(X).
\]

(B) ⇒ (A)
As $F \le_u G$, Definition 5 gives $\int_0^1 U(x)\,dS(x) \le 0$ for every concave $U$, where $S = G - F$. Using the fact that $x$ and $-x$ are both concave, we get $\int_0^1 x\,dS(x) \le 0$ and also $\int_0^1 -x\,dS(x) \le 0$, together implying $\int_0^1 x\,dS(x) = 0$. Integration by parts, together with $S(1) = 0$, gives

\[
0 = \int_0^1 x\,dS(x) = \big[x S(x)\big]_0^1 - \int_0^1 S(x)\,dx = -\int_0^1 S(x)\,dx = -T(1),
\]

which yields the integral condition (2). Next, consider the special function $b_y(x) = \max(y - x, 0)$ for fixed $y$. Since $-b_y(x)$ is concave, we obtain

\[
0 \le \int_0^y (y - x)\,dS(x) = y S(y) - \int_0^y x\,dS(x) = y S(y) - \big[x S(x)\big]_0^y + \int_0^y S(x)\,dx = T(y),
\]

which is the integral condition (3).

(A) ⇒ (C)
First consider discrete random variables $X$ and $Y$ with cdfs $F$ and $G$, defined by $\Pr(X = a_i) = f_i$ and $\Pr(Y = a_i) = g_i$, which differ by a single MPS. Using Definition 2, consider the four points with different probability weights, $\hat a_1 < \hat a_2 < \hat a_3 < \hat a_4$. Denoting $\gamma_k = \hat g_k - \hat f_k$, we obtain

\[
\gamma_1 = -\gamma_2 \ge 0, \qquad \gamma_4 = -\gamma_3 \ge 0 \qquad\text{and}\qquad \sum_{k=1}^{4} \hat a_k \gamma_k = 0.
\]

Define the matrix $C$ in the following way:

\[
C =
\begin{pmatrix}
1 & 0 & 0 & 0 \\[4pt]
\dfrac{\gamma_1(\hat a_4 - \hat a_2)}{\hat f_2(\hat a_4 - \hat a_1)} & \dfrac{\hat g_2}{\hat f_2} & 0 & \dfrac{\gamma_1(\hat a_2 - \hat a_1)}{\hat f_2(\hat a_4 - \hat a_1)} \\[10pt]
\dfrac{\gamma_4(\hat a_4 - \hat a_3)}{\hat f_3(\hat a_4 - \hat a_1)} & 0 & \dfrac{\hat g_3}{\hat f_3} & \dfrac{\gamma_4(\hat a_3 - \hat a_1)}{\hat f_3(\hat a_4 - \hat a_1)} \\[10pt]
0 & 0 & 0 & 1
\end{pmatrix}.
\]

Based on the characterization of $\le_a$ for discrete random variables given after Definition 6, it suffices to show that the elements $c_{ij}$ of the matrix $C$ satisfy conditions (4), (5) and (6). It is easy to verify that conditions (4) and (5) are met, so that $Z$ defined by $c_{ij} = \Pr(Z = \hat a_j - \hat a_i \mid X = \hat a_i)$ is a random variable and satisfies $E(Z \mid X) = 0$. To show that $Y \stackrel{d}{=} X + Z$ (condition (6)), we follow the approach of the original proof and define a discrete variable $Y^1 = X + Z$. The fact that $E(Z) = 0$ implies $E(Y^1) = E(X) = E(Y)$. Moreover, $Y^1$ may differ from $Y$ only in the probability weights of the points $\hat a_1$, $\hat a_2$, $\hat a_3$ and $\hat a_4$. However, by the definition of $Z$ and $Y^1$ we obtain

\[
\Pr(Y^1 = \hat a_2) = \Pr(X = \hat a_2)\,\Pr(Z = 0 \mid X = \hat a_2) = \hat f_2\,\frac{\hat g_2}{\hat f_2} = \hat g_2 = \Pr(Y = \hat a_2)
\]

and similarly $\Pr(Y^1 = \hat a_3) = \Pr(Y = \hat a_3)$. Therefore these random variables can differ only in the probabilities of the points $\hat a_1$ and $\hat a_4$. But, as $\hat a_1 < \hat a_4$, unequal probabilities would yield unequal means, which contradicts $E(Y^1) = E(Y)$. Thus $Y \stackrel{d}{=} Y^1$.

We use theoretical findings already derived in the paper to extend this result to all cdfs. First, by Lemmas 1 and 4 the implication also holds for discrete distributions with a finite number of points. Finally, Theorem 2 provides the validity of the result for all cdfs. Assuming $F \le_I G$, the theorem guarantees the existence of sequences of discrete distributions with a finite number of points of increase, $\{F_n\}$ and $\{G_n\}$, such that $F_n \to F$, $G_n \to G$ and $F_n \le_I G_n$, which implies $F_n \le_a G_n$ by the first part of the proof. Denote by $H_n(x, z)$ the joint distribution function of the random variables $X_n$ and $Z_n$ such that, if $J_n(y) = \Pr(X_n + Z_n \le y)$, then

\[
F_n(x) = H_n(x, 1), \qquad J_n(y) = G_n(y) \qquad\text{and}\qquad E(Z_n \mid X_n) = 0.
\]

The last condition can be represented as

\[
\int_0^1 \int_{-1}^{1} u(x)\,z\,dH_n(x, z) = 0 \tag{8}
\]

for all continuous functions $u(x)$ on $[0, 1]$. Denote the expression on the left-hand side of equation (8) by $M_n$.
Since the sequence of distribution functions $\{H_n\}$ is stochastically bounded, there exists a subsequence $\{H_{n'}\}$ of $\{H_n\}$ such that $H_{n'} \to H(x, z)$, where $H(x, z)$ is the joint distribution function of $X$ and $Z$. It is easy to see that $H_{n'}(x, 1) \to F(x)$ and $J_{n'} \to G$. Moreover, $M_{n'} \to \int_0^1 \int_{-1}^{1} u(x)\,z\,dH(x, z)$, which implies (since $M_{n'} \equiv 0$) that $\int_0^1 \int_{-1}^{1} u(x)\,z\,dH(x, z) = 0$ and hence $E(Z \mid X) = 0$, thus completing the proof.

3.3 Remarks on Mean-Variance Analysis

The following part contains several remarks on the approach to risk comparison that is based on comparing the variances of random variables. In section 2.1, we introduced four different concepts of risk comparison; however, the equivalence proven above holds only for three of them, excluding the mean-variance analysis described by the ordering $\le_v$ ($X \le_v Y$ if $E(X) = E(Y)$ and $E(X^2) \le E(Y^2)$). A reason for this is that the relations $\le_I$, $\le_u$ and $\le_a$ were characterized as partial orderings, while the mean-variance ordering is complete among random variables with the same mean. This characteristic is considered a disadvantage rather than an advantage, as there are examples of random variables $X_1$ and $X_2$ with the same mean such that $E(X_1^2) < E(X_2^2)$ and $E(U(X_1)) < E(U(X_2))$ for some nonquadratic concave function $U$. In fact, it can be shown that a function $U$ is quadratic (and convex) if and only if $X \ge_v Y$ implies $E(U(X)) \ge E(U(Y))$. On the other hand, the partiality of the orderings $\le_I$, $\le_u$ and $\le_a$ can be demonstrated e.g. by a case where $T(y) = \int_0^y (F(x) - G(x))\,dx$ changes sign. In such a case, the distributions $F$ and $G$ cannot be ordered.

Regarding mean-variance analysis, Rothschild and Stiglitz note Tobin's assumption that such an approach may be appropriate for a restricted class of distributions. The authors agree; however, they object (see Rothschild and Stiglitz, 1970, chap. IV) that these restrictions are far too severe, allowing only for changes in distributions from $F$ to $G$ such that $F(x) = G(ax + b)$ for some $a > 0$ and $b$ (compare Tobin (1965)).
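The contrast between the variance ordering and the partial ordering defined by the integral conditions (2) and (3) can be illustrated numerically. The Python sketch below is a minimal illustration under our own assumptions: it handles only discrete distributions on a common finite support in [0, 1], the function names `t_values`, `precedes_I` and `second_moment` are hypothetical, and the three distributions are made-up examples rather than examples from the original paper. For such distributions S = G − F is a step function, so T is piecewise linear and it is enough to evaluate it at the support points and at 1.

```python
from fractions import Fraction as Fr


def t_values(support, f, g):
    """T(y) = integral over [0, y] of (G - F), evaluated at each support point and at y = 1.

    `support` must be increasing and contained in [0, 1]; `f` and `g` are the
    probability weights of F and G on these points.
    """
    nodes = list(support) + [Fr(1)]
    s = Fr(0)            # current value of S = G - F
    t = Fr(0)            # current value of T
    values = []
    for i, a in enumerate(support):
        values.append(t)                  # T(a_i)
        s += g[i] - f[i]                  # S is constant on [a_i, next node)
        t += s * (nodes[i + 1] - a)       # T at the next node
    values.append(t)                      # T(1)
    return values


def precedes_I(support, f, g):
    """True if F <=_I G, i.e. T(1) = 0 and T(y) >= 0 at all nodes."""
    t = t_values(support, f, g)
    return t[-1] == 0 and all(v >= 0 for v in t)


def second_moment(support, f):
    return sum(p * a * a for a, p in zip(support, f))


# Hypothetical distributions with common mean 1/2.
a = [Fr(0), Fr(1, 4), Fr(1, 2), Fr(3, 4), Fr(1)]
f_mid = [Fr(0), Fr(0), Fr(1), Fr(0), Fr(0)]                 # all mass at 1/2
g_spread = [Fr(0), Fr(1, 2), Fr(0), Fr(1, 2), Fr(0)]        # mass at 1/4 and 3/4
f_tails = [Fr(1, 10), Fr(0), Fr(8, 10), Fr(0), Fr(1, 10)]   # small mass at 0 and 1

# g_spread is obtained from f_mid by spreading mass: ordered by <=_I.
print(precedes_I(a, f_mid, g_spread))

# f_tails has the smaller second moment, yet T changes sign,
# so f_tails and g_spread are not ordered by <=_I in either direction.
print(second_moment(a, f_tails) < second_moment(a, g_spread))
print(t_values(a, f_tails, g_spread))
print(precedes_I(a, f_tails, g_spread), precedes_I(a, g_spread, f_tails))
```

The pair `f_tails`, `g_spread` is ranked by the ordering based on second moments but is left unordered by the integral conditions, which is the sense in which the latter ordering is only partial.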
3.4 Previous Literature on Given Topic

We conclude this part with a remark on the previous literature on the covered topics, as presented in Rothschild and Stiglitz (1972). As the authors report, although they had considered their result to be an entirely new idea, various sources proved them wrong. The presented results on the equivalence of the approaches to risk comparison, particularly Theorem 3 as the main result of the paper, had already been known, especially to mathematical statisticians. For some time, they had held an important place in a branch of statistical theory called "the comparison of experiments".

As an example of such works, these findings are presented in the book by Blackwell and Girshick (1954, chap. 12). More general as well as more modern methods can be found in chapter 11 of Mayer (1966) and in Strassen (1965). Note that these references show that the equivalence between the orderings $\le_a$ and $\le_u$ holds for distributions defined over more general spaces than the interval $[0, 1]$, such as compact subsets of $\mathbb{R}^n$. Unfortunately, the ordering $\le_I$ does not seem to allow such a generalization.

4 Economic Applications

In the final part of this seminar work, we review the results presented in Rothschild and Stiglitz (1971), which provide examples of economic applications of the findings derived in Rothschild and Stiglitz (1970). As the authors state, two approaches to investigating the effect of risk on economic decisions are reviewed: the effects of increasing risk and the choice of a probability distribution.

The first part offers an alternative to mean-variance analysis for the problem of the economic effects of increasing risk. To provide a general framework, assume that an individual chooses the level of some control parameter $\alpha$ to maximize expected utility $\int U(\theta, \alpha)\,dF(\theta)$, where $\theta$ is a random variable.
The optimality condition for the variable $\alpha$ is

\[
\int \frac{\partial U}{\partial \alpha}(\theta, \alpha)\,dF(\theta) = E\,U_\alpha(\theta, \alpha) = 0. \tag{9}
\]

Assume further that $\alpha^*$ is the unique solution of (9) and that $U_\alpha$ is decreasing in $\alpha$ in the neighbourhood of $\alpha^*$. If $U_\alpha(\theta, \alpha)$ is a concave function of $\theta$, our definition of risk comparison (particularly Definition 5, which concerns the behavior of all risk averters described by concave utility functions) implies that an increase in riskiness lowers $E\,U_\alpha(\theta, \alpha^*)$ and therefore decreases $\alpha^*$. Similarly, if the function $U_\alpha(\theta, \alpha)$ is convex in $\theta$, the value of $\alpha^*$ increases with greater uncertainty.

In what follows, we apply this idea and determine the conditions for convexity and concavity of the functions in question. As a general conclusion, we show that mean-variance analysis provides results that are misleadingly general, in contrast to the approach established by Theorem 3. Moreover, we show that the Arrow-Pratt concepts of relative and absolute risk aversion provide a convenient way to examine the conditions for the convexity or concavity of a given function.

After the introduction of the main ideas, we present several examples of their possible application in known economic models. In section 4.1 we address the topic of savings and uncertainty. Section 4.2 is devoted to a portfolio problem, with several remarks on the more general combined portfolio-savings problem. The last subsection 4.3 deals with a firm's production problem. To be precise, Rothschild and Stiglitz (1971) contains two more examples of economic applications, which deal with a multi-stage planning problem and with the choice of the output level of a competitive firm. Although they are quite interesting, we do not present them in detail for lack of space.
Finally, in section 4.4 we show how the equivalence of the three alternative approaches from Rothschild and Stiglitz (1970) (reviewed in section 3.2) can be applied to prove some general theorems dealing with the choice among probability distributions.

4.1 Savings and Uncertainty

In the first example we present an analysis of the effect of uncertainty in the rate of return on savings. An individual wishes to allocate a given wealth $W_0$ between consumption today and tomorrow. Wealth not consumed today is invested and yields the random return $e$ per dollar invested. The expected two-period utility is

\[
E[U(C_1) + (1 - \delta)U(C_2)] = U((1 - s)W_0) + (1 - \delta)\,EU(sW_0 e), \tag{10}
\]

with savings rate $s$ and pure rate of time discount $\delta$. We assume that the individual is a risk averter, with the utility function satisfying $U' > 0$ and $U'' < 0$. Setting the derivative of (10) with respect to $s$ equal to zero, we obtain a necessary and (because of risk aversion) also sufficient condition for utility maximization:

\[
U'((1 - s)W_0) = (1 - \delta)\,E[U'(sW_0 e)\,e]. \tag{11}
\]

Intuitively, increased uncertainty in the return on savings could affect savings in two possible ways: they could either drop because "a bird in the hand is worth two in the bush", or grow since a risk averter saves more when facing increased uncertainty. Formally, whether greater risk increases or decreases the optimal level of savings $s^*$ depends on the convexity or concavity of $eU'(sW_0 e)$ in $e$. As a result, under increasing risk the level of $s^*$ grows if

\[
2U''(C) + U'''(C)\,C > 0 \tag{12}
\]

and drops if the converse inequality holds. Note that the condition $U'''(C) \le 0$ suffices for increasing risk to decrease savings.

Applying the Arrow-Pratt concept, we can reformulate these results using the coefficient of relative risk aversion ($R = -CU''/U'$).
It can be observed that $R'$ has the same sign as $-(U'''C + U''(1 + R))$; thus inequality (12) holds if $R$ is nonincreasing and greater than one. On the other hand, $R$ nondecreasing and less than one yields the opposite inequality.

We conclude this example with a comment on the application of mean-variance analysis. As presented in section 3.3, this approach is equivalent to the assumption that $U$ is quadratic. However, if $U(C) = aC - \frac{1}{2}bC^2$, then the right-hand side of (11) can be expressed as

\[
(1 - \delta)\,(aE(e) - bsW_0 E(e^2)),
\]

which decreases as $E(e^2)$ grows. Consequently, $s$ has to drop in order to satisfy equality (11). This approach therefore yields a conclusion that is compatible only with the first argument (growing risk decreases savings), while omitting the second one (savings increase as variability rises).

4.2 Portfolio Problem and Combined Portfolio-Savings Problem

We now move on to the portfolio problem. Assume that an investor wishes to divide his portfolio between money with a zero rate of return and a risky asset with a random rate of return $e$. If $W_0$ denotes his initial wealth and $\alpha$ the part of this wealth invested in the risky asset, the terminal wealth is $W(\alpha) = W_0(\alpha e + 1)$. Again the objective is to maximize the expected utility of terminal wealth $EU(W(\alpha))$, with the utility function $U$ satisfying the "risk averter" conditions formulated in the previous problem (i.e. $U' > 0$, $U'' < 0$). Let $F$ denote the distribution function of $e$. Then the optimal $\alpha$ has to satisfy the first order condition

\[
H(\alpha) = W_0\,E[U' e] = W_0 \int U'(W(\alpha))\,e\,dF(e) = 0.
\]

Notice that, given the assumptions on the utility function, this condition is necessary as well as sufficient (since $H'(\alpha) < 0$).

Consider again a change in the variability of $e$. Our question is how the optimal level of $\alpha$ reacts to such a change.
Using mean-variance analysis and a quadratic utility function $U(W) = aW - \frac{1}{2}bW^2$, we get

\[
\alpha = \frac{(a - bW_0)\,E(e)}{bW_0\,E(e^2)}.
\]

Thus if $e$ becomes riskier (i.e. $E(e^2)$ increases with $E(e)$ remaining constant), the optimal level of $\alpha$ has to fall. However, this result need not be true in general, though it is misleadingly presented as such. This can be seen using the approach established by Theorem 3.

Consider that the distribution of $e$ is changed from $F$ to a more variable $G$, with the new optimal allocation parameter $\tilde\alpha$ satisfying $\int U'(W(\tilde\alpha))\,e\,dG(e) = 0$. Define the function $S = G - F$; then $\tilde\alpha \gtrless \alpha$ as $\int U'(W(\alpha))\,e\,dS(e) \gtrless 0$. Denote $V(e) = U'(W(\alpha))$ and further assume that $F$ and $G$ have their points of increase confined to the interval $(a, b)$. We see that the condition

\[
\int_a^b V(e)\,e\,dS(e) \le 0 \tag{13}
\]

for all positive and decreasing $V$ and all $S$ satisfying the integral conditions (2) and (3) implies that an increase in variability decreases the demand for risky assets by all risk-averse individuals. Moreover, by using (3) and the second mean value theorem of integral calculus, we obtain a sufficient condition for (13) in the form

\[
\forall c \in (a, b): \quad h(c) = \int_a^c e\,dS(e) \le 0.
\]

Furthermore, it can be shown that this is also a necessary condition. Otherwise there would be a $c$ such that $h(c) > 0$, while (13) would still require $\int_a^b V(e)\,e\,dS(e) \le 0$ for all positive and decreasing $V$. Now consider

\[
V(e) =
\begin{cases}
V_1 & \text{for } a \le e < c, \\
V_2 & \text{for } c \le e \le b,
\end{cases}
\]

where $V_1 > V_2 > 0$. Since $h(b) = 0$ (the means of $F$ and $G$ coincide), we get $\int_a^b V(e)\,e\,dS(e) = (V_1 - V_2)\,h(c) > 0$, a contradiction.

Concerning the statement that increasing variability decreases the demand for risky assets, the authors state that it is possible to exhibit increasing concave utility functions that always satisfy it, and to prove that no utility function of this type has the property that increasing risk always increases $\alpha$.

Regarding the application of the Arrow-Pratt concept of risk aversion, let us first denote $Z(e) = eU'(W(\alpha))$.
We can interpret the previous results in the sense that concavity of $Z(e)$ implies $\tilde\alpha \le \alpha$. Using the relative and absolute risk aversion coefficients $R = -U''W/U'$ and $A = -U''/U'$, we can express $Z''(e)$ in the following form:

\[
Z''(e) = \big[(1 - R + AW_0)\,U'' + (W_0 A' - R')\,U'\big]\,W_0\,\alpha.
\]

Thus nondecreasing relative risk aversion less than or equal to one, together with nonincreasing absolute risk aversion, are sufficient conditions for an increase in risk to decrease the share of the risky asset.

We conclude this example with a note on the combined portfolio-savings problem. In the model we consider an individual who maximizes the expected value of the discounted utility of consumption

\[
E \sum_{t=0}^{\infty} (1 - \delta)^t\,U(C_t),
\]

where $\delta$ represents the discount rate and $C_t$ denotes consumption at time $t$, subject to the stochastic constraints

\[
W_{t+1} = (W_t - C_t)\,r_t,
\]

where $W_t$ stands for the wealth at time $t$ and $r_t - 1$ is the stochastic rate of return, which is composed of the returns on two assets according to $r_t = \alpha r_{1t} + (1 - \alpha) r_{2t}$. The parameter $\alpha$ is the fraction invested in the first asset, and $r_{1t}$ and $r_{2t}$ represent the returns on asset 1 and asset 2 respectively. We again ask what effect an increase in the risk of one of the assets' returns has on the portfolio allocation and on savings. Although it seems reasonable that an increase in the variance of one asset decreases the proportion invested in this asset, it can be shown that under special conditions an increase in variability can have the opposite effect. We can also analyze the effect on the savings rate. Considering the CRRA utility function $U(C) = C^{1-a}/(1 - a)$ (for $a > 0$, $a \ne 1$), we obtain that if $a > 1$, then an increase in the variability of $r$ increases the savings rate, while $a < 1$ yields the opposite result; this agrees with section 4.1, where condition (12) holds for constant relative risk aversion greater than one.

4.3 Firm's Production Problem

As the last example presented in this part on the effects of increased risk, we review a problem of production planning.
Let us consider a firm with uncertain output $Q$ over the next period. The goal of the firm is to minimize the expected cost of production. Assume further a two-factor concave production function $P(K, L)$, which represents the production process, i.e. $Q = P(K, L)$. $K$ represents capital, which cannot be varied in the short run, and $L$ stands for labor, which, by contrast, is variable. The expected costs of production are given by the expression
\[ E[rK + wL(K, Q)] = rK + wE[L(K, Q)], \tag{14} \]
where $r$ is the cost of capital, $w$ is the cost of labor and $L(K, Q)$ stands for the level of labor required to produce $Q$ with capital $K$.

Our question is what happens to the expected costs as the variability of $Q$ increases. To answer it we use the fact that $L(K, Q)$ is convex in $Q$ for any given level of $K$ (this is implied by the concavity of $P$). Therefore, using the approach given by definition 5, we obtain that higher variability of $Q$ always results in higher expected costs.

A subsequent problem, which is more difficult to answer, concerns the reaction of the optimal level of $K$ to an increase in the variability of $Q$. The authors point out that the answer is related to the elasticity of substitution between $K$ and $L$. Let us start by deriving the first order condition from (14):
\[ \frac{r}{w} = -E\left[ \frac{\partial L(K, Q)}{\partial K} \right], \]
which can be interpreted as saying that the factor-price ratio must equal the mean value (or the average) of the marginal rate of substitution.

We conclude with two examples of particular production functions. First we consider the production function with a constant elasticity of substitution
\[ Q(K, L) = \bigl( \delta K^{\rho} + (1 - \delta) L^{\rho} \bigr)^{1/\rho}. \]
It can be shown that the condition $\rho \le 0$ (or equivalently an elasticity of substitution less than or equal to one) implies convexity of $-\partial L/\partial K$ with respect to $Q$, meaning that an increase in the variability of $Q$ raises the optimal level of $K$. As a second example, we look at the production function with infinite elasticity of substitution, $Q(K, L) = bK + aL$.
Let $G(Q)$ denote the distribution function of $Q$. Then it can be shown (for details see Rothschild and Stiglitz, 1971, p. 79) that the behavior of $K$ in response to an increase in the variability of $Q$ depends on the term $G^{-1}(1 - (ar/wb))$. To be more specific, the optimal level of $K$ increases if $G^{-1}(1 - (ar/wb))$ rises, or (equivalently) if the probability that $Q$ exceeds $bK$ increases.

4.4 Choosing a Probability Distribution

Finally, we present several examples in which our theoretical results (the definition of variability and the basic theorem on equivalence) are applied to prove general theorems dealing with the choice of a probability distribution from a set of possible probability distributions.

We start with the diversification theorem. Consider an individual who can allocate his given initial wealth $W_0$ between two securities. Their values next period, $e_1$ and $e_2$ (per dollar invested), are assumed to be independent and identically distributed. The investor chooses $b$ to maximize the expected utility
\[ EU(W) = EU\bigl( (b e_1 + (1 - b) e_2)\, W_0 \bigr), \]
where $U$ is a concave function. The diversification theorem states that the optimal $b$ satisfies $b = \frac{1}{2}$ independently of the utility function. To prove this statement, let us define $y_b = (b e_1 + (1 - b) e_2)\, W_0$. Note that we can rewrite $y_b = y_{1/2} + (b - 1/2)(e_1 - e_2)\, W_0$ and furthermore $E(e_1 - e_2 \mid y_{1/2}) = 0$. By definition 6 this gives $y_{1/2} \le_a y_b$. Using theorem 3 we obtain $y_{1/2} \le_u y_b$, therefore all individuals with concave utility functions prefer $y_{1/2}$ to $y_b$.

The second and final example deals with the Rao-Blackwell theorem. Assume a probability distribution depending on an unknown parameter $\theta$ and consider a sample of random variables $x = (x_1, \ldots, x_n)$ generated from this distribution. Furthermore, suppose that the criterion for choosing an estimator $d(x)$ of the parameter $\theta$ based on the sample $x$ is the minimization of the expected value of a convex loss function $L(d(x))$.
The Rao-Blackwell theorem states that for any estimator $d(x)$ and any convex $L$, the existence of a sufficient statistic $T$ for $\theta$ implies the existence of an estimator $d^*$ at least as good as $d(x)$ in the sense that $EL(d^*(x)) \le EL(d(x))$. To prove this, let us define $d^*(x) = E(d(x) \mid T)$ for every $T$. It is easy to see that we have to prove $d^*(x) \le_u d(x)$, which is, by theorem 3, equivalent to $d^*(x) \le_a d(x)$. Consider the random variable $z$ defined by the equation $d(x) = d^*(x) + z$. By the definition of $d^*$ we have $E(z \mid T) = 0$ and hence, since $d^*$ is a function of $T$, $E(z \mid d^*) = 0$. Thus we may conclude (by definition 6) that $d^*(x) \le_a d(x)$.

5 Conclusion

The aim of this seminar work is to provide a summary of the series of papers by Rothschild and Stiglitz (1970, 1971, 1972). We start with the section on theoretical background, introducing four different approaches generally used to compare the variability of random variables. In particular, we explain in depth and formalize the concept of comparing "the weight in the tails" of the distributions of random variables.

In the second part, we focus on the equivalence theorem as the main result of this series of articles. After introducing partial orderings and defining three variability-comparison approaches within this framework, we present the theorem itself along with the complete proof. The contribution of these theoretical findings is that they establish the equivalence of different perspectives on the comparison of risk, thus providing a basis for a convenient definition of greater variability.

The final section is devoted to examples of economic problems as potential applications of the derived results. We present simple models of savings, portfolio allocation and a firm's choice of production level. In each of these examples, we address the impact of increased variability on the optimal levels of the variables in the model.
We show that solutions given by mean-variance analysis can often be misleading, since they yield a single result and omit the possibility of other outcomes. Finally, we apply the results to prove some general theorems dealing with the choice of a probability distribution.

To conclude, the original works are among the most essential contributions to the economic problem of risk assessment and comparison. Although, as the authors themselves admit, the presented theoretical findings had been known before and are therefore not completely new, they provide a comprehensive theoretical background and thorough insight into the given issues.

References

Blackwell, D. and M. A. Girshick (1954). Theory of Games and Statistical Decisions. Wiley, New York.
Meyer, P. A. (1966). Probability and Potentials. Blaisdell, Waltham, Ma.
Rothschild, M. and J. E. Stiglitz (1970). Increasing Risk: I. A Definition. Journal of Economic Theory 2 (3), 225–243.
Rothschild, M. and J. E. Stiglitz (1971). Increasing Risk II: Its Economic Consequences. Journal of Economic Theory 3 (1), 66–84.
Rothschild, M. and J. E. Stiglitz (1972). Addendum to "Increasing Risk: I. A Definition". Journal of Economic Theory 5 (2), 306–306.
Strassen, V. (1965). The Existence of Probability Measures with Given Marginals. The Annals of Mathematical Statistics 36 (2), 423–439.
Tobin, J. (1965). The Theory of Portfolio Selection. In F. Hahn and F. Brechling (Eds.), The Theory of Interest Rates. MacMillan, London.
