Savage's Theory of Rational Decision (Experts in Uncertainty ch. 6)
Roger M. Cooke
Short Course on Expert Judgment
National Aerospace Institute, April 15-16, 2008

1. Basic Concepts

S: the set of possible worlds, or states of the world
C: the set of consequences, or states of an acting subject
F: the set of available acts
≤: preference relation on available acts

where S ≠ ∅, #C < ∞, F = C^S = {f | f: S → C}, and ≤ ⊆ F × F. Write f ~ g if f ≤ g and f ≥ g; and f > g if f ≥ g and not f ≤ g.

Example. Available acts:
f = "buy a car"
g = "buy a motorcycle"

Factors influencing the outcomes of the acts:
N: get new job further from home
A: have a driving accident
R: price of gasoline rises
N': not get new job further from home, etc.

(Reduced) states of the world:
NAR, NA'R, NAR', NA'R', N'AR, N'A'R, N'AR', N'A'R'

Idea: The state of the world is not known with certainty; hence the outcomes of acts are uncertain. The value of the outcome of an act can vary in different circumstances. The value of an act is to be represented as expected utility:

EU(f) = Σ_{c∈C} U(c) × P{s∈S | f(s) = c}

Further, we want:

(*) f ≥ g if and only if EU(f) ≥ EU(g).

Problem: Is P independent of f and/or g? Is P(A) the same regardless of whether we buy a car or a motorcycle? If this does not hold, then the problem has not been modeled properly for representation in Savage's formalism.

Technical fix: A fix can always be applied to yield a model in which probability is independent of the act which is performed. E.g., if the probability of A is influenced by the choice between f and g, then write A = Af ∪ Ag, where
Af = would have an accident if driving a car,
Ag = would have an accident if driving a motorcycle.
Now repeat the analysis with the events N, Af, Ag, R.

For representation (*) to go through, we must assign probabilities to events and numerical utilities to outcomes. We seek conditions on ≥ which
- are true for all "rational" preference behavior,
- admit (*) for all available acts,
- make the P in (*) unique and the U unique up to U' = aU + b, a > 0.
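The expected-utility formula EU(f) = Σ_c U(c) × P{s ∈ S | f(s) = c} can be made concrete with a small sketch of the car example. All numbers below (the probabilities of N, A, R and the utility of each outcome) are invented for illustration; the text assigns none.

```python
# Sketch of EU(f) = sum_c U(c) * P{s in S : f(s) = c} for the car example.
# The probabilities of N, A, R and the outcome utilities are illustrative only.
from itertools import product

states = list(product([True, False], repeat=3))  # (N, A, R): 8 reduced states

def prob(state):
    # Assume N, A, R independent, with invented marginal probabilities.
    n, a, r = state
    p_n, p_a, p_r = 0.3, 0.1, 0.5
    return (p_n if n else 1 - p_n) * (p_a if a else 1 - p_a) * (p_r if r else 1 - p_r)

def car(state):
    # Act f = "buy a car": maps each state to a (utility-valued) consequence.
    n, a, r = state
    return -10 if a else (5 if n else 2) - (1 if r else 0)

def expected_utility(act, utility=lambda c: c):
    # EU(f): sum utilities of consequences weighted by state probabilities.
    return sum(utility(act(s)) * prob(s) for s in states)

print(expected_utility(car))
```

Note that the state probabilities here do not depend on which act is evaluated, which is exactly the modeling requirement discussed above.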
2. Axioms

1. Weak order:
∀f,g ∈ F: f ≥ g or g ≥ f or both (≥ is connected);
∀f,g,h ∈ F: if f ≥ g and g ≥ h, then f ≥ h (transitivity).
N.B. If c ∈ C, then we may let c ∈ F denote the "constant act" c(s) = c. If c,d ∈ C, then we may write 'c > d' if this holds for the corresponding constant acts.

2. Principle of definition: Take g,b ∈ C such that g > b (g = 'good', b = 'bad'). For A ⊆ S, define:
rA(s) = g if s ∈ A; = b otherwise.
Then rA ∈ F. If g*,b* ∈ C with g* > b*, let r*A(s) = g* if s ∈ A; = b* otherwise. Then for all A,B ⊆ S: rA > rB if and only if r*A > r*B.

Definition 1: For A,B ⊆ S, A ≥. B if rA ≥ rB. We write >. and <. with the obvious meaning.

3. Sure thing principle: For A ⊆ S, let f,h,f*,h* ∈ F be such that
f(A) = f*(A), h(A) = h*(A), f(A') = h(A'), f*(A') = h*(A');
then f ≥ h if and only if f* ≥ h*.

Allais paradox

Choice 1:
Act 1: get $500,000.
Act 2: get $2,500,000 with probability 0.1; get $500,000 with probability 0.89; get 0 with probability 0.01.

Choice 2:
Act 3: get $500,000 with probability 0.11; get 0 with probability 0.89.
Act 4: get $2,500,000 with probability 0.1; get 0 with probability 0.9.

Many people say Act 1 > Act 2, but Act 3 < Act 4.

Lemma 1: '≥.' is additive; that is, for A,B,C ⊆ S, if A∩C = B∩C = ∅, then A ≥. B if and only if A∪C ≥. B∪C.
Proof: Exercise (hint: use the sure thing principle).

Definition 2: A ⊆ S is null if rA ~ r∅.

4. Dominance: If for all s ∈ S, f(s) ≥ g(s), then f ≥ g. If also f(s) > g(s) for all s ∈ B, B non-null, then f > g.

Lemma 2: For all A ⊆ S: ∅ ≤. A ≤. S. Proof: exercise.

Lemma 3: For A,B ⊆ S: A ≥. B if and only if B' ≥. A'. Proof: exercise.

Verify: For A,B,C ⊆ S with A∩C = ∅, if A ≥. B then A∪C ≥. B∪C. Find a counterexample for the converse.

5. Refinement: For B,C ⊆ S, if B <. C then there exists a partition {Ai}, i = 1,...,n, of S such that B∪Ai <. C for i = 1,...,n.

Definition 3: A relation ≤* ⊆ P(S) × P(S) is a qualitative probability if
1) ≤* is a weak order;
2) ≤* is additive;
3) for all A ⊆ S: ∅ ≤* A ≤* S.

Lemma 4: The relation '≥.' from Definition 1 is a qualitative probability. Proof: exercise.

3. Probability

Theorem 1:
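The Allais pattern is inconsistent with any expected-utility representation: assuming only U(0) = 0, the expected-utility gap between Act 1 and Act 2 equals the gap between Act 3 and Act 4, namely 0.11·U($0.5M) − 0.10·U($2.5M), so Act 1 > Act 2 forces Act 3 > Act 4. A minimal numerical sketch (the utility values passed in are arbitrary placeholders):

```python
# Sketch: under expected utility with U(0) = 0, the preference gaps in the
# two Allais choices coincide, so the common pattern Act1 > Act2 with
# Act3 < Act4 cannot be represented by any utility function.
def gap(u_half, u_full):
    # Common gap: 0.11 * U($0.5M) - 0.10 * U($2.5M)
    return 0.11 * u_half - 0.10 * u_full

def eu_gaps(u_half, u_full):
    eu1 = u_half                                 # Act 1: $0.5M for sure
    eu2 = 0.10 * u_full + 0.89 * u_half          # Act 2
    eu3 = 0.11 * u_half                          # Act 3
    eu4 = 0.10 * u_full                          # Act 4
    return eu1 - eu2, eu3 - eu4

g12, g34 = eu_gaps(u_half=1.0, u_full=1.8)       # illustrative utilities
print(g12, g34)                                  # the two gaps coincide
```

The equality of the gaps holds for every choice of u_half and u_full, which is the point of the paradox.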
If ≥ on F × F satisfies axioms 1 through 5, then there exists a unique probability P on (S, P(S)) such that for all A,B ⊆ S: P(A) ≥ P(B) if and only if A ≥. B.

Proof sketch

We use Lemma 5: For all A ⊆ S there exist A1, A2 ⊆ A such that A1∩A2 = ∅, A1∪A2 = A, and A1 =. A2.
Proof: see supplement, ch. 6.

With this we can show that for all n there exists a uniform 2^n-fold partition {Ai}, i = 1,...,2^n, i.e. a partition such that Ai ~. Aj for all i,j = 1,...,2^n. Pick one such uniform partition {Ai} and let k(A,n) be the largest integer k such that

A1 ∪ ... ∪ Ak ≤. A.

Claim: k(A,n) is the same for any 2^n-fold uniform partition.
Proof: By induction on n; we illustrate the two-fold case. Suppose {A1,A2} and {B1,B2} are uniform partitions; we show Bi =. Ai. Suppose to the contrary A1 >. B1; then A2 =. A1 >. B1 =. B2. But by Lemma 3, B2 = B1' >. A1' = A2, a contradiction.

k(A,n)/2^n is a non-decreasing function of n and is bounded by 1; therefore the limit exists, and we define:

P(A) := lim_{n→∞} k(A,n)/2^n.

Verify: P is a probability.

We show that for all A,B: P(A) ≥ P(B) if and only if A ≥. B. Suppose A ≥. B; then also k(A,n) ≥ k(B,n) for all n, so that P(A) ≥ P(B). Suppose A >. B; then by refinement there exists a partition {Ci}, i = 1,...,n*, such that B∪Ci <. A for each i = 1,...,n*. Pick j such that ∅ <. B'∩Cj, and pick a partition {Di}, i = 1,...,n', such that Di <. B'∩Cj. We can find a uniform 2^n-fold partition {Ai} such that Ai <. max{Di} for each i. For this partition we can show

k(B,n) < k(B,n) + 1 ≤ k(B∪Cj, n) ≤ k(A,n).

If we consider a 2^(n+m)-fold uniform partition, then

k(B,n+m) < k(B,n+m) + 2^m ≤ k(A,n+m),

so that

P(B) + 2^(-n) = lim_{m→∞} [k(B,n+m) + 2^m] / 2^(n+m) ≤ P(A),

and hence P(B) < P(A).

Verify: The P defined above is unique; that is, if P* is such that for all A,B ⊆ S, P*(A) ≥ P*(B) if and only if A ≥. B, then P* = P.

Lemma 6: If '≥' satisfies axioms 1 through 5, then #S = ∞. Moreover, for any r ∈ (0,1) and A ⊆ S there is Ar ⊆ A such that P(Ar) = rP(A). (See Experts in Uncertainty ch. 6.)

4. UTILITY

Note that the foregoing derivation of P used the 'utility' of consequences only in the Principle of Definition.
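The construction of P in the proof sketch can be illustrated numerically. As an assumed toy model (not part of Savage's construction itself), take S = [0,1) with A ≥. B read off Lebesgue measure μ; the 2^n-fold uniform partition is then the dyadic intervals, and k(A,n) is simply ⌊μ(A)·2^n⌋. The sketch below only demonstrates the monotone convergence of k(A,n)/2^n to P(A):

```python
# Toy model of Theorem 1's construction: S = [0,1) with the comparative
# relation read off Lebesgue measure mu.  The 2**n-fold uniform partition
# is the set of dyadic intervals [i/2**n, (i+1)/2**n); k(A, n) is the
# largest k with the union of the first k cells <=. A, i.e. k/2**n <= mu(A).
from math import floor

def k(mu_A, n):
    return floor(mu_A * 2**n)      # largest k with k/2**n <= mu(A)

def P(mu_A, n):
    return k(mu_A, n) / 2**n       # level-n approximation of P(A)

mu_A = 1 / 3                        # an event of measure 1/3
approx = [P(mu_A, n) for n in range(1, 31)]
print(approx[-1])                   # approaches 1/3 from below as n grows
```

The approximations are non-decreasing in n, matching the claim that k(A,n)/2^n is non-decreasing and bounded by 1.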
We need a utility function for the representation of preference. For this purpose we need one more axiom on the preference relation, and this axiom explicitly uses the P derived from the previous five axioms:

6. Strengthened refinement principle: For f,g ∈ F, if f > g, then there exists an α > 0 such that for all A ⊆ S with P(A) < α, and for all f',g' with f(A') = f'(A') and g(A') = g'(A'), we have f' > g'.

Theorem 2: If '≥' satisfies axioms 1 through 6, then:
1) There exists a U*: C → R such that for all f,g ∈ F: f > g ⇔ U*(f) > U*(g), where U*(f) = Σ_{c∈C} U*(c) P{s∈S | f(s) = c}, etc.
2) If U: C → R is another function for which f > g ⇔ U(f) > U(g), then there exist a,b ∈ R, a > 0, such that U = aU* + b.

Proof sketch:

1) Order C such that cn > cn-1 > ... > c1. Put U*(cn) = 1, U*(c1) = 0. For each event A we can consider the act cn|A + c1|A' (taking the value cn on A and c1 on A').

Lemma 1: For each ci we can find Ai such that ci ~ cn|Ai + c1|Ai'; moreover, any B with P(B) = P(Ai) will satisfy the above equation in preference (which compares the constant act ci with another act).

Lemma 2: For each f we have f ~ f~, where f~ = c1|(f⁻¹(c1) ∪ B2 ∪ ... ∪ Bn-1) + cn|(f⁻¹(cn) ∪ G2 ∪ ... ∪ Gn-1), where Bi ∪ Gi = f⁻¹(ci), Bi ∩ Gi = ∅, and P(Gi) = P(f⁻¹(ci))P(Ai), i = 2,...,n-1, for the Ai from Lemma 1.
Proof: see Experts in Uncertainty ch. 6.

2) Verify: For i = 2,...,n-1, putting U*(ci) = P(Ai), U* satisfies (1) of the theorem.
Proof sketch: Use the notation Gi(f), Gi(g) as in Lemma 2, and put G(f) = ∪ Gi(f), G(g) = ∪ Gi(g). Verify that:

f ≥ g ⇔ f~ ≥ g~ ⇔ G(f) ≥. G(g) ⇔ rG(f) ≥ rG(g) ⇔ P(G(f)) ≥ P(G(g)) ⇔ U*(f) ≥ U*(g).

3) Verify that if U satisfies (1) of the theorem then so does aU + b, with a,b ∈ R, a > 0.

4) Suppose U satisfies (1) of the theorem. Choose a,b such that aU(cn) + b = 1 and aU(c1) + b = 0. We show that aU + b = U*. Indeed, since aU + b satisfies (1):

(aU + b)(ci) = (aU + b)(cn|Ai + c1|Ai') = (aU(cn) + b)P(Ai) + (aU(c1) + b)P(Ai') = P(Ai) = U*(ci).

5. Observation

In light of the previous theorem we may simply assume that our acts are utility-valued, and we write Ef instead of U(f) for the expected utility of f.
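Step 4 of the proof sketch (calibrating an arbitrary utility U by a positive affine transformation so that U(c1) ↦ 0 and U(cn) ↦ 1, and reading U*(ci) off as the mixing probability P(Ai)) can be sketched numerically. The raw utility scores below are invented for illustration:

```python
# Sketch of Theorem 2's calibration step: given any utility U over ordered
# consequences c1 < ... < cn, choose a, b with a*U(cn)+b = 1, a*U(c1)+b = 0.
# Then U*(ci) plays the role of P(Ai), the probability making ci indifferent
# to the mixture "cn on Ai, c1 on Ai'".  Raw scores are illustrative only.
raw = {"c1": -4.0, "c2": 1.0, "c3": 3.0, "c4": 6.0}

a = 1.0 / (raw["c4"] - raw["c1"])            # a > 0
b = -a * raw["c1"]
u_star = {c: a * u + b for c, u in raw.items()}   # normalized to [0, 1]

# With p = U*(ci) = P(Ai), the mixture p*U*(cn) + (1-p)*U*(c1) recovers U*(ci):
for c, p in u_star.items():
    mix = p * u_star["c4"] + (1 - p) * u_star["c1"]
    print(c, p, mix)                          # mix equals U*(c) for each ci
```

The loop checks the indifference ci ~ cn|Ai + c1|Ai' in expected-utility terms: since U*(c1) = 0 and U*(cn) = 1, the mixture's expected utility is exactly P(Ai).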
Uncertainty concerns the outcomes of potential observations; uncertainty is reduced or removed by observation. A representation of uncertainty must also represent the role of observations, and explain the value of observation for decision making.

For a subjectivist the question "why observe?" is not trivial. Isn't one subjective probability just as rational as any other? The value of observation does not devolve from the "rationality" of the preference relation as defined by axioms 1 through 6. Rather, it depends on the acts at our disposal and on the utilities of consequences. The set F contains all functions from S to C. One of these acts takes the best consequence in all possible worlds. If this act is also available to us, there is no reason to observe anything.

Example. Suppose we are considering whether to invest in gold. There are two acts at our disposal: invest or not invest. Suppose we restrict attention to one uncertain event, namely the event that the price of gold rises next year. Let B1 denote the event "the price of gold rises next year" and let B2 = B1'.

Suppose that before choosing whether to invest, we can perform a cost-free observation. For example, we may ask an expert what (s)he thinks the gold price next year will be. The result of this observation is of course uncertain before it is performed. An observation must therefore be represented as a random variable X: S → R. Let x1,...,xn denote the possible values of X. Before performing the observation, our degree of belief in B1 is P(B1). After observing X = xi, our probability is P(B1|xi). If the observation of xi is definitive, then P(B1|xi) = 0 or 1. We assume that none of the possible values xi is definitive. By Bayes' theorem:

P(B1|xi) = P(xi|B1)P(B1) / P(xi).

Dividing by a similar equation for B2:

P(B1|xi)/P(B2|xi) = [P(xi|B1)P(B1)] / [P(xi|B2)P(B2)].

The idea is this: if B1 is the case, then we should expect that after the observation of X, the LHS is greater than P(B1)/P(B2). Define

R(X) = P(X|B1) / P(X|B2).

In other words, we expect E(R(X)|B1) > 1. To prove this we recall Jensen's inequality.

Definition: A function q: D → R, D an interval, is convex (concave) if for all α ∈ (0,1) and x,y ∈ D: q(αx + (1-α)y) ≤ (≥) αq(x) + (1-α)q(y). q is strictly convex (concave) if the inequality above is strict.

Jensen's inequality: If X is a random variable and q: D → R, D an interval with D ⊇ Rng(X), is convex (concave), then E(q(X)) ≥ (≤) q(E(X)), with strict inequality if q is strictly convex (concave).

Theorem 3: E(R(X)|B1) ≥ 1, with E(R(X)|B1) = 1 if and only if P(xi|B1)/P(xi|B2) = 1 for i = 1,...,n.

Proof: log is concave, and -log is convex. Hence

log(E(R(X)|B1)) ≥ E(log R(X)|B1) = E(-log R(X)⁻¹|B1) ≥ -log(E(R(X)⁻¹|B1)).

Further,

E(R(X)⁻¹|B1) = Σ_i [P(xi|B2)/P(xi|B1)] P(xi|B1) = Σ_i P(xi|B2) = 1,

so that -log(E(R(X)⁻¹|B1)) = -log(1) = 0. Therefore log(E(R(X)|B1)) ≥ 0, and E(R(X)|B1) ≥ 1.

Since log is strictly concave, log(E(R(X)|B1)) = E(log R(X)|B1) only if R(X) is constant given B1; that is, P(xi|B1) = K·P(xi|B2) for all i and some constant K. Summing this equation over i = 1,...,n, and recalling that no observation is definitive, we find that K = 1.

This theorem says that we expect to become more certain of B1, if B1 is the case, by observing X. It shows how we expect our beliefs to change as a result of observation, but it does not answer the question: why observe? To answer this question we must consider how the value of an observation is determined by the acts at our disposal. Suppose F* ⊆ F is a set of available acts, and we have to choose a single element of F*.

Definition: For F* ⊆ F, the value v(F*) of F* is v(F*) = max_{f∈F*} Ef.

Before choosing an element of F*, we may consider the option of observing X. After observing X = xi our probability changes from P(•) to P(•|X=xi), and the value of F* would be computed with respect to P(•|X=xi).
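Theorem 3 is easy to check numerically for an invented likelihood model. The two conditional distributions below are placeholders (no value is definitive, as the theorem requires); the code computes E(R(X)|B1) and the quantity E(R(X)⁻¹|B1) used in the proof:

```python
# Numeric check of Theorem 3 with an invented two-hypothesis model.
# R(X) = P(X|B1)/P(X|B2); conditional on B1, E(R(X)|B1) >= 1, with
# equality only when the two conditional distributions coincide.
p_x_given_b1 = [0.6, 0.3, 0.1]    # illustrative P(xi|B1); none definitive
p_x_given_b2 = [0.2, 0.3, 0.5]    # illustrative P(xi|B2)

E_R_given_b1 = sum((p1 / p2) * p1
                   for p1, p2 in zip(p_x_given_b1, p_x_given_b2))
E_Rinv_given_b1 = sum((p2 / p1) * p1
                      for p1, p2 in zip(p_x_given_b1, p_x_given_b2))

print(E_R_given_b1)      # >= 1: we expect evidence to favor B1 when B1 holds
print(E_Rinv_given_b1)   # = sum of P(xi|B2) = 1, the key step in the proof
```

The second sum reproduces the proof's computation E(R(X)⁻¹|B1) = Σ P(xi|B2) = 1 exactly.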
Before observing we don't know which value of X will be observed, but we can consider the value of F* conditional on the value of X. Recall that the consequences are assumed to be utility values.

Definition: For F* ⊆ F, the value v(F*|X=xi) of F* given X = xi is

v(F*|X=xi) = max_{f∈F*} Σ_{c∈C} c · P(f⁻¹(c)|X=xi) = max_{f∈F*} E(f|xi).

The value v(F*|X) of F* given X is

v(F*|X) = Σ_i v(F*|X=xi) P(X=xi).

Theorem 4: For any X: S → R and any F* ⊆ F, v(F*|X) ≥ v(F*).

Proof: For any f ∈ F*,

v(F*|X) = Σ_i max_{g∈F*} [E(g|xi)] P(X=xi) ≥ Σ_i E(f|xi) P(X=xi) = Ef.

Since this holds for all f ∈ F*, we have v(F*|X) ≥ max_{f∈F*} Ef = v(F*).

Exercise: Let X take two possible values, 1 and 2; let pi = P(B1|i), i = 1,2. Let F* be the set of available acts, and suppose that if f ∈ F*, then f is constant on B1 and on B2 = B1'. Let p = P(B1), and let v(F*)(p) denote the value of F* as a function of p.
• Show that the expectation of f ∈ F* is a linear function of p.
• Show that v(F*)(p) is convex.
• Show that if p1 = p2, then v(F*|X) = v(F*).
• Show that if v(F*)(p) is linear in the interval (p1,p2), then v(F*|X) = v(F*).
• Let X' be another two-valued observation with p1' and p2'. Show that if |p1' - p2'| > |p1 - p2|, then v(F*|X') > v(F*|X).

[Figure: Value of problem F — expected utility of each act as a function of p = P(A): f(p) = 20p − 12, g(p) = −30p + 13, h(p) = 2p, j(p) = −3p − 2; V(F) is the upper envelope of these lines.]

[Figure: Value of problem F, value of perfect information, and value of observation X, for posteriors P(A|x1) = 0.25 and P(A|x2) = 0.7.]
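The figures' setup can be reproduced directly. The four acts f, g, h, j and the posteriors P(A|x1) = 0.25, P(A|x2) = 0.7 are from the text; the marginal P(X = x1) = 0.5 is an assumed value, which then fixes the prior p by the law of total probability. A sketch of Theorem 4 on this example:

```python
# Sketch of Theorem 4 using the four acts from the figure:
# f(p) = 20p - 12, g(p) = -30p + 13, h(p) = 2p, j(p) = -3p - 2, with p = P(A).
# Posteriors P(A|x1) = 0.25 and P(A|x2) = 0.7 are from the text; the
# marginal P(X = x1) = 0.5 is an assumed value fixing the prior.
acts = {
    "f": lambda p: 20 * p - 12,
    "g": lambda p: -30 * p + 13,
    "h": lambda p: 2 * p,
    "j": lambda p: -3 * p - 2,
}

def v(p):
    # Value of the decision problem: best expected utility at belief p.
    return max(a(p) for a in acts.values())

q = 0.5                                   # assumed P(X = x1)
p1, p2 = 0.25, 0.7                        # P(A|x1), P(A|x2) from the text
prior = q * p1 + (1 - q) * p2             # law of total probability

v_prior = v(prior)                        # v(F*): value without observing X
v_with_X = q * v(p1) + (1 - q) * v(p2)    # v(F*|X): value given free observation

print(v_prior, v_with_X)                  # v(F*|X) >= v(F*), per Theorem 4
```

Since v(p) is the upper envelope of linear functions, it is convex, and the gap v(F*|X) − v(F*) is exactly the Jensen gap of v at the prior, as the exercise above asks you to verify.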