Foundations of Electronic Commerce
June 13, 2007
Lecture 9
Lecturer: Arrow Kenneth
Scribe: Dan Yadlin and Barak Perelman

1 Introduction

This lecture discusses decision making under partial information.

• Denote by $A$ the collection of possible actions from which the player chooses.
• Denote by $X$ the collection of all possible "states of nature", meaning all possible values of the variables that the player does not control. $X$ can be seen as a random variable.
• Let $w : A \times X \to \mathbb{R}$ be the wealth function: $w(a, x)$ is the player's payment if he selects action $a \in A$ and the state of nature is $x \in X$.

Definition 1 Under the above notation, the optimal choice of action is
$$a^* = \arg\max_{a \in A} \mathbb{E}_{x \in X}[w(a, x)].$$

From now on our goal is to examine how to achieve this optimal choice.

Example 1 Consider deciding on the contents of your portfolio. The portfolio may include a security for each $x \in X$. Each security is bought for a constant price (say 1) and is tied to a specific $x \in X$: if $x$ turns out to be the realized value of $X$, the owner of the security receives $r_x$; otherwise he receives nothing for it. Denote by $a(x)$ the amount of securities bought for state $x$, so the total amount invested is $\sum_x a(x) = A$ (here $A$ denotes the total budget, not the action set). The terminal wealth is then
$$w(a, x) = a(x) \cdot r_x,$$
and our utility function is
$$u(a, x) = \log w(a, x) = \log r_x + \log a(x).$$
This logarithmic utility expresses our intuition about the "happiness" gained from money: an additional 10 dollars makes us much happier if we have earned only 100 dollars, but makes very little difference if we have already earned 100,000 dollars. For this reason our goal is, as always, to maximize the expected utility (and not the expected terminal wealth).
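The choice rule of Definition 1, and its expected-utility variant from Example 1, can be sketched in Python for finite sets of actions and states. This is a minimal illustration under that finiteness assumption; the function name `optimal_action` and the two-state payoff numbers are hypothetical, not part of the lecture.

```python
import math

def optimal_action(actions, p, w, u=lambda wealth: wealth):
    """Return argmax_a E_x[u(w(a, x))] over a finite action set.

    p maps each state x to its probability; w(a, x) is the wealth function.
    The default u is the identity, which gives Definition 1 (expected wealth);
    pass u=math.log for the logarithmic utility of Example 1.
    """
    def expected_utility(a):
        return sum(px * u(w(a, x)) for x, px in p.items())
    return max(actions, key=expected_utility)

# Hypothetical two-state example: a sure payment versus a risky bet.
p = {"good": 0.5, "bad": 0.5}
payoffs = {
    "safe":  {"good": 100.0, "bad": 100.0},
    "risky": {"good": 400.0, "bad": 1.0},
}

def w(a, x):
    return payoffs[a][x]

print(optimal_action(payoffs, p, w))            # risky: E[w] = 200.5 > 100
print(optimal_action(payoffs, p, w, math.log))  # safe: E[log w] = log 20 < log 100
```

The two calls differ only in the utility: the risky action has the higher expected wealth, but the concave logarithm prefers the sure payment, in line with the intuition behind Example 1.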
We therefore choose the investment strategy that maximizes
$$a^* = \arg\max_{a} \mathbb{E}_{x \in X}[u(a, x)] = \arg\max_{a} \left\{ \mathbb{E}_{x \in X}[\log r_x] + \mathbb{E}_{x \in X}[\log a(x)] \right\}.$$
The first term does not depend on $a$, so maximizing the expected utility is equivalent to maximizing $\sum_x p(x) \log a(x)$ subject to $\sum_x a(x) = A$.

2 Proper scoring

We call the process of choosing the payments of a game so that no player has an incentive to lie proper scoring. We address this issue through an example, but first we present a lemma.

Lemma 1 (Gibbs' Lemma) Let $p(x), q(x)$ be two distributions on $X = \{x_1, \dots, x_n\}$. Then
$$\sum_{x \in X} p(x) \log p(x) \ge \sum_{x \in X} p(x) \log q(x).$$

Proof of Gibbs' Lemma: Since $\log_2 a = \frac{\ln a}{\ln 2}$, it suffices to prove the statement for the natural logarithm. The natural logarithm satisfies $\ln x \le x - 1$. Let $I$ denote the set of all $i$ for which $p_i \ne 0$. Then
$$-\sum_{i \in I} p_i \ln \frac{q_i}{p_i} \ge -\sum_{i \in I} p_i \left( \frac{q_i}{p_i} - 1 \right) = -\sum_{i \in I} q_i + \sum_{i \in I} p_i = -\sum_{i \in I} q_i + 1 \ge 0.$$
So
$$-\sum_{i \in I} p_i \ln q_i \ge -\sum_{i \in I} p_i \ln p_i,$$
and since the terms with $p_i = 0$ contribute nothing to either side (using the convention $0 \ln 0 = 0$), trivially
$$-\sum_{i=1}^{n} p_i \ln q_i \ge -\sum_{i=1}^{n} p_i \ln p_i.$$

Consider someone who wants to know tomorrow's weather forecast, and assume that the weatherman knows the exact probability of rain. To get him to tell the truth, we choose the following payment scheme: we pay the weatherman the next day, according to the actual weather on that day. In the basic case of two possibilities, if it rained and the weatherman said the probability of rain was $p$, he is paid $\log p$; otherwise he is paid $\log(1 - p)$. If $p$ is the actual probability of rain, his expected payment per day is
$$p \log p + (1 - p) \log(1 - p).$$
Now suppose the weatherman declares a false probability of rain $q \ne p$. His expected payment is then
$$p \log q + (1 - p) \log(1 - q),$$
and by Gibbs' lemma, applied to the distributions $(p, 1 - p)$ and $(q, 1 - q)$, this is never more than the truthful expectation; since $\ln x \le x - 1$ holds with equality only at $x = 1$, it is strictly less whenever $q \ne p$.
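The logarithmic scoring rule can be checked numerically with a short Python sketch. It assumes natural logarithms (the base does not change which report is best) and finds the best announcement by a simple grid search; `expected_log_score`, `best_report`, the grid size and the sample distributions are hypothetical illustrations. The same function, evaluated on the portfolio fractions $a(x)/A$, is the quantity maximized in Example 1, so the same argument shows that $a(x) = A \cdot p(x)$ is optimal there.

```python
import math

def expected_log_score(p, q):
    """Expected payment sum_i p_i * ln(q_i) under truth p when q is reported.

    Outcomes with p_i = 0 contribute nothing (the convention used in the proof).
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return -math.inf  # a possible outcome was declared impossible
        total += pi * math.log(qi)
    return total

def best_report(p_rain, grid=1000):
    """Grid-search the announced rain probability q that maximizes the expected payment."""
    truth = (p_rain, 1 - p_rain)
    candidates = [k / grid for k in range(grid + 1)]
    return max(candidates, key=lambda q: expected_log_score(truth, (q, 1 - q)))

print(best_report(0.3))  # 0.3: announcing the true probability is optimal

# Gibbs' lemma on a hypothetical pair of 3-outcome distributions
p3, q3 = (0.5, 0.3, 0.2), (0.2, 0.5, 0.3)
print(expected_log_score(p3, p3) >= expected_log_score(p3, q3))  # True
```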
To conclude, under this payment scheme the weatherman has no incentive to lie; therefore the scheme has the proper scoring property.

3 Shannon measurement of information

Given a discrete random variable $X$ taking values in $\{x_1, \dots, x_n\}$ with distribution $p(x)$, we want a measure of the uncertainty in $X$ that depends only on the distribution and not on the actual values of $X$. This function $H_n(X)$, called the entropy of $X$, must satisfy the following requirements:
1. $H_n(p(x_1), \dots, p(x_n)) = H_{n+1}(p(x_1), \dots, p(x_n), 0)$
2. $H_n\left(\frac{1}{n}, \dots, \frac{1}{n}\right) < H_{n+1}\left(\frac{1}{n+1}, \dots, \frac{1}{n+1}\right)$
3. $H(X, Y) = H(X) + \mathbb{E}_X[H(Y \mid X)]$

Fact 1 There is only one function (up to multiplication by a positive constant, and assuming in addition that it is continuous in the probabilities) that satisfies all of the above, namely Shannon's information entropy:
$$H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i).$$

Reason for the choice of logarithm base: choosing base 2 gives $H(X)$ an interpretation in bits: it is a lower bound on the average number of bits needed to represent each value $x_i$, and an optimal coding scheme comes within one bit of it.

Observation 1 $H(X)$ is always non-negative, because $p(x) \le 1$ implies $\log p(x) \le 0$.
Observation 2 $H(X) = 0$ iff $p(x) = 1$ for some $x$, meaning that the outcome of $X$ is already known.
Observation 3 $H(X)$ is maximal when $X$ is uniformly distributed, in which case its value is $\log_2 n$.

Let us return to Example 1 about securities and consider the reasonable investor, who distributes his investments according to the distribution of $X$, i.e. invests $a(x) = A \cdot p(x)$. Let us work out his expected utility and the terminal wealth it corresponds to.
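A minimal Python sketch of the entropy, checking Observations 1 to 3 on small distributions. It also evaluates the proportional investor's expected log-utility directly from the definitions of Example 1, for comparison with the closed form derived next. The returns `r` and budget `A` are hypothetical numbers chosen only for illustration.

```python
import math

def entropy(p, log=math.log2):
    """Shannon entropy -sum_i p_i log p_i; outcomes with p_i = 0 are skipped.

    log=math.log2 measures H in bits, as in the definition; log=math.log gives nats.
    """
    return -sum(pi * log(pi) for pi in p if pi > 0)

# Observations 1-3 on small distributions
print(entropy([0.5, 0.25, 0.25]))     # 1.5
print(entropy([0.25] * 4))            # 2.0, i.e. log2(4): the uniform maximum for n = 4
print(entropy([1.0, 0.0, 0.0]) == 0)  # True: the outcome is already known

# Example 1 with hypothetical returns r_x and budget A (illustrative numbers only)
p = [0.5, 0.25, 0.25]
r = [2.0, 4.0, 4.0]
A = 100.0
a = [A * px for px in p]  # the proportional investor: a(x) = A * p(x)

# Expected log-utility straight from u(a, x) = log(a(x) * r_x) ...
direct = sum(px * math.log(ax * rx) for px, ax, rx in zip(p, a, r))
# ... equals E[log r_x] + log A - H(X) when H is measured in nats.
closed = sum(px * math.log(rx) for px, rx in zip(p, r)) + math.log(A) - entropy(p, math.log)
print(math.isclose(direct, closed))  # True
```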
As we saw in Example 1, the expected utility is
$$\mathbb{E}_{x \in X}[u(a, x)] = \mathbb{E}_{x \in X}[\log r_x] + \mathbb{E}_{x \in X}[\log a(x)] = \mathbb{E}_{x \in X}[\log r_x] + \log A + \mathbb{E}_{x \in X}[\log p(x)] = \mathbb{E}_{x \in X}[\log r_x] + \log A - H(X),$$
where, to match the exponentials below, $\log$ and $H(X)$ are taken here with the natural logarithm. Since the utility is the logarithm of the terminal wealth, the investor is exactly as well off as with a sure terminal wealth of
$$e^{\mathbb{E}_{x \in X}[\log w(a, x)]} = G_r A e^{-H(X)}, \qquad \text{where } G_r = e^{\mathbb{E}_{x \in X}[\log r_x]}.$$
This is the geometric mean of the terminal wealth, i.e. its certainty equivalent under log utility; by Jensen's inequality it is at most the arithmetic expectation $\mathbb{E}_{x \in X}[w(a, x)]$.

Now let us discuss the price of information. Suppose the true value of $X$ is available for some price $C$. Once $X$ is known its entropy is $0$, so an investor who pays $C$ and puts the remaining $A - C$ on the security of the realized state ends up with certainty-equivalent wealth $G_r (A - C)$. Denote by $V$ the actual value of knowing $X$. We can compute $V$ by equating the a priori certainty-equivalent wealth (without knowledge of $X$) with the certainty-equivalent wealth after learning $X$ and paying $V$ for the information:
$$G_r (A - V) = G_r A e^{-H(X)}, \qquad \text{so} \qquad V = A \left( 1 - e^{-H(X)} \right).$$
Thus $V$ is proportional to $A$: an investor with more money to invest should be willing to pay more for the same information than an investor with less money.

4 Decision making with partial information

Let us look at the general case, where partial information about $X$ is available (neither all of it nor none of it). Denote by $S$ a random variable that carries some information about $X$; we call $S$ a signal. For example, a signal could be a company's earnings last month, which hint at what the same company will earn next month. We are interested in the conditional distribution $p(X \mid S)$. Our action is now a function of the signal, $a = a(s)$, and we look for
$$\arg\max_{a \in A} \mathbb{E}_{X \mid S}[w(a, x)] \text{ for each } s, \qquad \text{or equivalently} \qquad \arg\max_{a(\cdot)} \mathbb{E}_{X, S}[w(a(s), x)].$$

Let us assume that there is a set of signals $\mathcal{S}(X)$ from which we can choose a single signal. A signal is represented by its distribution $p(S \mid X)$.
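The decision rule $a(s)$ can be sketched in Python for finite states, signal values and actions, with a signal given by its likelihoods $p(S \mid X)$. This sketch picks each $a(s)$ from the joint weights $p(x)\,p(s \mid x)$ rather than the normalized posterior, which yields the same maximizer. The function name `optimal_decision_rule`, the umbrella payoffs and the 80%-accurate forecast are all hypothetical.

```python
def optimal_decision_rule(prior, likelihood, actions, w):
    """Choose a(s) for every signal value s and return (rule, expected payoff).

    prior[i] = p(X = i) and likelihood[i][j] = p(S = j | X = i).
    For each j we maximize sum_i p(X = i) p(S = j | X = i) w(a, i), which equals
    p(S = j) * E[w(a, X) | S = j]; when p(S = j) > 0 this factor does not change
    the argmax, so the rule is pointwise optimal. Summing the maxima over j gives
    E_{X,S}[w(a(S), X)].
    """
    rule, value = [], 0.0
    for j in range(len(likelihood[0])):
        def joint_payoff(a, j=j):
            return sum(px * row[j] * w(a, i)
                       for i, (px, row) in enumerate(zip(prior, likelihood)))
        best = max(actions, key=joint_payoff)
        rule.append(best)
        value += joint_payoff(best)
    return rule, value

# Hypothetical umbrella problem: X = 0 (rain) or X = 1 (dry).
payoff = {"umbrella": [1.0, 1.0], "none": [0.0, 3.0]}

def w(a, i):
    return payoff[a][i]

prior = [0.5, 0.5]
forecast = [[0.8, 0.2], [0.2, 0.8]]  # S = 0 means "rain forecast"
no_signal = [[1.0], [1.0]]           # one uninformative signal value

print(optimal_decision_rule(prior, forecast, list(payoff), w))   # rule ['umbrella', 'none'], value close to 1.7
print(optimal_decision_rule(prior, no_signal, list(payoff), w))  # (['none'], 1.5)
```

With the forecast the decision maker adapts the action to the signal and does better than the best signal-independent action (about 1.7 versus 1.5 in this toy example).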
Using Bayes' theorem, we can compute the needed distribution $p(X \mid S)$:
$$p(x \mid s) = \frac{p(s \mid x)\, p(x)}{\sum_{x' \in X} p(s \mid x')\, p(x')}.$$
The terminal wealth now depends on $S$, so how do we choose the optimal signal from $\mathcal{S}(X)$?
$$S^* = \arg\max_{S \in \mathcal{S}(X)} \mathbb{E}_S\left[ \max_{a(S)} \mathbb{E}_{X \mid S}[w(a(S), x)] \right].$$

5 Signal comparison

Definition 2 Let $S, S' \in \mathcal{S}(X)$ be two signals. We call $S'$ a garbling of $S$ if one of the following holds (the three conditions are equivalent; each says that $S'$ depends on $X$ only through $S$):
• $\forall X, S, S' : p(X \mid S, S') = p(X \mid S)$
• $\forall X, S, S' : p(S, S' \mid X) = p(S \mid X) \cdot p(S' \mid S)$
• $\forall X, S, S' : p(S' \mid S, X) = p(S' \mid S)$

Claim 1 Define the following matrices:
• $L_{i,j} = p(S = j \mid X = i)$
• $L'_{i,j} = p(S' = j \mid X = i)$
• $G_{i,j} = p(S' = j \mid S = i)$
Then $L' = LG$ if $S'$ is a garbling of $S$.

Note that $L$ and $L'$ define the distributions $p(S \mid X)$ and $p(S' \mid X)$, respectively. We call $L$ the matrix of the signal $S$.

Definition 3 A Markov matrix is a matrix whose entries are non-negative real numbers, with each row summing to 1. It need not be square: in Definition 4, $M$ has one row per value of $S$ and one column per value of $S'$.

Definition 4 Let $L, L'$ be the matrices of the signals $S, S'$, respectively. $S'$ is called a quasi-garbling of $S$ if there exists a Markov matrix $M$ such that $L' = LM$.

Definition 5 A signal $S$ is at least as informative as $S'$ if, for every action space $A$ and every payoff function $w(a, x)$, the optimal decision function based on $S$ achieves an expected payoff at least as high as the optimal decision function based on $S'$.

Theorem 2 (Blackwell's theorem) $S$ is at least as informative as $S'$ iff $S'$ is a quasi-garbling of $S$.

Observation 4 If $S$ is at least as informative as $S'$ and $S'$ is at least as informative as $S$, then the information that can be extracted from the two signals is equivalent.

Notice that informativeness is only a partial order: there can be two signals such that neither is more informative than the other. We would like to find a total order on signals.

Definition 6 The rate of transmission (mutual information) of $S$ about $X$ is defined as
$$R(X \mid S) = H(X) - \mathbb{E}_S[H(X \mid S)],$$
that is, how much "extra" information $S$ gives us about $X$, as measured by Shannon's entropy.
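Bayes' rule for $p(x \mid s)$, the garbling product $L' = LG$ of Claim 1, and the Markov-matrix test of Definition 3 can be sketched in plain Python with matrices stored as lists of rows. The helper names (`posterior`, `matmul`, `is_markov`) and the forecast/relay numbers are hypothetical; the 2 x 1 matrix `M_collapse` illustrates that a Markov matrix in Definition 4 need not be square.

```python
def posterior(prior, likelihood, s):
    """Bayes' theorem: p(x | s) = p(s | x) p(x) / sum over x' of p(s | x') p(x')."""
    joint = [px * row[s] for px, row in zip(prior, likelihood)]
    total = sum(joint)  # p(S = s); assumed to be positive
    return [v / total for v in joint]

def matmul(left, right):
    """Plain-Python matrix product left @ right for lists of rows."""
    return [[sum(a * right[k][j] for k, a in enumerate(row)) for j in range(len(right[0]))]
            for row in left]

def is_markov(M, tol=1e-12):
    """True if all entries are non-negative and every row sums to 1 (M may be non-square)."""
    return all(min(row) >= 0 and abs(sum(row) - 1.0) <= tol for row in M)

prior = [0.5, 0.5]           # X in {rain, dry}
L = [[0.8, 0.2],             # L[i][j] = p(S = j | X = i): a forecast that is right 80% of the time
     [0.2, 0.8]]
G = [[0.9, 0.1],             # G[i][j] = p(S' = j | S = i): a relay that flips the forecast 10% of the time
     [0.1, 0.9]]
L_garbled = matmul(L, G)     # Claim 1: L' = L G
M_collapse = [[1.0], [1.0]]  # a 2 x 1 Markov matrix merging both values of S into one

print(posterior(prior, L, 0))  # p(X | S = 0), approximately [0.8, 0.2]
print(L_garbled)               # approximately [[0.74, 0.26], [0.26, 0.74]]
print(is_markov(G), is_markov(M_collapse))  # True True
```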
Claim 2 Let $S$ and $S'$ be signals, and assume that $S$ is at least as informative as $S'$. Then $R(X \mid S) \ge R(X \mid S')$.
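Claim 2 (known in information theory as the data-processing inequality) can be illustrated numerically. The Python sketch below computes $R(X \mid S) = H(X) - \mathbb{E}_S[H(X \mid S)]$ in bits from a prior and a likelihood matrix, then compares a hypothetical forecast, its garbling through a noisy relay, and an uninformative signal. The helper names and all numbers are illustrative assumptions, and a single example is of course not a proof.

```python
import math

def entropy(p):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def rate_of_transmission(prior, likelihood):
    """R(X|S) = H(X) - E_S[H(X|S)] in bits, with likelihood[i][j] = p(S = j | X = i)."""
    expected_cond = 0.0
    for j in range(len(likelihood[0])):
        joint = [px * row[j] for px, row in zip(prior, likelihood)]
        p_s = sum(joint)  # p(S = j)
        if p_s > 0:
            expected_cond += p_s * entropy([v / p_s for v in joint])  # p(S = j) H(X | S = j)
    return entropy(prior) - expected_cond

def matmul(left, right):
    """Plain-Python matrix product, used to form the garbled matrix L' = L G."""
    return [[sum(a * right[k][j] for k, a in enumerate(row)) for j in range(len(right[0]))]
            for row in left]

prior = [0.5, 0.5]
L = [[0.8, 0.2], [0.2, 0.8]]  # hypothetical forecast S
G = [[0.9, 0.1], [0.1, 0.9]]  # hypothetical noisy relay producing S' from S

r_s = rate_of_transmission(prior, L)
r_garbled = rate_of_transmission(prior, matmul(L, G))
r_none = rate_of_transmission(prior, [[1.0], [1.0]])
print(r_s, r_garbled, r_none)
```

In this example the garbled forecast transmits less information than the original one, and the uninformative signal transmits none, as Claim 2 predicts.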