Foundations of Electronic Commerce
June 13, 2007
Lecture 9
Lecturer: Arrow Kenneth
Scribe: Dan Yadlin and Barak Perelman

1 Introduction
This lecture discusses decision making under partial information.
• Denote by A the collection of possible actions for the player to choose from.
• Denote by X the collection of all possible "states of nature", meaning all possible values of variables not controlled by the player. X can be seen as a random variable.
• Let w : A × X → R be the wealth function; w(a, x) is the player's payment if he selected action a ∈ A and the state of nature is x ∈ X.
Definition 1 Under the above notation, the optimal choice of action is:
\[ a = \arg\max_{a \in A} E_{x \in X}[w(a, x)] \]
From now on, our goal will be to examine how to achieve that optimal choice.
Example 1 Consider a situation in which you decide on the contents of your portfolio. The portfolio may include a security for each x ∈ X. Each security is purchased for a constant price (say 1) and is tied to a specific x ∈ X: if that x is the realized value of X, then the owner of the security receives an amount of money equal to r_x; otherwise he receives nothing for his security. Denote by a(x) the amount of securities bought for the state x. The total amount of money invested is therefore \sum_x a(x) = A. The terminal wealth is then w(a, x) = a(x) \cdot r_x, and the utility evaluates to u(a, x) = \log w(a, x) = \log r_x + \log a(x). This (log-scaled) utility function expresses our intuition about the "happiness" gained by earning money: an additional $10 makes us much happier if we have only earned $100 so far, while it makes very little difference if we have already earned $100,000. For all of the above reasons, our goal is, as always, to maximize the expectation of utility (and not of terminal wealth). We will choose the investment strategy that maximizes the following expression:
\[ a = \arg\max_{a \in A} E_{x \in X}[u(a, x)] = \arg\max_{a \in A} \left\{ E_{x \in X}[\log r_x] + E_{x \in X}[\log a(x)] \right\} \]
We can see that the first term is independent of a; therefore, maximizing the utility is equivalent to maximizing the expression \sum_x p(x) \cdot \log a(x), as illustrated in the sketch below.
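To make this concrete, here is a minimal Python sketch (not from the lecture; the distribution and budget are made-up numbers) comparing the term \sum_x p(x) \log a(x) under the proportional allocation a(x) = A \cdot p(x) against two alternatives:

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])   # assumed distribution of X (illustrative)
A = 100.0                        # total budget (illustrative)

def expected_log_term(a):
    """Compute sum_x p(x) * log a(x), the only term that depends on the allocation."""
    return np.sum(p * np.log(a))

proportional = A * p                          # a(x) = A * p(x)
uniform      = np.full(3, A / 3)              # spread the budget evenly
skewed       = np.array([80.0, 15.0, 5.0])    # overweight the likely state

for name, a in [("proportional", proportional), ("uniform", uniform), ("skewed", skewed)]:
    print(f"{name:12s}  sum p(x) log a(x) = {expected_log_term(a):.4f}")
# The proportional allocation attains the maximum, matching the rule
# a(x) = A * p(x) used for the "reasonable investor" in Section 3.
```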

2 Proper scoring
We call a process of choosing the payments of a game such that no player has an incentive to lie proper scoring. We will address this issue through an example, but first we present a lemma.
Lemma 1 (Gibbs' Lemma) Let p(x), q(x) be two distributions on X = {x_1, ..., x_n}. Then:
\[ \sum_{x \in X} p(x) \cdot \log p(x) \ \geq\ \sum_{x \in X} p(x) \cdot \log q(x) \]
Proof of Gibbs' Lemma: Since \log_2 a = \ln a / \ln 2, it is sufficient to prove the statement using the natural logarithm (ln). Note that the natural logarithm satisfies \ln x \leq x - 1. Let I denote the set of all i for which p_i \neq 0. Then:
\[ -\sum_{i \in I} p_i \ln \frac{q_i}{p_i} \ \geq\ -\sum_{i \in I} p_i \left( \frac{q_i}{p_i} - 1 \right) = -\sum_{i \in I} q_i + \sum_{i \in I} p_i = -\sum_{i \in I} q_i + 1 \ \geq\ 0 \]
So
\[ -\sum_{i \in I} p_i \ln q_i \ \geq\ -\sum_{i \in I} p_i \ln p_i \]
and then trivially (terms with p_i = 0 contribute nothing)
\[ -\sum_{i=1}^{n} p_i \ln q_i \ \geq\ -\sum_{i=1}^{n} p_i \ln p_i \]
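As a quick sanity check (a minimal sketch, not part of the lecture; the two distributions are arbitrary), the inequality can be verified numerically:

```python
import numpy as np

def cross_term(p, q):
    """Compute sum_x p(x) * log q(x), with the convention 0 * log 0 = 0."""
    mask = p > 0
    return np.sum(p[mask] * np.log(q[mask]))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])

# Gibbs' lemma: the self term is always at least the cross term.
assert cross_term(p, p) >= cross_term(p, q)
print(cross_term(p, p), ">=", cross_term(p, q))
```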
Let us examine a situation where one wants to know tomorrow's forecast. We assume that the weatherman knows the exact probability of rain. In order to get him to tell us the truth, we choose the following payment scheme: we pay the weatherman the next day, according to the actual weather that day. That is, if the weatherman said that the probability of rain is p and it was a rainy day, then he is paid \log p (otherwise, in the basic case of two possibilities, he is paid \log(1 - p)). The expected payment for each day, assuming p is the actual probability of rain, is p \cdot \log p + (1 - p) \cdot \log(1 - p). Now suppose the weatherman declares a false probability of rain q \neq p. This time, the expected payment is p \cdot \log q + (1 - p) \cdot \log(1 - q), and by Gibbs' lemma, for every such q this expectation is worse than the truthful one. To conclude, under this payment scheme the weatherman has no incentive to lie; therefore it has the proper scoring property. A small numeric illustration follows.
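Here is a minimal Python sketch (the true probability p is a made-up number, not from the lecture) showing that reporting the truth maximizes the expected log-score payment:

```python
import numpy as np

def expected_payment(p, q):
    """Expected log-score payment when the true rain probability is p
    and the weatherman reports q."""
    return p * np.log(q) + (1 - p) * np.log(1 - q)

p = 0.7  # assumed true probability of rain (illustrative)
for q in [0.1, 0.3, 0.5, 0.7, 0.9]:
    print(f"report q={q:.1f}: expected payment = {expected_payment(p, q):.4f}")
# The maximum is attained at q = p = 0.7, as Gibbs' lemma guarantees.
```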

3 Shannon measurement of information
Given a discrete random variable X that can take possible values {x_1, ..., x_n} with distribution p(x), we want to find a measure of the uncertainty in X which depends only on the distribution and not on the possible values of X. The function H_n(X) (which will be called the entropy of X) must comply with the following requirements:
1. H_n(p(x_1), ..., p(x_n)) = H_{n+1}(p(x_1), ..., p(x_n), 0)
2. H_n(1/n, ..., 1/n) < H_{n+1}(1/(n+1), ..., 1/(n+1))
3. H(X, Y) = H(X) + E_X[H(Y|X)]
Fact 1 There is only one function (up to multiplication by a constant) that complies with all of the above, and that is Shannon's measure of information entropy:
\[ H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i) \]
Reason for choosing the base of the logarithm: Choosing base 2 for the logarithm in the definition gives H(X) another property: it is the average number of bits used to represent each value x_i under the optimal coding scheme.
Observation 1 H(X) is always non-negative, because \log p(x) \leq 0 (since p(x) \leq 1).
Observation 2 H(X) = 0 iff p(x) = 1 for some x, meaning that the outcome of the measurement of X is already known.
Observation 3 H(X) is maximal when X is uniformly distributed, and its value is then \log_2 n.
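The following Python sketch (with made-up distributions) computes H(X) and checks Observations 2 and 3:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits, with the convention 0 * log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -np.sum(nz * np.log2(nz))

print(entropy([1.0, 0.0, 0.0]))   # 0.0 -- outcome already known (Observation 2)
print(entropy([0.5, 0.3, 0.2]))   # ~1.485 bits
print(entropy([1/3, 1/3, 1/3]))   # log2(3) ~ 1.585 -- the maximum (Observation 3)
```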
Let us recall Example 1, where we talked about securities, and take a look at the reasonable investor who distributes his investments according to the distribution of X, i.e., invests a(x) = A \cdot p(x). Let us figure out his expected terminal wealth. As we already saw in the previous section, the expectation of the utility is:
\[ E_{x \in X}[u(a, x)] = E_{x \in X}[\log r_x] + E_{x \in X}[\log a(x)] = E_{x \in X}[\log r_x] + \log A + E_{x \in X}[\log p(x)] = E_{x \in X}[\log r_x] + \log A - H(X) \]
(here H(X) is measured with the natural logarithm, matching the log in the utility). Since the utility function is the log of the terminal wealth, exponentiating the expected utility gives the certainty-equivalent terminal wealth:
\[ e^{E_{x \in X}[u(a, x)]} = G_r A e^{-H(X)}, \qquad \text{where } G_r = e^{E[\log r_x]} \]
Now let us discuss the price of information. Assume that the real value of X is available for some price C. When X is known, H(X) = 0, and the certainty-equivalent terminal wealth becomes G_r(A - C). Denote by V the actual value of the knowledge of X. We can calculate V by equating the a priori certainty-equivalent terminal wealth (without the knowledge of X) with the certainty-equivalent terminal wealth after learning X and paying V for the information. Then we get the following formula:
\[ G_r(A - V) = G_r A e^{-H(X)} \quad \Longrightarrow \quad V = A \left( 1 - e^{-H(X)} \right) \]
We can conclude from the above that V is proportional to A. This means that if one has more money to begin with (for investing), then he should be willing to pay more for the same information than an investor with less money.
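A minimal Python sketch (with assumed numbers) computing the value of information V = A(1 - e^{-H(X)}), with H(X) in nats:

```python
import numpy as np

def entropy_nats(p):
    """Shannon entropy using the natural logarithm (nats)."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -np.sum(nz * np.log(nz))

A = 100.0                       # budget (illustrative)
p = np.array([0.5, 0.3, 0.2])   # assumed distribution of X

H = entropy_nats(p)
V = A * (1 - np.exp(-H))
print(f"H(X) = {H:.4f} nats, value of information V = {V:.2f}")
# Doubling A doubles V: the value of the information is proportional to wealth.
```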

4 Decision making with partial information
Let us look at the general case where partial information about X is available (not all of it, but not none of it either).
Denote by S a random variable which holds some information about X. We call S a signal. A signal can be, for example, the earnings of a company last month, giving us a hint about the earnings of the same company next month. We will take interest in the conditional distribution p(X|S).
Now our action a is a function a = a(s), and we seek to maximize the following expression:
\[ \arg\max_{a \in A} E_{X|S}[w(a, x)] \quad \text{or} \quad \arg\max_{a \in A} E_{X,S}[w(a(s), x)] \]
Let us assume that there is a set of signals S(X) from which we can choose a single signal. A signal is represented by the distribution p(S|X). Using Bayes' theorem, we can calculate the needed distribution p(X|S):
\[ p(x|S) = \frac{p(S|x)\, p(x)}{\sum_{x' \in X} p(S|x')\, p(x')} \]
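A minimal Python sketch of this Bayes computation (the two-state prior and likelihood matrix are assumed, illustrative values):

```python
import numpy as np

def posterior(prior, likelihood, s):
    """p(x | S = s) via Bayes' theorem.
    prior[i] = p(x_i); likelihood[i, j] = p(S = j | X = x_i)."""
    joint = prior * likelihood[:, s]   # p(S = s | x) * p(x) for each x
    return joint / joint.sum()         # normalize by p(S = s)

prior = np.array([0.6, 0.4])           # assumed prior over two states
likelihood = np.array([[0.8, 0.2],     # p(S | X = x_0)
                       [0.3, 0.7]])    # p(S | X = x_1)

print(posterior(prior, likelihood, s=0))   # [0.8, 0.2]
print(posterior(prior, likelihood, s=1))   # [0.3, 0.7]
```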
Obviously, the terminal wealth will now depend on S, so how do we choose the optimal signal from S(X)?
\[ S = \arg\max_{S \in S(X)} \left[ \max_{a(S)} E_{X|S}[w(a(s), x)] \right] \]
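One way to evaluate this criterion numerically for finitely many candidate signals is sketched below (the prior, payoff matrix, and likelihood matrices are all made-up; this is an illustration, not the lecture's method):

```python
import numpy as np

def signal_value(prior, likelihood, w):
    """Expected payoff of the best decision rule based on signal S.
    prior[i] = p(x_i); likelihood[i, j] = p(S = j | x_i); w[k, i] = w(a_k, x_i)."""
    value = 0.0
    for s in range(likelihood.shape[1]):
        joint = prior * likelihood[:, s]   # p(x, S = s)
        # The best action given S = s contributes max_a sum_x p(x, s) * w(a, x).
        value += np.max(w @ joint)
    return value

prior = np.array([0.6, 0.4])
w = np.array([[10.0, -5.0],                        # payoff of action a_0 per state
              [-2.0,  8.0]])                       # payoff of action a_1 per state

informative = np.array([[0.9, 0.1], [0.2, 0.8]])   # p(S|X), fairly revealing
noisy       = np.array([[0.6, 0.4], [0.5, 0.5]])   # p(S|X), nearly useless

print(signal_value(prior, informative, w))         # higher expected payoff
print(signal_value(prior, noisy, w))
```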

5 Signal comparison
Definition 2 Let S, S' ∈ S(X) be two signals. We call S' a garbling of S if one of the following (equivalent) conditions holds:
• ∀X, S, S': p(X|S, S') = p(X|S)
• ∀X, S, S': p(S, S'|X) = p(S|X) · p(S'|S)
• ∀X, S, S': p(S'|S, X) = p(S'|S)
Claim 1 If we define the following matrices:
• L_{i,j} = p(S = j | X = i)
• L'_{i,j} = p(S' = j | X = i)
• G_{i,j} = p(S' = j | S = i)
then L' = LG if S' is a garbling of S.
Note that L and L' define the distributions p(S|X) and p(S'|X), respectively. We will call L the matrix of the signal S. A sketch of checking the garbling relation follows.
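A minimal Python sketch (with made-up matrices) computing L' = LG for a garbled signal:

```python
import numpy as np

L = np.array([[0.9, 0.1],      # p(S|X): rows index states, columns signal values
              [0.2, 0.8]])
G = np.array([[0.7, 0.3],      # p(S'|S): a row-stochastic matrix (rows sum to 1)
              [0.4, 0.6]])

L_prime = L @ G                # matrix of the garbled signal S'
print(L_prime)                 # each row of L' still sums to 1
assert np.allclose(L_prime.sum(axis=1), 1.0)
```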
Definition 3 A Markov matrix is a square matrix whose rows consist of non-negative real numbers, with each row summing to 1.
Definition 4 Let L, L' be the matrices of the signals S, S', respectively. S' is called a quasi-garbling of S if there exists a Markov matrix M such that L' = LM.
Definition 5 A signal S is considered at least as informative as S' if, for every action space A and every payoff function w(a, x), the optimal decision function based on S is at least as good as the optimal decision function based on S'.
Theorem 2 (Blackwell's theorem) S is at least as informative as S' iff S' is a quasi-garbling of S.
Observation 4 If S is at least as informative as S' and S' is at least as informative as S, then the information that can be extracted from the two signals is equivalent.
Notice that informativeness is only a partial order (it can happen that neither of two signals is at least as informative as the other). We would like to find a total order on signals.
Definition 6 The rate of transmission (mutual information) of S towards X is defined to be:
\[ R(X|S) = H(X) - E_S[H(X|S)] \]
That is, how much "extra" information S gives us about X, measured by Shannon's entropy.
Claim 2 Let S and S' be signals. Assume S is at least as informative as S'. Then R(X|S) \geq R(X|S').
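As a final illustration (a minimal sketch; the prior, likelihood, and garbling matrices are all hypothetical), computing R(X|S) for a signal and for one of its garblings shows the drop in the rate of transmission:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits, with the convention 0 * log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -np.sum(nz * np.log2(nz))

def rate_of_transmission(prior, L):
    """R(X|S) = H(X) - E_S[H(X|S)], where L[i, j] = p(S = j | x_i)."""
    r = entropy(prior)
    for s in range(L.shape[1]):
        joint = prior * L[:, s]            # p(x, S = s)
        p_s = joint.sum()                  # p(S = s)
        r -= p_s * entropy(joint / p_s)    # subtract p(s) * H(X | S = s)
    return r

prior = np.array([0.6, 0.4])
L = np.array([[0.9, 0.1], [0.2, 0.8]])     # original signal
G = np.array([[0.7, 0.3], [0.4, 0.6]])     # hypothetical garbling matrix

print(rate_of_transmission(prior, L))      # higher
print(rate_of_transmission(prior, L @ G))  # lower, consistent with Claim 2
```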