CS 6840 Algorithmic Game Theory
Spring 2012
Lecture 11 Scribe Notes
Instructor: Éva Tardos
Scribe: Norris Xu

1 Lecture 11, Wednesday 15 February 2012: No-Regret Learning

1.1 No-Regret Learning
Suppose we have a cost game with players $1, \dots, n$. For a strategy vector $s = (s_1, \dots, s_n)$, $c_i(s)$ is the cost for player $i$ if strategy vector $s$ is played.

We play this game on days $1, \dots, T$, with $s^t = (s^t_1, \dots, s^t_n)$ being the strategy vector used on day $t$. The overall cost is therefore $\sum_{t=1}^T \sum_i c_i(s^t)$, and player $i$'s cost is $\sum_{t=1}^T c_i(s^t)$.
We can model Best Response: each day, one player best responds while the rest play the same strategy as before. But best response is unrealistic (e.g., it cycles forever in rock-paper-scissors). In reality, players learn based on the others' strategies.
Definition (No-Regret). A sequence of strategy vectors $s^1, \dots, s^T$ is no-regret for player $i$ if:
$$\sum_{t=1}^T c_i(s^t) \;\le\; \min_x \underbrace{\sum_{t=1}^T c_i(x, s^t_{-i})}_{\text{cost of strategy } x}$$
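To make the definition concrete, the regret of a given play sequence can be computed by brute force over the fixed strategies $x$. A minimal Python sketch (the cost function and the play sequence are illustrative, not from the lecture):

```python
# Regret of player i over a sequence of strategy profiles:
#   regret_i = sum_t c_i(s^t)  -  min_x sum_t c_i(x, s^t_{-i})

def regret(cost_i, plays, i, strategies):
    """cost_i(s) -> cost to player i for a profile s (a tuple)."""
    realized = sum(cost_i(s) for s in plays)
    # cost of playing the fixed strategy x on every day, minimized over x
    best_fixed = min(
        sum(cost_i(s[:i] + (x,) + s[i + 1:]) for s in plays)
        for x in strategies
    )
    return realized - best_fixed

# Example: 2-player mismatch cost for player 0 (pay 1 if choices differ).
c0 = lambda s: 1 if s[0] != s[1] else 0

plays = [(0, 1), (0, 1), (1, 1)]  # player 0 mismatched on two days
# Playing x = 1 every day would have cost 0, so player 0 has regret 2.
print(regret(c0, plays, 0, [0, 1]))  # -> 2
```

The sequence above is not no-regret for player 0: a single fixed strategy would have done strictly better in hindsight.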
Definition (Vanishing Regret). A sequence of strategy vectors $s^1, \dots, s^T$ has vanishing regret for player $i$ if, assuming that $0 \le c_i(s) \le 1$ for any strategy vector $s$:
$$\limsup_{T \to \infty} \left( \frac{1}{T} \sum_{t=1}^T c_i(s^t) - \min_x \frac{1}{T} \sum_{t=1}^T c_i(x, s^t_{-i}) \right) \le 0$$
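The lecture has not yet named an algorithm achieving vanishing regret, but the standard example for costs in $[0,1]$ is the multiplicative-weights (Hedge) rule: keep a weight per strategy and shrink the weights of strategies that turn out costly. A sketch with illustrative costs and learning rate, not from the lecture:

```python
import math

def hedge(num_strategies, cost_vectors, eta=0.1):
    """Multiplicative weights: play each strategy with probability
    proportional to its weight; costs are assumed to lie in [0, 1].
    Returns the total expected cost of the randomized play."""
    w = [1.0] * num_strategies
    total_cost = 0.0
    for costs in cost_vectors:            # costs[x] = c_i(x, s^t_{-i})
        z = sum(w)
        probs = [wi / z for wi in w]
        total_cost += sum(p * c for p, c in zip(probs, costs))
        # down-weight strategies that were costly today
        w = [wi * math.exp(-eta * c) for wi, c in zip(w, costs)]
    return total_cost

# Strategy 1 is always slightly cheaper; Hedge's average cost approaches
# the best fixed strategy's average cost, i.e. the regret vanishes.
T = 2000
costs = [[0.6, 0.4] for _ in range(T)]
avg = hedge(2, costs) / T
print(avg - 0.4)  # small, and shrinking as T grows
```

With these inputs the weights concentrate exponentially fast on the cheaper strategy, so the average regret per day goes to 0 as $T \to \infty$, matching the definition above.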
Theorem. If a cost game is $(\lambda, \mu)$-smooth (see the footnote definition below) and all players have no regret on a sequence $s^1, \dots, s^T$ of plays, then:
$$\sum_{t=1}^T \sum_i c_i(s^t) \;\le\; \underbrace{\frac{\lambda}{1-\mu}}_{\text{same bound as Price of Anarchy}} T \min_s \sum_i c_i(s)$$
Footnote: Definition ($(\lambda, \mu)$-smooth). A cost game is $(\lambda, \mu)$-smooth if for any strategy vectors $s, s^*$:
$$\sum_i c_i(s^*_i, s_{-i}) \;\le\; \lambda \sum_i c_i(s^*) + \mu \sum_i c_i(s)$$
Proof. Let $s^*$ be the minimum-cost strategy vector. For any player $i$, the fixed strategy $s^*_i$ is one candidate in the minimum of the no-regret condition, so:
$$\sum_{t=1}^T c_i(s^t) \;\le\; \min_x \sum_{t=1}^T c_i(x, s^t_{-i}) \;\le\; \sum_{t=1}^T c_i(s^*_i, s^t_{-i})$$
Summing over all players and applying $(\lambda, \mu)$-smoothness to each day $t$:
$$\sum_{t=1}^T \sum_i c_i(s^t) \;\le\; \sum_{t=1}^T \sum_i c_i(s^*_i, s^t_{-i}) \;\le\; \sum_{t=1}^T \left( \lambda \sum_i c_i(s^*) + \mu \sum_i c_i(s^t) \right) = \lambda T \sum_i c_i(s^*) + \mu \sum_{t=1}^T \sum_i c_i(s^t)$$
Rearranging gives
$$\sum_{t=1}^T \sum_i c_i(s^t) \;\le\; \frac{\lambda}{1-\mu}\, T \sum_i c_i(s^*)$$

Lemma. The sequence $s, s, \dots, s$ is no-regret for all players if and only if $s$ is a Nash equilibrium.
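The lemma can be checked directly on a small example: repeating a profile is no-regret exactly when no player can save cost by a fixed unilateral deviation. A sketch with a hypothetical 2x2 cost game (the matrices are illustrative, not from the lecture):

```python
# Hypothetical 2-player cost game:
# cost[i][a][b] = cost to player i when player 0 plays a, player 1 plays b.
cost = [
    [[1, 3], [2, 4]],   # player 0's costs
    [[1, 2], [3, 4]],   # player 1's costs
]

def player_cost(i, s):
    return cost[i][s[0]][s[1]]

def regret_of_repetition(i, s, T=5):
    """Regret of player i when the profile s is played on all T days."""
    realized = T * player_cost(i, s)
    def with_dev(x):  # cost of deviating to the fixed strategy x every day
        dev = (x, s[1]) if i == 0 else (s[0], x)
        return T * player_cost(i, dev)
    return realized - min(with_dev(x) for x in (0, 1))

# (0, 0) is a pure Nash equilibrium here, so repeating it is no-regret:
print(max(regret_of_repetition(i, (0, 0)) for i in (0, 1)))  # -> 0
# (1, 1) is not: player 0 would rather play 0 every day.
print(regret_of_repetition(0, (1, 1)))  # -> 5
```

This mirrors both directions of the lemma: zero regret at the Nash profile, and strictly positive regret at a profile where some player has a profitable deviation.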
1.2 Randomized Strategy
For all players $i$ and strategies $x$, let $p_i(x)$ be the probability of player $i$ playing $x$. Assuming that the players' choices are independent, the probability of playing the strategy vector $s = (s_1, \dots, s_n)$ is then $p(s) = \prod_i p_i(s_i)$. The expected cost for player $i$ is
$$E_s[c_i(s)] \;=\; \sum_s p(s)\, c_i(s) \;=\; \sum_{s = (s_1, \dots, s_n)} \left( \prod_j p_j(s_j) \right) c_i(s).$$
A set of probability distributions $p_i$ forms a Nash equilibrium if for any player $i$ and strategy $x$, $E_s[c_i(s)] \le E_s[c_i(x, s_{-i})]$; equivalently, for any player $i$, $E_s[c_i(s)] \le \min_x E_s[c_i(x, s_{-i})]$.
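The expected-cost formula can be evaluated by summing over all strategy profiles. A minimal sketch for two players, with an illustrative mismatch cost for player 0 (under uniform mixing every pure deviation by player 0 has the same expected cost, so the Nash condition above holds for player 0):

```python
from itertools import product

def expected_cost(cost_i, dists):
    """E_s[c_i(s)] = sum_s (prod_j p_j(s_j)) c_i(s), choices independent."""
    strategy_sets = [range(len(p)) for p in dists]
    total = 0.0
    for s in product(*strategy_sets):
        p = 1.0
        for j, sj in enumerate(s):
            p *= dists[j][sj]          # p(s) = prod_j p_j(s_j)
        total += p * cost_i(s)
    return total

# Mismatch cost for player 0: pay 1 when the two choices differ.
c0 = lambda s: 1.0 if s[0] != s[1] else 0.0

# Uniform mixing by both players:
dists = [[0.5, 0.5], [0.5, 0.5]]
print(expected_cost(c0, dists))                      # -> 0.5
# Each pure deviation x for player 0 also has expected cost 0.5,
# so E_s[c_i(s)] <= min_x E_s[c_i(x, s_{-i})] holds for player 0.
for x in (0, 1):
    dev = [[1.0, 0.0] if x == 0 else [0.0, 1.0], [0.5, 0.5]]
    print(expected_cost(c0, dev))                    # -> 0.5 each
```

The brute-force sum over `product(*strategy_sets)` is exponential in the number of players; it is only meant to make the definition of $p(s)$ and $E_s[c_i(s)]$ concrete.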