Satisfaction Equilibrium
Stéphane Ross
Canadian AI 2006

Problem

In real-life multiagent systems:
- Agents generally do not know the preferences (rewards) of their opponents.
- Agents may not observe the actions of their opponents.

In this context, most game-theoretic solution concepts are hardly applicable. We may instead try to define equilibrium concepts that:
- do not require complete information;
- are achievable through learning, over repeated play.

Plan

- Game model
- Satisfaction Equilibrium
- Satisfaction Equilibrium Learning
- Results
- Conclusion
- Questions

Game model

- Number of agents: n
- Joint action space: A = A_1 × … × A_n
- Set of possible outcomes: O, with outcome function f : A → O
- Agent i's reward function: r_i : O → ℝ
- Agent i only knows A_i, O, and r_i.
- After each turn, every agent observes an outcome o ∈ O.

Observations: the agents do not know the game matrix. Each agent sees only its own payoffs (here a, b, c, d for the row player):

        A       B
A     a, ?    b, ?
B     c, ?    d, ?

They are therefore unable to compute best responses or Nash equilibria; they can only reason on their history of actions and rewards.

Satisfaction Equilibrium

Since the agents can only reason on their history of payoffs, we may adopt a satisfaction-based reasoning:
- If an agent is satisfied with its current reward, it keeps playing the same strategy.
- An unsatisfied agent may change its strategy according to some exploration function.
- An equilibrium arises when all agents are satisfied.
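This satisfaction-based reasoning can be sketched as a small simulation. A minimal sketch, assuming a two-player prisoner's-dilemma payoff matrix, fixed thresholds of -8, and uniform random exploration (all of which are illustrative choices, not the paper's exact setup):

```python
import random

# Illustrative two-player matrix game: prisoner's dilemma payoffs
# (row player, column player). Actions: 0 = Cooperate, 1 = Defect.
# Payoff values and thresholds are assumptions for this sketch.
PAYOFFS = {
    (0, 0): (-1, -1), (0, 1): (-10, 0),
    (1, 0): (0, -10), (1, 1): (-8, -8),
}
THRESHOLDS = (-8, -8)  # fixed satisfaction threshold of each agent

def satisfaction(rewards, thresholds):
    """Satisfaction function: agent i is satisfied iff r_i >= tau_i."""
    return tuple(r >= t for r, t in zip(rewards, thresholds))

def play(max_turns=1000, seed=0):
    """Satisfaction-based play: satisfied agents repeat, others explore."""
    rng = random.Random(seed)
    actions = [rng.randrange(2), rng.randrange(2)]  # random initial strategies
    for turn in range(max_turns):
        rewards = PAYOFFS[tuple(actions)]
        sat = satisfaction(rewards, THRESHOLDS)
        if all(sat):                  # every agent satisfied: equilibrium
            return tuple(actions), turn
        for i in (0, 1):              # unsatisfied agents pick a new action
            if not sat[i]:
                actions[i] = rng.randrange(2)
    return None, max_turns

profile, turn = play()
print(profile, turn)
```

With thresholds of -8, both mutual cooperation and mutual defection satisfy every agent, so play settles on one of those two profiles after a handful of turns.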
Formally, σ_i is the satisfaction function of agent i:
- σ_i = 1 if r_i ≥ τ_i (agent i is satisfied)
- σ_i = 0 if r_i < τ_i (agent i is not satisfied)
where τ_i is the satisfaction threshold of agent i. A joint strategy is a satisfaction equilibrium if every agent is satisfied: σ_i = 1 for all i.

Example: prisoner's dilemma

        C          D
C    -1, -1    -10,  0
D     0, -10    -8, -8

- Dominant strategy: D
- Pareto-optimal: (C,C), (D,C), (C,D)
- Nash equilibrium: (D,D)

Possible satisfaction matrices (entries are (σ_1, σ_2)), for example with thresholds τ_i = -1 (left) and τ_i = -8 (right):

        C       D                C       D
C     1, 1    0, 1        C    1, 1    0, 1
D     1, 0    0, 0        D    1, 0    1, 1

On the left, only (C,C) is a satisfaction equilibrium; on the right, both (C,C) and (D,D) are.

However, even if a satisfaction equilibrium exists, it may be unreachable. Consider a satisfaction matrix of the following form:

        A       B       C
A     1, 1    0, 1    0, 1
B     1, 0    1, 0    0, 1
C     1, 0    0, 1    1, 0

(A,A) is the only satisfaction equilibrium, but from every other cell exactly one agent is satisfied and keeps its action, so no sequence of moves by the unsatisfied agent ever reaches (A,A).

Satisfaction Equilibrium Learning

If the satisfaction thresholds are fixed, we only need to apply the satisfaction-based reasoning:
- Choose a strategy randomly.
- If satisfied, keep playing the same strategy.
- Otherwise, choose a new strategy randomly.

We can also use other exploration functions, which favour actions that have not been explored often.

To learn the thresholds themselves, we use a simple update rule:
- When the agent is satisfied, its satisfaction threshold is incremented by some amount δ.
- When the agent is unsatisfied, its threshold is decremented by δ.
- δ is multiplied by a factor each turn such that it converges to 0.
- We also use a limited history of the previous satisfaction states and thresholds for each action, to bound the value of the satisfaction threshold.

Results

Fixed satisfaction thresholds:
- In simple games, we were always able to reach a satisfaction equilibrium.
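The threshold update rule described earlier can be sketched as follows. The initial threshold, δ, decay factor, and the explicit bounds (standing in for the paper's limited-history bounding) are illustrative assumptions:

```python
# Sketch of the satisfaction-threshold update rule: raise the threshold
# after a satisfying turn, lower it otherwise, with a shrinking step.
# Initial values, delta, decay factor, and bounds are assumptions.

def update_threshold(threshold, delta, satisfied, decay=0.99,
                     lower=None, upper=None):
    """Return the updated (threshold, delta) pair.

    delta is multiplied by `decay` each turn so the threshold converges;
    the optional bounds stand in for the limited-history bounding.
    """
    threshold += delta if satisfied else -delta
    if lower is not None:
        threshold = max(threshold, lower)
    if upper is not None:
        threshold = min(threshold, upper)
    return threshold, delta * decay

# Example: an agent repeatedly receiving reward -1 raises its threshold
# until it overshoots, then oscillates with a shrinking step around -1.
threshold, delta = -20.0, 4.0
reward = -1.0
for _ in range(500):
    threshold, delta = update_threshold(threshold, delta, reward >= threshold)
print(round(threshold, 2))
```

Because δ decays geometrically, the oscillation around the received reward dies out and the learned threshold settles close to the best reward the agent can consistently obtain.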
- Using a biased exploration improves the speed of convergence of the algorithm.

Learning the satisfaction thresholds:
- We are generally able to learn the optimal satisfaction equilibrium in simple games.
- Using a biased exploration improves the convergence percentage of the algorithm.
- The decay factor and history size affect the convergence of the algorithm and need to be adjusted to get optimal results.

Results – Prisoner's dilemma

[Figure: convergence results on the prisoner's dilemma; the plot is not recoverable from this extraction.]

Conclusion

- It is possible to learn stable outcomes without observing anything but our own rewards.
- Satisfaction equilibria can be defined on any Pareto-optimal solution; however, satisfaction equilibria are not always reachable.
- The proposed learning algorithms achieve good performance in simple games, but they require game-specific adjustments for optimal performance.

For more information, you can consult my publications at http://www.damas.ift.ulaval.ca/~ross

Thank you!

Questions