Satisfaction Equilibrium
Stéphane Ross
Canadian AI 2006

Problem

In real-life multiagent systems:
- Agents generally do not know the preferences (rewards) of their opponents.
- Agents may not observe the actions of their opponents.

In this context, most game-theoretic solution concepts are hardly applicable. We may instead try to define equilibrium concepts that:
- do not require complete information;
- are achievable through learning, over repeated play.

Plan

- Game model
- Satisfaction Equilibrium
- Satisfaction Equilibrium Learning
- Results
- Conclusion
- Questions

Game model

- Number of agents: n
- Joint action space: A = A_1 × … × A_n
- Set of possible outcomes: O, with outcome function f : A → O
- Agent i's reward function: r_i : O → ℝ
- Agent i only knows A_i, O, and r_i.
- After each turn, every agent observes an outcome o ∈ O.

Observations: the agents do not know the game matrix. Each agent sees only its own payoffs (here a, b, c, d for the row player):

        A       B
A     a, ?    b, ?
B     c, ?    d, ?

They are therefore unable to compute best responses or Nash equilibria; they can only reason on their history of actions and rewards.

Satisfaction Equilibrium

Since the agents can only reason on their history of payoffs, we may adopt a satisfaction-based reasoning:
- If an agent is satisfied with its current reward, it keeps playing the same strategy.
- An unsatisfied agent may change its strategy according to some exploration function.
- An equilibrium arises when all agents are satisfied.
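This satisfaction-based reasoning can be sketched as a small simulation. A minimal sketch, assuming a two-player prisoner's-dilemma payoff matrix, fixed thresholds of -8, and uniform random exploration (all of which are illustrative choices, not the paper's exact setup):

```python
import random

# Illustrative two-player matrix game: prisoner's dilemma payoffs
# (row player, column player). Actions: 0 = Cooperate, 1 = Defect.
# Payoff values and thresholds are assumptions for this sketch.
PAYOFFS = {
    (0, 0): (-1, -1), (0, 1): (-10, 0),
    (1, 0): (0, -10), (1, 1): (-8, -8),
}
THRESHOLDS = (-8, -8)  # fixed satisfaction threshold of each agent

def satisfaction(rewards, thresholds):
    """Satisfaction function: agent i is satisfied iff r_i >= tau_i."""
    return tuple(r >= t for r, t in zip(rewards, thresholds))

def play(max_turns=1000, seed=0):
    """Satisfaction-based play: satisfied agents repeat, others explore."""
    rng = random.Random(seed)
    actions = [rng.randrange(2), rng.randrange(2)]  # random initial strategies
    for turn in range(max_turns):
        rewards = PAYOFFS[tuple(actions)]
        sat = satisfaction(rewards, THRESHOLDS)
        if all(sat):                  # every agent satisfied: equilibrium
            return tuple(actions), turn
        for i in (0, 1):              # unsatisfied agents pick a new action
            if not sat[i]:
                actions[i] = rng.randrange(2)
    return None, max_turns

profile, turn = play()
print(profile, turn)
```

With thresholds of -8, both mutual cooperation and mutual defection satisfy every agent, so play settles on one of those two profiles after a handful of turns.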
Formally, σ_i is the satisfaction function of agent i:
- σ_i = 1 if r_i ≥ τ_i (agent i is satisfied)
- σ_i = 0 if r_i < τ_i (agent i is not satisfied)
where τ_i is the satisfaction threshold of agent i. A joint strategy is a satisfaction equilibrium if every agent is satisfied: σ_i = 1 for all i.

Example: prisoner's dilemma

        C          D
C    -1, -1    -10,  0
D     0, -10    -8, -8

- Dominant strategy: D
- Pareto-optimal: (C,C), (D,C), (C,D)
- Nash equilibrium: (D,D)

Possible satisfaction matrices (entries are (σ_1, σ_2)), for example with thresholds τ_i = -1 (left) and τ_i = -8 (right):

        C       D                C       D
C     1, 1    0, 1        C    1, 1    0, 1
D     1, 0    0, 0        D    1, 0    1, 1

On the left, only (C,C) is a satisfaction equilibrium; on the right, both (C,C) and (D,D) are.

However, even if a satisfaction equilibrium exists, it may be unreachable. Consider a satisfaction matrix of the following form:

        A       B       C
A     1, 1    0, 1    0, 1
B     1, 0    1, 0    0, 1
C     1, 0    0, 1    1, 0

(A,A) is the only satisfaction equilibrium, but from every other cell exactly one agent is satisfied and keeps its action, so no sequence of moves by the unsatisfied agent ever reaches (A,A).

Satisfaction Equilibrium Learning

If the satisfaction thresholds are fixed, we only need to apply the satisfaction-based reasoning:
- Choose a strategy randomly.
- If satisfied, keep playing the same strategy.
- Otherwise, choose a new strategy randomly.

We can also use other exploration functions, which favour actions that have not been explored often.

To learn the thresholds themselves, we use a simple update rule:
- When the agent is satisfied, its satisfaction threshold is incremented by some amount δ.
- When the agent is unsatisfied, its threshold is decremented by δ.
- δ is multiplied by a factor each turn such that it converges to 0.
- We also use a limited history of the previous satisfaction states and thresholds for each action, to bound the value of the satisfaction threshold.

Results

Fixed satisfaction thresholds:
- In simple games, we were always able to reach a satisfaction equilibrium.
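The threshold update rule described earlier can be sketched as follows. The initial threshold, δ, decay factor, and the explicit bounds (standing in for the paper's limited-history bounding) are illustrative assumptions:

```python
# Sketch of the satisfaction-threshold update rule: raise the threshold
# after a satisfying turn, lower it otherwise, with a shrinking step.
# Initial values, delta, decay factor, and bounds are assumptions.

def update_threshold(threshold, delta, satisfied, decay=0.99,
                     lower=None, upper=None):
    """Return the updated (threshold, delta) pair.

    delta is multiplied by `decay` each turn so the threshold converges;
    the optional bounds stand in for the limited-history bounding.
    """
    threshold += delta if satisfied else -delta
    if lower is not None:
        threshold = max(threshold, lower)
    if upper is not None:
        threshold = min(threshold, upper)
    return threshold, delta * decay

# Example: an agent repeatedly receiving reward -1 raises its threshold
# until it overshoots, then oscillates with a shrinking step around -1.
threshold, delta = -20.0, 4.0
reward = -1.0
for _ in range(500):
    threshold, delta = update_threshold(threshold, delta, reward >= threshold)
print(round(threshold, 2))
```

Because δ decays geometrically, the oscillation around the received reward dies out and the learned threshold settles close to the best reward the agent can consistently obtain.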
- Using a biased exploration improves the speed of convergence of the algorithm.

Learning the satisfaction thresholds:
- We are generally able to learn the optimal satisfaction equilibrium in simple games.
- Using a biased exploration improves the convergence percentage of the algorithm.
- The decay factor and history size affect the convergence of the algorithm and need to be adjusted to get optimal results.

Results – Prisoner's dilemma

[Figure: convergence results on the prisoner's dilemma; the plot is not recoverable from this extraction.]

Conclusion

- It is possible to learn stable outcomes without observing anything but our own rewards.
- Satisfaction equilibria can be defined on any Pareto-optimal solution; however, satisfaction equilibria are not always reachable.
- The proposed learning algorithms achieve good performance in simple games, but they require game-specific adjustments for optimal performance.

For more information, you can consult my publications at http://www.damas.ift.ulaval.ca/~ross

Thank you!

Questions