Satisfaction Equilibrium
Stéphane Ross
Canadian AI 2006
Problem

In real-life multiagent systems:
- Agents generally do not know the preferences (rewards) of their opponents
- Agents may not observe the actions of their opponents

In this context, most game-theoretic solution concepts are hardly applicable. We may instead try to define equilibrium concepts that:
- do not require complete information
- are achievable through learning, over repeated play
Presentation Plan
- Game model
- Satisfaction Equilibrium
- Satisfaction Equilibrium Learning
- Results
- Conclusion
- Questions
Game model
- n : the number of agents
- A = A_1 × … × A_n : the joint action space
- O : the set of possible outcomes
- f : A → O, the outcome function
- R_i : O → ℝ, agent i's reward function

Agent i only knows A_i, O and R_i. After each turn, every agent observes an outcome o ∈ O.
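A minimal sketch of this model in Python; the class and member names (Game, outcome_fn, reward_fns, play) are illustrative, not from the paper:

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class Game:
    n_agents: int                                    # n: number of agents
    action_sets: Sequence[Sequence[str]]             # A_i: each agent's own actions
    outcome_fn: Callable[[tuple], object]            # f: joint action -> outcome
    reward_fns: Sequence[Callable[[object], float]]  # R_i: outcome -> reward

    def play(self, joint_action: tuple) -> list[float]:
        """One turn: every agent observes the outcome o = f(a)
        and receives its own reward R_i(o)."""
        o = self.outcome_fn(joint_action)
        return [R(o) for R in self.reward_fns]
```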
Game model

Observations:
- The agents do not know the game matrix; each agent only sees its own payoffs a, b, c, d:

        A      B
  A   a, ?   b, ?
  B   c, ?   d, ?

- They are unable to compute best responses and Nash equilibria
- They can only reason on their history of actions and rewards
Satisfaction Equilibrium

Since the agents can only reason on their history of payoffs, we may adopt a satisfaction-based reasoning:
- If an agent is satisfied by its current reward, it should keep playing the same strategy
- An unsatisfied agent may decide to change its strategy according to some exploration function

An equilibrium will arise when all agents are satisfied.
Satisfaction Equilibrium

Formally:
- S_i is the satisfaction function of agent i:
  S_i(o) = 1 if R_i(o) ≥ σ_i (agent i is satisfied)
  S_i(o) = 0 if R_i(o) < σ_i (agent i is not satisfied)
- σ_i is the satisfaction threshold of agent i
- A joint strategy a is a satisfaction equilibrium if S_i(f(a)) = 1 for every agent i
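Checking this condition is direct; a sketch, reusing the hypothetical Game class from the earlier snippet:

```python
def is_satisfaction_equilibrium(game, joint_action, thresholds):
    """A joint action a is a satisfaction equilibrium iff every agent's
    reward at the outcome f(a) meets its threshold sigma_i."""
    o = game.outcome_fn(joint_action)
    return all(R(o) >= sigma
               for R, sigma in zip(game.reward_fns, thresholds))
```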
Example

Prisoner's dilemma:

        C         D
  C   -1, -1   -10,  0
  D    0, -10   -8, -8

- Dominant strategy: D
- Pareto-optimal: (C,C), (D,C), (C,D)
- Nash equilibrium: (D,D)

Possible satisfaction matrices (consistent with satisfaction thresholds of −1 for both agents on the left, −8 on the right):

        C      D              C      D
  C   1, 1   0, 1       C   1, 1   0, 1
  D   1, 0   0, 0       D   1, 0   1, 1
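The satisfaction matrices can be recovered mechanically from the payoffs; a sketch (the thresholds −1 and −8 are my reading of the example, not stated on the slide):

```python
payoffs = {('C', 'C'): (-1, -1), ('C', 'D'): (-10, 0),
           ('D', 'C'): (0, -10), ('D', 'D'): (-8, -8)}

def satisfaction_matrix(payoffs, thresholds):
    # An agent's entry is 1 iff its reward meets its threshold.
    return {a: tuple(int(r >= s) for r, s in zip(rs, thresholds))
            for a, rs in payoffs.items()}

print(satisfaction_matrix(payoffs, (-1, -1)))  # only (C,C) maps to (1, 1)
print(satisfaction_matrix(payoffs, (-8, -8)))  # (C,C) and (D,D) map to (1, 1)
```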
Satisfaction Equilibrium

However, even if a satisfaction equilibrium exists, it may be unreachable:

        A      B      C
  A   1, 1   0, 1   0, 1
  B   1, 0   1, 0   0, 1
  C   1, 0   0, 1   1, 0

Here (A,A) is the only satisfaction equilibrium, but in every other outcome exactly one agent is satisfied and keeps its action, and that action is never A; since both agents never switch at once, play can never reach (A,A).
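A quick reachability check over this matrix (the matrix itself is my reconstruction of the slide's table) confirms that no sequence of satisfaction-based moves leads to (A,A):

```python
S = {('A','A'): (1,1), ('A','B'): (0,1), ('A','C'): (0,1),
     ('B','A'): (1,0), ('B','B'): (1,0), ('B','C'): (0,1),
     ('C','A'): (1,0), ('C','B'): (0,1), ('C','C'): (1,0)}
acts = ['A', 'B', 'C']

def successors(cell):
    # Satisfied agents keep their action; unsatisfied ones may pick any.
    r_ok, c_ok = S[cell]
    rows = [cell[0]] if r_ok else acts
    cols = [cell[1]] if c_ok else acts
    return {(r, c) for r in rows for c in cols}

for start in S:
    if start == ('A', 'A'):
        continue
    seen, stack = set(), [start]
    while stack:                        # depth-first search over outcomes
        cell = stack.pop()
        if cell not in seen:
            seen.add(cell)
            stack.extend(successors(cell))
    print(start, ('A', 'A') in seen)    # False from every starting point
```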
Satisfaction Equilibrium Learning

If the satisfaction thresholds are fixed, we only need to apply the satisfaction-based reasoning:
- Choose a strategy randomly
- If satisfied, keep playing the same strategy
- Else choose a new strategy randomly

We can also use other exploration functions which favour actions that have not been explored often.
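A minimal sketch of this fixed-threshold loop, built on the hypothetical Game class from earlier; uniform random exploration is used here, and the biased variants mentioned above would replace random.choice:

```python
import random

def sel_fixed_thresholds(game, thresholds, n_turns=1000):
    # Start from a random joint strategy.
    actions = [random.choice(A_i) for A_i in game.action_sets]
    for _ in range(n_turns):
        rewards = game.play(tuple(actions))
        satisfied = [r >= s for r, s in zip(rewards, thresholds)]
        if all(satisfied):              # a satisfaction equilibrium: stop
            break
        for i, ok in enumerate(satisfied):
            if not ok:                  # unsatisfied agents explore
                actions[i] = random.choice(game.action_sets[i])
    return tuple(actions)
```

The loop is written from a bird's-eye view for brevity; in the actual setting each agent applies its own rule using only its own reward.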
Satisfaction Equilibrium Learning

We use a simple update rule:
- When the agent is satisfied, we increment its satisfaction threshold by some step δ
- If the agent is unsatisfied, we decrement its satisfaction threshold by δ
- δ is multiplied by a factor γ each turn, such that it converges to 0

We also use a limited history of our previous satisfaction states and thresholds for each action to bound the value of the satisfaction threshold.
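One step of this update, as a sketch (σ, δ and γ name the threshold, the step and the decay factor; the slide's own symbols were lost in the transcript, and the history-based bounding is omitted):

```python
def update_threshold(sigma, delta, satisfied, gamma=0.99):
    # Raise the threshold while satisfied, lower it while unsatisfied,
    # and shrink the step so the threshold eventually stabilises.
    sigma += delta if satisfied else -delta
    delta *= gamma      # delta -> 0 over time
    return sigma, delta
```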
Results

Fixed satisfaction thresholds:
- In simple games, we were always able to reach a satisfaction equilibrium.
- Using a biased exploration improves the speed of convergence of the algorithm.

Learning the satisfaction thresholds:
- We are generally able to learn the optimal satisfaction equilibrium in simple games.
- Using a biased exploration improves the convergence percentage of the algorithm.
- The decay factor and the history size affect the convergence of the algorithm and need to be adjusted to get optimal results.
Results – Prisoner's dilemma

[Figure: experimental results on the prisoner's dilemma; plot not preserved in the transcript]
Conclusion

- It is possible to learn stable outcomes without observing anything but our own rewards.
- Satisfaction equilibria can be defined on any Pareto-optimal solution.
  - However, satisfaction equilibria are not always reachable.
- The proposed learning algorithms achieve good performance in simple games.
  - However, they require game-specific adjustments for optimal performance.
Conclusion

For more information, you can consult my publications at:
http://www.damas.ift.ulaval.ca/~ross

Thank you!
Questions