Analyzing Complex Strategic Interactions in
Multi-Agent Systems
Walsh, Das, Tesauro, Kephart
in Proceedings of the Workshop on Game Theoretic and
Decision Theoretic Agents
(AAAI 2002)
Presented by Deniz Sarioz
in Simon Parsons’ e-commerce seminar,
CUNY Grad Center, April 19th 2005.
Premise
• Understanding interactions among various strategies can
be valuable both to designers of markets (wishing to
ensure economic efficiency and stability) and to designers
of individual agents (wishing to ‘maximize’ profits).
• By “demystifying” strategic interactions among agents, we can
improve our ability to predict (and design) the overall
behavior of multi-agent systems.
• The question of which strategy is “best” is often not the most
appropriate one; a mix of strategies can be an equilibrium.
• The tournament approach has the shortcoming of being
one trajectory through an infinite space of possible
interactions.
• Authors present a more principled and complete method
for analyzing interactions among strategies.
Modeling Approach
• Start with a game that may include complex, repeated
interactions among A agents.
• The rules specify particular actions that agents may take as
a function of the game state (e.g., “bid b at time t.”)
• Each of the agents has a choice among S exogenously
specified, heuristic strategies.
• Strategies are “heuristic” in that they are generally not the
solution of a Bayes/Nash equilibrium analysis.
• Compute a heuristic-payoff table that specifies the
expected payoff to each agent as a function of the
strategies played by all agents.
• Agent strategies are selected independently of agent
type (an assumption to help with tractability).
Modeling Approach
• The heuristic-payoff table is an abstract representation of
the fundamental game that reduces a potentially very
complex game to a one-shot game in “normal form”.
• Treat the choice of heuristic strategies rather than basic
actions as the level of decision making for strategic
analysis.
• A standard payoff table for a normal-form game requires S^A
entries, which can be huge even for moderate S and A.
– e.g., 3 agents X, Y, Z and 4 strategies S, T, U, V: need 4^3 = 64 cells
in the table, one for each of (XS, YS, ZS), (XS, YS, ZT), (XS, YS, ZU), ...
• Because all agents draw from the same distribution of types, we can
express the payoff of each strategy as a function of the number of
agents playing each strategy (e.g., (XS, YS, ZT) ≡ (XT, YS, ZS)).
Modeling Approach
• What does this combinatorial reduction buy us?
• Number of entries in the table is the number of unique
ways A agents can be partitioned among S strategies:
C(A+S−1, A) = (A+S−1)(A+S−2) ··· (A+1) / (S−1)!
which is bounded below by, and approximately equal to (for A >> S):
A^(S−1) / (S−1)!
• This is fine for situations where even a lot of agents get to
pick from / switch among a few strategies.
• For A = 20 and S = 3, the symmetric payoff table contains
231 entries rather than 3^20 ≈ 3.5 × 10^9 in the asymmetric
case (see the sketch below).
• For sufficiently simple games, payoffs may be computed
analytically. Complexity usually demands simulations.
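As a concrete check of these counts, a minimal Python sketch (not from the paper):

```python
from math import comb

def symmetric_table_size(A: int, S: int) -> int:
    """Ways to partition A indistinguishable agents among S strategies."""
    return comb(A + S - 1, A)

A, S = 20, 3
print(symmetric_table_size(A, S))  # 231 entries in the symmetric table
print(S ** A)                      # 3486784401 (~3.5e9) in the asymmetric one
```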
Modeling Approach
• Once the payoff table is computed, a variety of
techniques can be used; three such techniques in this paper:
• A static analysis, which entails computing Nash equilibria of the
payoff table.
• Model the dynamics of agents that switch to strategies that
appear more successful.
• Suggest techniques for understanding strategies at a deeper
level; specifically, perturbation analysis.
Equilibrium Computation
• At the start of the game, each of the A agents chooses one
of the S strategies
• The payoff to agent i is a real function u of the strategies
played by all agents.
– Payoff is the expected reward
• We assume symmetric strategy sets and payoffs, so the payoff
to an agent can be represented as the payoff to each
strategy as a function of the number of agents playing each
strategy.
Equilibrium Computation
• Agent i may choose its strategy randomly according to a
mixed strategy:
p_i = (p_i,1 , p_i,2 , ... , p_i,S )
where p_i,j is the probability of agent i playing strategy j.
• The (A·S)-dim vector of all agents’ mixed strategies is p,
and the ((A−1)·S)-dim vector of mixed strategies for all
agents except i is p_-i
• p_i = e_j denotes the special case of p_i,j = 1 (pure strategy j).
• u(e_j, p_-i) denotes the payoff to agent i for playing pure
strategy j while the others play mixed strategies.
• For mixed p_i we have u(p_i, p_-i) = Σ_{j=1..S} u(e_j, p_-i) · p_i,j
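Computing u(e_j, p) under a symmetric mixed strategy means averaging the table entries over the multinomially distributed strategy counts of the other A−1 agents. A minimal Python sketch, assuming (my representation, not necessarily the paper’s) the heuristic payoff table is a dict `payoff` mapping a counts tuple (n_1, ..., n_S) to the per-strategy payoffs at that profile:

```python
from math import factorial, prod

def compositions(n, k):
    """All length-k tuples of non-negative integers summing to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def u_pure(j, p, payoff, A):
    """u(e_j, p): expected payoff for playing pure strategy j when the
    other A-1 agents draw independently from mixed strategy p."""
    S = len(p)
    total = 0.0
    for m in compositions(A - 1, S):        # opponents' strategy counts
        coeff = factorial(A - 1)
        for mk in m:
            coeff //= factorial(mk)         # multinomial coefficient
        prob = coeff * prod(pk ** mk for pk, mk in zip(p, m))
        counts = tuple(mk + (k == j) for k, mk in enumerate(m))
        total += prob * payoff[counts][j]   # counts includes our own agent
    return total

def u_mixed(p, payoff, A):
    """u(p, p): expected payoff when all agents play mixed strategy p."""
    return sum(pj * u_pure(j, p, payoff, A) for j, pj in enumerate(p))
```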
Equilibrium Computation
• In game-theoretic analysis, we usually assume everyone plays
mixed Nash equilibrium strategies, i.e., no one can receive a
higher payoff by unilaterally deviating to another strategy.
Formally, p* is a Nash equilibrium iff for every agent i and
every mixed strategy p_i:
u(p_i , p*_-i) ≤ u(p*_i , p*_-i)
• Rest of the paper: focus specifically on symmetric mixed-strategy
equilibria; that is, for all agents i, k: p*_i = p*_k = p*
• Denote an arbitrary (not necessarily equilibrium) symmetric
mixed strategy by p and the probability that a given agent
plays pure strategy j by p_j
• Symmetry assumption simplifies the analysis from
A S-tuples of probabilities to only one such S-tuple.
– Justification: a symmetric game (to be defined later) always has
at least one (usually non-unique) symmetric Nash equilibrium.
Equilibrium Computation
• There are several equivalent formulations of the Nash equilibrium
concept, but most do not exploit symmetry, and this imposes a
severe restriction on the size of problems that can be solved.
• Formulate Nash Equilibrium as the minimum of a function on
a polytope.
• Considering only symmetric equilibria, the problem is to find
the p that minimizes [p* is a Nash equilibrium iff it is a global min of v]:
v(p) = Σ_{j=1..S} ( max[ u(e_j, p) − u(p, p), 0 ] )²
restatement of v(p) in pseudo-C by way of “derivation”:
Σ_{j=1..S} [ (u(e_j, p) > u(p, p)) ? (u(e_j, p) − u(p, p))² : 0 ]
(see the sketch after this slide)
• The polytope is just the (S−1)-simplex in S dimensions defined
by the constraints Σ_{j=1..S} p_j = 1 and p_j ≥ 0 for all j
• e.g., if we have 3 strategies, look for equilibria on the interior
of a triangle, which is a bounded subset of 2-dimensional space.
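Transcribing v directly, reusing u_pure and u_mixed from the earlier sketch; it is non-negative and reaches 0 exactly at symmetric Nash equilibria:

```python
def v(p, payoff, A):
    """Sum of squared positive deviation gains: p is a symmetric Nash
    equilibrium of the heuristic game iff v(p) == 0."""
    base = u_mixed(p, payoff, A)
    return sum(max(u_pure(j, p, payoff, A) - base, 0.0) ** 2
               for j in range(len(p)))
```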
Equilibrium Computation
• Used a software package called amoeba (a non-linear optimizer
implementing the Nelder–Mead downhill simplex method) to find
the minimum of the stated function on the S-simplex.
• The function is evaluated at each vertex of a simplex of candidate
points, and the algorithm attempts to move down the estimated
gradient by a series of geometric transformations that strive to
replace the worst-performing vertex.
• The authors repeatedly ran amoeba, restarting at random points
on the S-simplex and stopping once 30 consecutive runs converged
to previously discovered equilibria (see the sketch below).
• For A = 20, S = 3 this took 10 minutes on a 450 MHz machine.
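The authors’ exact procedure isn’t given beyond this description; here is a rough sketch in the same spirit, using scipy’s Nelder–Mead in place of Numerical Recipes’ amoeba, with a normalization trick (my choice) to keep candidate points on the simplex:

```python
import numpy as np
from scipy.optimize import minimize

def find_symmetric_equilibria(payoff, A, S, tol=1e-8, stop_after=30, seed=0):
    """Random-restart Nelder-Mead search: stop once `stop_after`
    consecutive converged runs rediscover known equilibria."""
    rng = np.random.default_rng(seed)
    found, repeats = [], 0

    def objective(x):
        p = np.abs(x) / (np.abs(x).sum() + 1e-12)  # pull back onto simplex
        return v(p, payoff, A)

    while repeats < stop_after:
        x0 = rng.dirichlet(np.ones(S))     # random point on the simplex
        res = minimize(objective, x0, method="Nelder-Mead")
        if res.fun >= tol:
            continue                       # run did not reach v ~ 0
        p = np.abs(res.x) / np.abs(res.x).sum()
        if any(np.allclose(p, q, atol=1e-3) for q in found):
            repeats += 1                   # rediscovered a known equilibrium
        else:
            found.append(p)
            repeats = 0
    return found
```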
Dynamic Analysis
• Nash equilibria provide a theoretically satisfying view of the
ideal static properties of a multi-agent system.
• Dynamic properties may be of greater or equal concern.
• In actual systems, it may be unreasonable to assume that agents
all have the correct and common knowledge necessary to compute
equilibria.
• Borrow a well-developed model from evolutionary game
theory (Weibull 1995) to analyze strategy choice dynamics.
• Posit a very large population of N agents, from which A agents
(A << N) are randomly chosen to play at each ‘tick’. Each agent
plays one of the S pure strategies, and the fraction of agents
playing strategy j is p_j
• “For sufficiently large N, p_j may be treated as a continuous variable”
Dynamic Analysis
• Use the replicator dynamics formalism to model the evolution
of p with time as:
ṗ_j = [ u(e_j, p) − u(p, p) ] · p_j
where u(p, p) is the population average payoff and
where u(e_j, p) is the average payoff to agents currently using
pure strategy j,
and (I think) ṗ_j is the rate at which the j-th component of p
changes at every tick (see the sketch below).
• This equation models the tendency of strategies with greater
than average payoff to attract followers and those with less than
average payoff to suffer defections.
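A forward-Euler integration of this dynamic, reusing u_pure and u_mixed from the earlier sketch (step size, step count, and the renormalization guard are my choices):

```python
def replicator_trajectory(p0, payoff, A, dt=0.01, steps=2000):
    """Integrate dp_j/dt = [u(e_j, p) - u(p, p)] * p_j from p0."""
    p = list(p0)
    traj = [tuple(p)]
    for _ in range(steps):
        base = u_mixed(p, payoff, A)
        step = [(u_pure(j, p, payoff, A) - base) * pj
                for j, pj in enumerate(p)]
        p = [max(pj + dt * s, 0.0) for pj, s in zip(p, step)]
        total = sum(p)
        p = [pj / total for pj in p]   # guard against drift off the simplex
        traj.append(tuple(p))
    return traj
```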
Dynamic Analysis
• Prefer that a dynamic model assume minimal informational
requirements for agents beyond their own actions and payoffs.
• The replicator dynamics equation implies that agents know u(p, p),
which is a very implausible assumption.
• However, we can obtain the same population dynamics with a
“replication by imitation” model (Weibull 1995):
– An agent switches to the strategy of a randomly chosen opponent
who appears to be receiving a higher payoff.
– Interpret p at any given time as representing a symmetric mixed
strategy for all N players in the game. Then the fixed points of
the replicator equation (where ṗ_j = 0 for all j) correspond to
Nash equilibria.
– When strategy trajectories governed by the replicator equation
converge to an equilibrium, that equilibrium is an attractor.
– However, trajectories do not necessarily converge.
Dynamic Analysis
• Two ways of thinking about this:
– When multiple Nash equilibria exist, those that are attractors
are the only plausible equilibria within the evolutionary model.
Those with larger basins of attraction are more likely, assuming
that every initial population state is equally likely.
– We can use the basins of attraction to understand which initial
population mixes will lead to which equilibrium.
• Use the heuristic payoff table and the equation for ṗ to generate
a large number of strategy trajectories, starting from a broad
distribution of initial strategy vectors p
• For a choice of 3 strategies, the resulting flows can be plotted
in a 2-D unit simplex and have an immediate visual
interpretation.
Perturbation of Payoffs
• We are considering only a very few strategies, whereas in
fact the strategy space is infinite and strategies can in
practice be slightly modified.
• Perturb the payoff table in some meaningful ways, in
order to perform some directed study of plausible
effects of abstract changes in strategy behavior.
• Key basic idea: the n+ agents of strategy + steal some
part of the payoff of the n− agents of strategy − in a way
that preserves the total:
q(+) += δ · min(n+, n−) · q(+) / n+
q(−) −= δ · min(n+, n−) · q(−) / n−
where δ is the perturbation fraction (see the sketch below).
• Alternative changes: pick on just one strategy, uniform
improvements to all strategies, random changes, etc.
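A sketch of this perturbation applied to every entry of a heuristic payoff table in the dict layout used in the earlier sketches; strategy indices `plus`/`minus` and the fraction `delta` (e.g., 0.0675 for the 6.75% case mentioned later) are parameters:

```python
def perturb(payoff, plus, minus, delta):
    """Apply the slide's update to each table entry: the n+ agents of
    strategy `plus` take payoff from the n- agents of strategy `minus`."""
    perturbed = {}
    for counts, q in payoff.items():
        q = list(q)
        n_plus, n_minus = counts[plus], counts[minus]
        if n_plus > 0 and n_minus > 0:     # only profiles where both appear
            m = min(n_plus, n_minus)
            q[plus]  += delta * m * q[plus]  / n_plus
            q[minus] -= delta * m * q[minus] / n_minus
        perturbed[counts] = tuple(q)
    return perturbed
```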
Applications
• Applied “the methodology” to:
– Automated Dynamic Pricing (ADP)
– Continuous Double Auction (CDA)
• Chosen because it is computationally intractable to compute
equilibria in these games directly.
• Body of literature exists which includes interesting
heuristic strategies.
• Simulators are available for computing the heuristic
payoff tables.
Automated Dynamic Pricing
(description)
• Recent emergence of “shopbots”: e.g., buy.com
monitors and undercuts competitors’ prices
• In this experiment, sellers choose one of three heuristic
dynamic pricing strategies
– GT: “game theory” (Greenwald & Kephart 1999): plays a
mixed-strategy Nash equilibrium computed for the underlying
game, assuming all pricing and purchasing decisions are made
simultaneously.
– DF: “derivative follower”: simple hill-climbing adaptation
based solely on observed profitability, ignoring assumptions
about other buyers and sellers.
– NIR: “No Internal Regret” (Greenwald & Kephart 1999):
adapted from Foster & Vohra (1997).
Automated Dynamic Pricing
(analysis)
• Looked at two scenarios for these 3 strategies: with 5
agents and with 20 agents.
• Only one of the equilibria found (labeled A in the figures)
is a pure-strategy Nash equilibrium.
• When the number of agents is increased to 20, the number of
equilibria drops to one!
Automated Dynamic Pricing
(analysis)
[Figure-only slides: replicator-dynamics flows and equilibria
plotted on the 2-D strategy simplex for the 5- and 20-agent scenarios]
• Since DF was never seen as an attractor, the authors
applied a perturbation analysis to compare the
other two strategies, using the method specified above.
• Result: NIR would start becoming a strong
strategy with a 6.75% improvement, and become
nearly dominant with a 10% improvement.