Analyzing Complex Strategic Interactions in Multi-Agent Systems
Walsh, Das, Tesauro, Kephart
In Proceedings of the Workshop on Game Theoretic and Decision Theoretic Agents (AAAI 2002)
Presented by Deniz Sarioz in Simon Parsons' e-commerce seminar, CUNY Grad Center, April 19th 2005

Premise
• Understanding interactions among various strategies can be valuable both to designers of markets (wishing to ensure economic efficiency and stability) and to designers of individual agents (wishing to "maximize" profits).
• By "demystifying" strategic interactions among agents, we can improve our ability to predict (and design) the overall behavior of multi-agent systems.
• The question of which strategy is "best" is often not the most appropriate one: a mix of strategies can be an equilibrium.
• The tournament approach has the shortcoming of being a single trajectory through an infinite space of possible interactions.
• The authors present a more principled and complete method for analyzing interactions among strategies.

Modeling Approach
• Start with a game that may include complex, repeated interactions between A agents.
• The rules specify particular actions that agents may take as a function of the game state (e.g., "bid b at time t").
• Each agent has a choice among S exogenously specified, heuristic strategies.
• Strategies are "heuristic" in that they are generally not the solution of a Bayes/Nash equilibrium analysis.
• Compute a heuristic-payoff table that specifies the expected payoff to each agent as a function of the strategies played by all agents.
• Agent strategies are selected independently of agent type (an assumption that helps with tractability).

Modeling Approach
• The heuristic-payoff table is an abstract representation of the fundamental game that reduces a potentially very complex game to a one-shot game in "normal form".
• Treat the choice of heuristic strategies, rather than basic actions, as the level of decision making for strategic analysis.
• A standard payoff table for a normal-form game requires S^A entries, which can be huge even for moderate S and A.
– e.g., with 3 agents X, Y, Z and 4 strategies S, T, U, V, the table needs cells for (XS, YS, ZS), (XS, YS, ZT), (XS, YS, ZU), ...
• Because agent types are identically distributed, we can express the payoff of each strategy as a function of the number of agents playing each strategy (e.g., (XS, YS, ZT) and (XT, YS, ZS) are equivalent).

Modeling Approach
• What does this combinatorial reduction buy us?
• The number of entries in the table is the number of unique ways A agents can be partitioned among S strategies:
  C(A+S−1, A) = (A+S−1)(A+S−2) ⋯ (A+1) / (S−1)!
  which is bounded below by, and for A >> S approximately equal to, A^(S−1)/(S−1)!
• This is fine for situations where even a lot of agents get to pick from / switch among a few strategies.
• For A = 20 and S = 3, the symmetric payoff table contains 231 entries, rather than the 3^20 ≈ 3.5 × 10^9 of the asymmetric table.
• For sufficiently simple games, payoffs may be computed analytically; in practice, complexity usually demands simulations. A small sketch of the reduction follows.
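To make the combinatorial reduction concrete, here is a minimal sketch in plain Python. Only A, S, and the 231-vs-3^20 comparison come from the slides; the enumeration code itself is an illustration, not the authors' implementation. It lists the entries of the symmetric heuristic-payoff table and checks them against the closed-form count:

```python
from math import comb
from itertools import combinations_with_replacement
from collections import Counter

A, S = 20, 3  # number of agents and strategies, as in the slides

# One symmetric table entry per profile (n_1, ..., n_S) with n_1 + ... + n_S = A,
# i.e., per multiset recording how many agents play each strategy.
profiles = [
    tuple(Counter(combo).get(j, 0) for j in range(S))
    for combo in combinations_with_replacement(range(S), A)
]

assert len(profiles) == comb(A + S - 1, A)   # the closed form from the slides
print(len(profiles), "entries vs", S ** A, "in the asymmetric table")
# -> 231 entries vs 3486784401 in the asymmetric table
```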
Modeling Approach
• Once the payoff table is computed, a variety of techniques can be used; three such techniques appear in this paper:
– A static analysis, which entails computing Nash equilibria of the payoff table.
– Modeling the dynamics of agents that switch to strategies that appear more successful.
– Techniques for understanding strategies at a deeper level; specifically, perturbation analysis.

Equilibrium Computation
• At the start of the game, each of the A agents chooses one of the S strategies.
• The payoff to agent i is a real-valued function u of the strategies played by all agents.
– The payoff is the expected reward.
• We assume symmetric strategy sets and payoffs, so the payoff to an agent can be represented as the payoff to each strategy as a function of the number of agents playing each strategy.

Equilibrium Computation
• Agent i may choose its strategies randomly according to a mixed strategy p^i = (p^i_1, p^i_2, ..., p^i_S), where p^i_j is the probability of agent i playing strategy j.
• The (AS-dimensional) vector of all agents' mixed strategies is p, and the ((AS−S)-dimensional) vector of mixed strategies for all agents except i is p^−i.
• p^i = e_j denotes the special case p^i_j = 1, i.e., pure strategy j.
• u(e_j, p^−i) denotes the payoff to agent i for playing pure strategy j while the others play mixed strategies.
• For mixed p^i we have u(p^i, p^−i) = Σ_{j=1..S} u(e_j, p^−i) p^i_j.

Equilibrium Computation
• Game-theoretic analysis usually assumes that everyone plays mixed Nash equilibrium strategies, i.e., that no one can receive a higher payoff by unilaterally deviating to another strategy. Formally, p* is a Nash equilibrium iff for every agent i and every mixed strategy p^i:
  u(p^i, p*^−i) ≤ u(p*^i, p*^−i)
• The rest of the paper focuses on symmetric mixed-strategy equilibria, that is, for all agents i, k: p*^i = p*^k = p*.
• Denote an arbitrary (not necessarily equilibrium) symmetric mixed strategy by p, and the probability that a given agent plays pure strategy j by p_j.
• The symmetry assumption simplifies the analysis from A S-tuples of probabilities to a single such S-tuple.
– Justification: a symmetric game (to be defined later) always has at least one (usually non-unique) symmetric Nash equilibrium.

Equilibrium Computation
• There are equivalent formulations of the concept of Nash equilibrium, but most do not exploit symmetry, which severely restricts the problem sizes that can be handled.
• Formulate finding a Nash equilibrium as minimizing a function on a polytope.
• Considering only symmetric equilibria, the problem is to find the p that minimizes
  v(p) = Σ_{j=1..S} ( max[ u(e_j, p) − u(p, p), 0 ] )²
  (p* is a Nash equilibrium iff it is a global minimum of v).
• Restatement of v(p) in pseudo-C by way of "derivation":
  Σ_{j=1..S} [ (u(e_j, p) > u(p, p)) ? (u(e_j, p) − u(p, p))² : 0 ]
• The polytope is just the (S−1)-simplex in S dimensions defined by the constraints Σ_{j=1..S} p_j = 1 and p_j ≥ 0 for all j.
• e.g., with 3 strategies, we look for equilibria on the interior of a triangle, which is a bounded subset of 2-dimensional space.

Equilibrium Computation
• Used a software package called amoeba, a non-linear optimizer (the downhill simplex / Nelder–Mead method), to find the minimum of the stated function over the simplex of mixed strategies.
• The function is evaluated at each vertex of the optimizer's simplex, and the polytope attempts to move down the estimated gradient by a series of geometric transformations that strive to replace the worst-performing vertex.
• The authors ran amoeba repeatedly, restarting at random points on the strategy simplex and stopping once 30 runs in a row found only previously discovered equilibria.
• For A = 20 and S = 3, this took about 10 minutes on a 450 MHz machine. A sketch of this procedure follows.
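The following is a minimal sketch of the same procedure using SciPy's Nelder–Mead in place of amoeba. The 3-strategy payoff matrix M is a made-up stand-in for a real heuristic-payoff table (it gives u(e_j, p) = (M @ p)[j] for a symmetric 2-player game), and the tolerances and the mapping of candidate points back onto the simplex are implementation choices of this sketch, not the authors':

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Illustrative stand-in for the heuristic-payoff table.
M = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 2.0],
              [2.0, 1.0, 0.0]])

def to_simplex(x):
    x = np.abs(x)
    return x / x.sum()

def v(x):
    """Objective from the slides: v(p) = sum_j max(u(e_j,p) - u(p,p), 0)^2.
    It is zero exactly at symmetric Nash equilibria."""
    p = to_simplex(x)                  # keep the search on the simplex
    pure = M @ p                       # u(e_j, p) for each pure strategy j
    avg = p @ pure                     # u(p, p)
    return np.sum(np.maximum(pure - avg, 0.0) ** 2)

# Restart at random points, stopping after 30 runs in a row that only
# rediscover known equilibria (the authors' stopping rule); a hard cap
# on runs guards against optimizer failures.
equilibria, streak, runs = [], 0, 0
while streak < 30 and runs < 1000:
    runs += 1
    res = minimize(v, rng.dirichlet(np.ones(3)), method="Nelder-Mead",
                   options={"xatol": 1e-10, "fatol": 1e-12})
    if res.fun < 1e-9:                 # treat near-zero v as an equilibrium
        p = to_simplex(res.x)
        if any(np.allclose(p, q, atol=1e-4) for q in equilibria):
            streak += 1
        else:
            equilibria.append(p)
            streak = 0
print(equilibria)   # for this matrix: the uniform mix (1/3, 1/3, 1/3)
```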
Dynamic Analysis
• Nash equilibria provide a theoretically satisfying view of the ideal static properties of a multi-agent system.
• Dynamic properties may be of equal or greater concern.
• In actual systems, it may be unreasonable to assume that all agents have the correct and common knowledge necessary to compute equilibria.
• Borrow a well-developed model from evolutionary game theory (Weibull 1995) to analyze strategy-choice dynamics.
• Posit a very large population of N agents, from which A agents (A << N) are randomly chosen to play at each "tick". Each agent plays one of the S pure strategies, and the fraction of agents playing strategy j is p_j.
• "For sufficiently large N, p_j may be treated as a continuous variable."

Dynamic Analysis
• Use the replicator dynamics formalism to model the evolution of p with time:
  ṗ_j = [ u(e_j, p) − u(p, p) ] p_j
  where u(p, p) is the population-average payoff, u(e_j, p) is the average payoff to agents currently using pure strategy j, and (I think) ṗ_j is how much the j-th component of p increases at each tick.
• This equation models the tendency of strategies with greater-than-average payoff to attract followers, and of those with less-than-average payoff to suffer defections.

Dynamic Analysis
• We would prefer a dynamic model that assumes minimal informational requirements for agents beyond their own actions and payoffs.
• The replicator dynamics equation implies that agents know u(p, p), which is a very implausible assumption.
• However, the same population dynamics can be obtained with a "replication by imitation" model (Weibull 1995):
– An agent switches to the strategy of a randomly chosen opponent who appears to be receiving a higher payoff.
– Interpret p at any given time as representing a symmetric mixed strategy for all N players in the game. Then the fixed points of the replicator equation (where ṗ_j = 0 for all j) correspond to Nash equilibria.
– When strategy trajectories governed by the replicator equation converge to an equilibrium, that equilibrium is an attractor.
– However, trajectories do not necessarily converge.

Dynamic Analysis
• Two ways of thinking about this:
– When multiple Nash equilibria exist, those that are attractors are the only plausible equilibria within the evolutionary model. Those with larger basins of attraction are more likely, assuming that every initial population state is equally likely.
– We can use the basins of attraction to understand which initial population mixes will lead to which equilibrium.
• Use the heuristic payoff table and the equation for ṗ to generate a large number of strategy trajectories, starting from a broad distribution of initial strategy vectors p (sketched below).
• For a choice of 3 strategies, the resulting flows can be plotted in a 2-D unit simplex and have an immediate visual interpretation.
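A minimal sketch of these trajectory computations, again with a made-up 3-strategy payoff matrix standing in for a real heuristic-payoff table, and with simple Euler steps standing in for a proper ODE integrator:

```python
import numpy as np

# Illustrative stand-in for the heuristic-payoff table, as before.
M = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 2.0],
              [2.0, 1.0, 0.0]])

def replicator_trajectory(p0, steps=2000, dt=0.01):
    """Integrate  dp_j/dt = [u(e_j, p) - u(p, p)] * p_j  with Euler steps."""
    p = np.asarray(p0, dtype=float)
    traj = [p.copy()]
    for _ in range(steps):
        pure = M @ p            # u(e_j, p) for each pure strategy j
        avg = p @ pure          # u(p, p), the population-average payoff
        p = p + dt * (pure - avg) * p
        p = np.clip(p, 0.0, None)
        p /= p.sum()            # guard against numerical drift off the simplex
        traj.append(p.copy())
    return np.array(traj)

# Many trajectories from a broad distribution of initial mixes, as in the paper.
rng = np.random.default_rng(1)
starts = rng.dirichlet(np.ones(3), size=100)
finals = np.array([replicator_trajectory(p0)[-1] for p0 in starts])
print(finals.mean(axis=0))   # where the flows tend to end up (or keep cycling)
```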
Perturbation of Payoffs
• We are only considering very few strategies, whereas in fact the strategy space is infinite and strategies can in practice be slightly modified.
• Perturb the payoff table in some meaningful ways, in order to perform a directed study of the plausible effects of abstract changes in strategy behavior.
• Key idea: the n+ agents playing strategy "+" steal some part of the payoff of the n− agents playing strategy "−", in a way that preserves the total:
  q(+) += min(n+, n−) q(+) / n+
  q(−) −= min(n+, n−) q(−) / n−
• Alternative changes: pick on just one strategy, uniform improvements to all strategies, random perturbations, etc.

Applications
• Applied the methodology to:
– Automated Dynamic Pricing (ADP)
– Continuous Double Auction (CDA)
• These were chosen because it is computationally intractable to compute equilibria of the underlying games.
• A body of literature exists that includes interesting heuristic strategies.
• Simulators are available for computing the heuristic payoff tables.

Automated Dynamic Pricing (description)
• Recent emergence of "shopbots": e.g., buy.com monitors and undercuts competitors' prices.
• In this experiment, sellers choose one of three heuristic dynamic-pricing strategies:
– GT ("game theory"; Greenwald & Kephart 1999): plays a mixed-strategy Nash equilibrium computed for the underlying game, assuming all pricing and purchasing decisions are made simultaneously.
– DF ("derivative follower"): simple hill-climbing adaptation based solely on observed profitability, ignoring assumptions about other buyers and sellers.
– NIR ("No Internal Regret"; Greenwald & Kephart 1999): adapted from Foster & Vohra (1997).

Automated Dynamic Pricing (analysis)
• Looked at two scenarios for these 3 strategies: one with 5 agents and one with 20 agents.
• Only the equilibrium labeled A is a pure-strategy Nash equilibrium.
• When the number of agents is increased to 20, the number of equilibria drops to one!
• Since DF was never seen as an attractor, the authors applied a perturbation analysis to compare the other two strategies, using the method specified above (sketched below).
• Result: NIR would start becoming a strong strategy with a 6.75% improvement, and nearly dominant with a 10% improvement.
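Finally, a minimal sketch of the payoff-perturbation step. The slides give the unscaled update; the delta factor below (e.g. 0.0675 for the 6.75% improvement quoted above) is an assumption about how the "x% improvement" parameter enters, and the toy table is invented for illustration:

```python
def perturb(table, plus, minus, delta):
    """Strategy `plus` steals payoff from strategy `minus`, per the slides.

    `table` maps a profile (n_1, ..., n_S) to a payoff vector q, where q[j]
    is the per-agent payoff to strategy j in that profile.  `delta` scales
    the transfer; the slides' update corresponds to delta = 1.
    """
    out = {}
    for profile, q in table.items():
        q = list(q)
        n_plus, n_minus = profile[plus], profile[minus]
        if n_plus > 0 and n_minus > 0:   # both strategies must be present
            q[plus] += delta * min(n_plus, n_minus) * q[plus] / n_plus
            q[minus] -= delta * min(n_plus, n_minus) * q[minus] / n_minus
        out[profile] = tuple(q)
    return out

# e.g. a toy 2-agent, 2-strategy table: profile -> per-strategy payoffs
table = {(2, 0): (1.0, 0.0), (1, 1): (0.8, 1.2), (0, 2): (0.0, 1.0)}
print(perturb(table, plus=0, minus=1, delta=0.0675))
```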