Download updated version for the 2015 Superbowl
Prisoners’ Dilemma Game

Two criminals, Bill Belichick and Tom Brady, have been caught at the scene of a crime. Prosecutors separate them for questioning. Each can either confess that the act was planned, or deny that it was planned (i.e., claim that it was simply a coincidence). The prosecutor tells them that if they both confess, they will each receive a six-month jail sentence (i.e., a six-month suspension). If only one of the two confesses, the one who confessed will go free, but the one denying the crime will receive a two-year jail sentence (i.e., a two-year suspension). If both deny, they will each receive a 30-day sentence for the crime of inadvertently violating NFL rules.

The situation is represented in the following payoff matrix, with jail terms represented as the negative of the number of months of prison time (i.e., months of suspension).

                       Tom
                 Confess      Deny
Bill   Confess   (-6,-6)     (0,-24)
       Deny      (-24,0)     (-1,-1)

Payoffs: (Bill, Tom)

First consider the situation of Bill. If Tom chooses “confess,” Bill should choose “confess.” This results in less time in jail (i.e., six months versus 24 months). Alternatively, if Tom chooses “deny,” Bill should choose “confess.” Again, choosing “confess” results in less jail time (i.e., zero months versus one month). Since Bill’s optimal strategy of “confess” is independent of the strategy chosen by Tom, we say Bill has a dominant strategy of “confess.”

Definition: We say a player has a dominant strategy in a game if the strategy is best, independent of the strategies chosen by other players.

Alternatively, we say strategy A dominates strategy B for a player if the payoff from choosing strategy A is always higher than the payoff from choosing strategy B (i.e., no matter what strategies are chosen by the other players). If one strategy dominates all other strategies for a player, then we call it a dominant strategy.

We can indicate Bill’s best responses in the table by underlining his corresponding payoff.
Since both of his payoffs in the “confess” row are underlined, his optimal response of “confess” is independent of the strategy chosen by Tom.

                       Tom
                 Confess      Deny
Bill   Confess   (_-6_,-6)   (_0_,-24)
       Deny      (-24,0)     (-1,-1)

Payoffs: (Bill, Tom). Underlined payoffs are shown between underscores.

Now consider the situation of Tom. If Bill chooses “confess,” Tom should choose “confess.” This results in less time in jail (i.e., six months versus 24 months). Alternatively, if Bill chooses “deny,” Tom should choose “confess.” Again, choosing “confess” results in less jail time (i.e., zero months versus one month). Since Tom’s optimal strategy of “confess” is independent of the strategy chosen by Bill, we say Tom has a dominant strategy of “confess.”

We can indicate Tom’s best responses in the table by underlining his corresponding payoff. Since both of his payoffs in the “confess” column are underlined, his optimal response of “confess” is independent of the strategy chosen by Bill.

                       Tom
                 Confess      Deny
Bill   Confess   (-6,_-6_)   (0,-24)
       Deny      (-24,_0_)   (-1,-1)

Payoffs: (Bill, Tom)

We can combine the last two tables showing best responses for Bill and Tom to determine a “mutual best response,” or Nash equilibrium, of “confess-confess.” It is the only cell in which both payoffs are underlined.

                       Tom
                 Confess       Deny
Bill   Confess   (_-6_,_-6_)  (_0_,-24)
       Deny      (-24,_0_)    (-1,-1)

Payoffs: (Bill, Tom)

Definition: A combination of strategies is a Nash (non-cooperative) equilibrium if each player’s strategy is best, given the strategies chosen by the other players. The Nash equilibrium is a “mutual best response” in the sense that each player is correctly assessing the strategies of all other players and choosing his or her best possible response.

Definition: An allocation is Pareto Optimal if it is impossible to make one person better off without making someone else worse off.

The Nash equilibrium in this game is not Pareto Optimal. If Bill and Tom could agree to deny the crime, they would both be better off. The strategy combination “deny-deny” is a Pareto Improvement over the Nash equilibrium.
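The best-response reasoning above can also be checked mechanically. The following is a minimal Python sketch, not part of the original handout. The payoffs are entered from the prosecutor's terms (a lone confessor goes free, a lone denier gets 24 months) and are keyed as (Bill's strategy, Tom's strategy). The function names best_responses, dominant_strategy, and nash_equilibria are illustrative choices.

```python
# Payoffs in negative months of suspension, keyed by (Bill's strategy, Tom's strategy).
# Each value is (Bill's payoff, Tom's payoff), taken from the prosecutor's terms.
STRATEGIES = ["confess", "deny"]
PAYOFFS = {
    ("confess", "confess"): (-6, -6),
    ("confess", "deny"): (0, -24),
    ("deny", "confess"): (-24, 0),
    ("deny", "deny"): (-1, -1),
}


def payoff(player, own, other):
    """Payoff to player (0 = Bill, 1 = Tom) when he plays own and the rival plays other."""
    profile = (own, other) if player == 0 else (other, own)
    return PAYOFFS[profile][player]


def best_responses(player, other):
    """Strategies that maximize the player's payoff against the rival's strategy."""
    best = max(payoff(player, s, other) for s in STRATEGIES)
    return [s for s in STRATEGIES if payoff(player, s, other) == best]


def dominant_strategy(player):
    """Strategy that is the unique best response to every rival strategy, or None."""
    for s in STRATEGIES:
        if all(best_responses(player, other) == [s] for other in STRATEGIES):
            return s
    return None


def nash_equilibria():
    """Profiles (Bill, Tom) in which each strategy is a best response to the other."""
    return [
        (b, t)
        for b in STRATEGIES
        for t in STRATEGIES
        if b in best_responses(0, t) and t in best_responses(1, b)
    ]


print(dominant_strategy(0))  # Bill: confess
print(dominant_strategy(1))  # Tom: confess
print(nash_equilibria())     # [('confess', 'confess')]
```

The cells that pass both tests in nash_equilibria correspond to the cells in which both payoffs are underlined in the combined table.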
Note that if Bill and Tom did reach a collusive agreement to deny the crime, each would have an incentive to cheat on the agreement by choosing to confess.
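The Pareto comparison and the incentive to cheat can be verified the same way. This is a hedged sketch under the same payoff assumptions, keyed as (Bill's strategy, Tom's strategy). The helper names pareto_improves, is_pareto_optimal, and gain_from_cheating are hypothetical and do not come from the handout.

```python
# Payoffs in negative months of suspension, keyed by (Bill's strategy, Tom's strategy).
# Each value is (Bill's payoff, Tom's payoff).
PAYOFFS = {
    ("confess", "confess"): (-6, -6),
    ("confess", "deny"): (0, -24),
    ("deny", "confess"): (-24, 0),
    ("deny", "deny"): (-1, -1),
}


def pareto_improves(new, old):
    """True if nobody is worse off at new than at old and somebody is strictly better off."""
    pairs = list(zip(PAYOFFS[new], PAYOFFS[old]))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


def is_pareto_optimal(profile):
    """True if no other strategy combination is a Pareto improvement over profile."""
    return not any(pareto_improves(other, profile) for other in PAYOFFS)


def gain_from_cheating(player, agreement):
    """Months saved by player (0 = Bill, 1 = Tom) if he confesses while the rival keeps the agreement."""
    deviation = list(agreement)
    deviation[player] = "confess"
    return PAYOFFS[tuple(deviation)][player] - PAYOFFS[agreement][player]


print(pareto_improves(("deny", "deny"), ("confess", "confess")))  # True
print(is_pareto_optimal(("confess", "confess")))                  # False
print(gain_from_cheating(0, ("deny", "deny")))                    # 1
print(gain_from_cheating(1, ("deny", "deny")))                    # 1
```

Each player saves one month by breaking a deny-deny agreement, which is why the agreement is not self-enforcing even though it leaves both players better off than the Nash equilibrium.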