The Emergence of Social Conventions in Dynamic Networks in Agents Using the Highest Rewarding Neighborhood Rule

Sean Hallinan, Wayne Wu, Kenny Wong and Robin Jha
Advisor: Zhang Yu, Ph.D.

Abstract

Simulation is a powerful tool for discovering surprising consequences of simple assumptions. Social simulations are the class of simulations used to study social behavior and phenomena. In the social conventions setting, agents interact with each other under set rules, or games, and observe information about the system. In this paper, we investigate the efficiency of the emergence of social conventions in complex networks, that is, how fast conventions are reached, when agents use a strategy update function such as Highest Cumulative Reward (HCR) coupled with the Highest Rewarding Neighborhood (HRN) rule. The paper makes modifications to the existing mathematical model proposed for the HRN rule. The existing model is a realistic model of social interactions, but it has limitations that the newly proposed model addresses. The new model is then tested in different domains such as cooperation and coordination games. The agents in the existing model use a greedy algorithm; in the new model, agents make decisions based not only on the immediate payoff but also on rewards in the long run, as discussed later in the paper. The paper also describes optimization of the existing HRN model and Jason's code, along with the creation of interfaces to make the code more user-friendly.

1. Introduction

Motivation: Simulation is becoming an integral part of the social sciences as a way to model realistic behaviors. Agents are used to model social entities such as people, groups, etc. These simulations can be helpful in drawing conclusions about the corresponding real-world entities and in making informed decisions. Human behavioral simulations can be used to understand social phenomena, to understand the decisions made by people, and to make predictions about behavioral outcomes in complex situations. Most of the research to this point has dealt with creating agents that act rationally, but studies in experimental economics have revealed that human beings do not always make rational decisions, and the modeling of realistic behavior is still evolving. Any discussion of rational agents and the decision-making process must touch on decision theory. The idea of maximizing expected utility (EUT) emerged from decision theory, according to which an agent should make decisions that maximize its expected reward (the standard criterion is sketched below). However, this idea fails to capture how humans actually make decisions. The difficulty of producing social simulations also stems from the fact that social norms evolve and emerge as time passes.

Background and research goals: An agent is any entity that can perceive its environment through sensors and act using its actuators. A rational agent is an agent that makes decisions and performs actions to maximize its utility at all times. However, the definition of rationality differs between dynamic and static networks: in a dynamic network the agent needs to keep its knowledge of the environment updated in order to make informed decisions. Our work aims to make the decision-making process as realistic as possible and to reproduce both realistic individual and social behavior. This can be accomplished by combining intuitive and deliberative decision-making processes, which results in irrational but realistic behavior.
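For reference, the maximum-expected-utility criterion mentioned above can be written in its standard textbook form (see, e.g., [8]); this is the general decision-theoretic statement, not a formulation specific to the model in [1]:

    a^{*} = \arg\max_{a \in A} \sum_{s'} P(s' \mid a)\, U(s')

where A is the set of available actions, P(s' | a) is the probability that action a leads to outcome s', and U(s') is the utility the agent assigns to that outcome.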
We also attempt to draw conclusions about the emergence of social norms in a dynamic environment; this is accomplished by first investigating the emergence of such norms in a static network and then employing a dynamic network. While it is true that many social conventions are stipulated by law, it is often the case that these conventions emerge out of everybody's willingness to maximize their own utility. Our research contributes to multi-agent learning by showing how agents in a network can come to work together in order to maximize their utility.

For our research we examine the Highest Rewarding Neighborhood rule, put forth by Jason Leezer in [1]. The basic idea behind the rule is that agents prefer to engage in the interactions which benefit them the most. If an agent is engaged in a relationship which is not benefiting it, it breaks the relation and finds another agent to enter into a relationship with. The Highest Rewarding Neighborhood rule is still loosely defined. The main goal of this research is to develop a better model than that of [1] so that the agents become more intelligent. One approach is to modify the greedy algorithm used there so that agents make decisions with long-term benefits in view, rather than choosing whatever gives an immediate payoff but may prove a poor decision in the future.

2. Related Work

One advantage of the Highest Rewarding Neighborhood rule is that it implements Q-learning. The rules defined in [2], [3], and [5] all update their strategy via some method of imitation. While it may be the case that humans do learn through imitation to some extent, almost everybody will acknowledge that they are capable of making their own decisions about what is best for them, independent of what others are doing. Agents using the Highest Rewarding Neighborhood rule choose their strategies in a way that encompasses this observation, making them a better model of human behavior. Other works have been able to show convergence of the Generalized Simple Majority rule and of the Highest Cumulative Reward rule ([3] and [4] respectively), but these proofs assume a static network. In the case of Generalized Simple Majority, the not-so-generalized assumption is made that the ratio of agents using a particular strategy in each agent's neighborhood is approximately equal to the ratio of agents using that strategy in the entire network. While this is certainly true in complete networks, it is not generally true, even in graphs where the number of connections is fairly close to that of a complete graph.

Not much research has been done in the field of network update functions, so finding models to compare our results against is difficult. One model that was found, described in [6], was defined rigidly only for the Prisoner's Dilemma, and so it is a potential "opponent" for the Highest Rewarding Neighborhood model only for that game. It also assumes that the agent has knowledge of its neighbors' payouts. These issues of domain and of restrictions on the agent's knowledge of the environment will be an issue for any other model that we try to compare against, but our hope is to find more models for comparison as the research continues. With dynamic networks, maintaining the network structure is important: in [1] and [6] the point is made that any network update function can determine only who to disconnect from, not who to connect to.

3. Mathematical Definition

In this section we intended to include a rigid definition of Q-learning and the Highest Rewarding Neighborhood rule. This is the model that will be analyzed in future work and the model on which the simulations are run. [Refer to the PDF; we were unable to integrate it effectively.]
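Since the full definition could not be reproduced here, the standard Q-learning update is given below as a point of reference; this is the textbook form, and the precise formulation used with the Highest Rewarding Neighborhood rule is the one given in [1]:

    Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]

where s is the current state, a the strategy chosen, r the immediate payout received from the interaction, s' the resulting state, \alpha the learning rate, and \gamma the discount factor. In our setting, r corresponds to the payout an agent receives from the game it plays with a neighbor.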
4. Generalized Highest Rewarding Neighborhood Rule

5. Games

Prisoner's Dilemma. For the purposes of this research we define the Prisoner's Dilemma to be a game with the following payout matrix (payouts are listed as A, B):

                      B: Cooperate   B: Defect
     A: Cooperate         3,3            0,5
     A: Defect            5,0            1,1

Notice that in the Prisoner's Dilemma mutual defection is a Nash equilibrium, but it is the only Pareto inferior interaction. Also note that mutual cooperation is the optimal solution, but it does not produce the greatest reward for the individual. The game is a good analogy for many social interactions where cooperation provides the most benefit for the group as a whole but where defection provides the safest bet and the greatest potential for reward.

Stag Hunt. For the purposes of this research we define Stag Hunt to be a game with the following payout matrix:

                      B: Cooperate   B: Defect
     A: Cooperate         4,4            1,3
     A: Defect            3,1            3,3

Notice that in the Stag Hunt both mutual defection and mutual cooperation are Nash equilibria, but mutual cooperation is the optimal interaction and the only Pareto optimal solution. Defection, however, is the risk-dominant strategy, leading players who do not trust each other into a Pareto inferior interaction. Stag Hunt is a good analogy for interactions where trust is necessary for the best outcome.

Coordination Game. For the purposes of this research we define the Coordination Game to be a game with the following payout matrix:

                      B: Cooperate   B: Defect
     A: Cooperate         1,1           -1,-1
     A: Defect           -1,-1           1,1

In the coordination game agents benefit from playing the same strategy. On a system-wide level this leads the agents to form coalitions, and so the game is often used to model the spread of conflicting philosophies, such as the spread of democracy and communism during the Cold War.

Chicken. For the purposes of this research we define Chicken to be a game with the following payout matrix:

                      B: Cooperate   B: Defect
     A: Cooperate         2,2            1,3
     A: Defect            3,1            0,0

Notice that in this game there are two Nash equilibria, DC and CD. While CC, CD, and DC are all Pareto optimal, two risk-seeking agents trying to maximize their own utility would end up both defecting, leading them into a Pareto inferior interaction, whereas two risk-averse agents would end up cooperating. Thus a system correctly modeling the behavior of risk-averse human beings should see the emergence of some degree of mutual cooperation as a social convention.
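As an illustration of how these games can be represented in code, the sketch below encodes a payout matrix as an array and plays a single round. The class and method names are ours for illustration only and are not taken from the code of [1]:

    // Illustrative sketch: a two-player game defined by its payout matrix.
    // Strategy indices: 0 = Cooperate, 1 = Defect.
    public class PayoutMatrixGame {
        // payouts[rowStrategy][colStrategy] = {row player's payout, column player's payout}
        private final int[][][] payouts;

        public PayoutMatrixGame(int[][][] payouts) {
            this.payouts = payouts;
        }

        // Returns the pair of payouts for one round of play.
        public int[] play(int rowStrategy, int colStrategy) {
            return payouts[rowStrategy][colStrategy];
        }

        public static void main(String[] args) {
            // The Prisoner's Dilemma matrix from this section.
            PayoutMatrixGame prisonersDilemma = new PayoutMatrixGame(new int[][][] {
                { {3, 3}, {0, 5} },   // row player (A) cooperates
                { {5, 0}, {1, 1} }    // row player (A) defects
            });
            int[] result = prisonersDilemma.play(0, 1); // Cooperate vs. Defect
            System.out.println("A: " + result[0] + ", B: " + result[1]); // prints A: 0, B: 5
        }
    }

The Stag Hunt, Coordination, and Chicken matrices above can be passed to the same class in exactly the same way.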
6. Interface

An interface creates a protocol that classes may implement. One can extend an interface just as one can extend a class, and one can in fact extend several interfaces; interfaces thus enjoy the benefits of multiple inheritance. An interface is a collection of method definitions (without implementations) and constant values. We use interfaces to define a protocol of behavior that can be implemented by any class anywhere in the class hierarchy. Interfaces are useful for capturing similarities between unrelated classes without forcing a class relationship, for declaring methods that one or more classes are expected to implement, and for revealing an object's programming interface without revealing its class (objects such as these are called anonymous objects and can be useful when shipping a package of classes to other developers).

An interface makes it possible for a method in one class to invoke methods on objects of other classes without knowing the true class of those objects, provided that those objects are instantiated from classes that implement one or more specified interfaces. In other words, objects of classes that implement specified interfaces can be passed into the methods of other objects as the generic type Object, and the methods of the other object can invoke methods on the incoming objects by first casting them to the interface type. This provides a significant degree of generality, which matters when other members of the team are working on the same code.

We implemented an interface for the code used in [1] so that users can choose the learning algorithm. Depending on the command-line input, the user can make the agents use a learning algorithm such as Q-learning, neural networks, or a genetic algorithm. At the moment, however, only Q-learning has been implemented.
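A minimal sketch of what such a learning-algorithm interface might look like is given below. The names (LearningAlgorithm, QLearning, chooseStrategy, update) and the method signatures are illustrative assumptions, not the actual declarations in the code for [1]:

    // Illustrative sketch: an interface that lets an agent's learning algorithm be
    // swapped out (Q-learning today, neural networks or genetic algorithms later).
    public interface LearningAlgorithm {
        // Choose a strategy index (e.g., 0 = Cooperate, 1 = Defect) for the given state.
        int chooseStrategy(int state);

        // Update internal values after observing the payout for the chosen strategy.
        void update(int state, int strategy, double reward, int nextState);
    }

    // A skeletal Q-learning implementation of the interface.
    class QLearning implements LearningAlgorithm {
        private final double[][] q;        // Q-values indexed by [state][strategy]
        private final double alpha = 0.1;  // learning rate
        private final double gamma = 0.9;  // discount factor

        QLearning(int numStates, int numStrategies) {
            q = new double[numStates][numStrategies];
        }

        @Override
        public int chooseStrategy(int state) {
            // Greedy choice over current Q-values (an exploratory policy could be added here).
            int best = 0;
            for (int s = 1; s < q[state].length; s++) {
                if (q[state][s] > q[state][best]) {
                    best = s;
                }
            }
            return best;
        }

        @Override
        public void update(int state, int strategy, double reward, int nextState) {
            double maxNext = q[nextState][0];
            for (int s = 1; s < q[nextState].length; s++) {
                maxNext = Math.max(maxNext, q[nextState][s]);
            }
            // Standard Q-learning update.
            q[state][strategy] += alpha * (reward + gamma * maxNext - q[state][strategy]);
        }
    }

An agent can then hold a reference of type LearningAlgorithm and be handed whichever implementation the command-line option selects, so the agent's own code does not change when new algorithms are added.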
7. Memory

In their experiment, Shoham and Tennenholtz modified the memory parameters of each agent in the environment [4]. In our project, we have included memory parameters in the agents (update frequency and restart frequency). For example, the update frequency could be constrained by internal limitations of the system, or could be chosen deliberately to impose greater stability on the system, making the update frequency an important part of an agent's functionality. Regarding restart frequency, Shoham and Tennenholtz ran an experiment in which agents restarted their memory always and only after changing their strategy [4]. Interestingly, this convention was even more efficient than the case of full memory. Therefore, I added the memory parameters in order to mimic the experiment and compare the results. Although I do not believe the same will hold for our HRN rule, as the HRN rule functions differently than Shoham and Tennenholtz's Simple Majority rule, it is still interesting to test the convergence of an environment of HRN agents against an environment of SM agents; we are using memory as the adjusting variable because of how important memory is to an agent. I have finished the insertion and modification of the code to include the memory parameters, but I have yet to test for results. Shoham and Tennenholtz ran their experiment using the coordination game with 4000 trials of 800 time steps each. I will accumulate results to compare with Shoham and Tennenholtz over the next couple of weeks.

8. Optimization

Most of the optimization done on the Highest Rewarding Neighborhood package is in the Population class, specifically in each network chooser. We improved the methods that help each agent collect data faster. For example, in the Small World Chooser we noticed that each agent would generate a list of connections for all agents in the environment. This is unnecessary because only mutual neighbors matter to the list of connections; there is no need to include a neighbor on the opposite side of the small world when the agent is unlikely to connect to an agent there. By modifying the list to include only connections to agents with mutual neighbors, we cut the list down to roughly the size of the agent's local cluster, which leads to faster iterations when generating the tables describing each agent's connections.

Future Work

The HRN rule uses a greedy algorithm to determine the connections it wants to make or break. We will change the existing code so that the user has the option to set the tolerance of the interacting agents. This will affect how agents decide whether to sever or make connections when neighboring agents change their strategy. We will also change the way the exploratory function is used in Q-learning; after the changes, Q-learning will offer a choice among several functions, including the exploratory function. As written above, we will continue to accumulate results on the memory experiment to compare with Shoham and Tennenholtz's results.

Also, I will produce visual models of the social networks for every time step that the user chooses. Currently, once a test has completed its run, all of the data from the run (number of cooperating/defecting agents, average reward, etc.) goes into an Excel file. However, a spreadsheet only crunches numbers, so Excel cannot visually model the connections between agents in the system. Therefore, I will incorporate software such as the Organization Risk Analyzer (ORA) and Pajek to visually reproduce the social network used. ORA works only on Windows machines, however, which makes it difficult to move between Linux and Windows, so I need to find a way to run ORA on Linux; a good reference to start with is the creators of ORA [10]. Another task is continued progress on the Small World and Scale-free Choosers. Small World is now compatible with the new features included in the XML files (mainly agent symmetry) but still contains a minor bug regarding agent connection probability. Scale-free also needs to be made compatible with the XML files.

9. Bibliography

[1] Leezer, Jason. "Simulating Realistic Social and Individual Behavior in Agent Societies." Thesis. Trinity University, 2009.
[2] Walker, Adam, and Michael Wooldridge. "Understanding the Emergence of Conventions in Multi-Agent Systems."
[3] Delgado, Jordi. "Emergence of Social Conventions in Complex Networks." Elsevier (2002).
[4] Shoham, Yoav, and Moshe Tennenholtz. "On the Emergence of Social Conventions: Modeling, Analysis, and Simulations." Elsevier (1997).
[5] Zimmermann, Martin G., and Victor M. Eguiluz. "Cooperation, Social Networks and the Emergence of Leadership in a Prisoner's Dilemma with Adaptive Local Interactions."
[6] Poundstone, W. Prisoner's Dilemma. Doubleday, New York, NY, 1992.
[7] Binmore, Ken. Game Theory: A Very Short Introduction. New York: Oxford University Press, 2007.
[8] Russell, Stuart, and Peter Norvig. Artificial Intelligence. New Jersey: Pearson Education, 2003.