The emergence of social conventions in Dynamic Networks in agents using the Highest Rewarding Neighborhood Rule
Sean Hallinan, Wayne Wu, Kenny Wong and Robin Jha
Advisor: Zhang Yu, Ph.D.
Abstract
Simulation is a powerful tool for discovering surprising consequences of simple assumptions. Social simulations are the class of simulations used to study social behavior and phenomena. In the social conventions setting, agents interact with each other through set rules, or games, and observe information about the system. In this paper, we investigate the efficiency of the emergence of social conventions in complex networks, that is, how fast conventions are reached, when a strategy update function such as Highest Cumulative Reward (HCR) is coupled with the Highest Rewarding Neighborhood (HRN) rule.
The paper deals with making modifications to the existing mathematical model proposed for the HRN rule. The existing model is a realistic model of social interactions, but it has limitations that the new proposed model addresses. The new model is then tested in different domains such as cooperation and coordination games. The agents in the existing model use a greedy algorithm; in the new model the agents make their decisions based not only on the immediate payoff but also on long-run rewards, as discussed later in the paper. The paper also covers optimization of the existing HRN model and of Leezer's code, along with the creation of interfaces to make the code more user-friendly.
1. Introduction
Motivation:
Simulation is becoming an integral part of the social sciences as a way to model realistic behaviors. Agents are used to model social entities such as people and groups. These simulations can be helpful in drawing conclusions about the corresponding real-world entities and making informed decisions. Human behavioral simulations can be used to understand social phenomena, to understand the decisions made by people, and to make predictions about behavioral outcomes in complex situations. Most research to this point has dealt with creating agents that act rationally. However, studies in experimental economics have revealed that human beings do not always make rational decisions, and modeling realistic behavior is still evolving.
Any discussion of rational agents and the decision-making process leads to decision theory. The idea of maximizing expected utility emerged from decision theory: according to expected utility theory (EUT), an agent should make decisions that maximize its expected reward. However, this idea fails to capture how humans actually make decisions. Difficulty in producing social simulations also stems from the fact that social norms evolve and emerge over time.
Background and research goals:
An agent is any entity that can perceive its environment through sensors and act on it through actuators. A rational agent is an agent that makes decisions and performs actions to maximize its utility at all times. However, the definition of rationality differs between dynamic and static networks. In a dynamic network the agent needs to keep its knowledge about the environment up to date and make informed decisions. Our work deals with making the decision-making process as realistic as possible and with reproducing both realistic individual and social behavior. This can be accomplished by combining intuitive and deliberative decision making, which results in behavior that is irrational but realistic. We also attempt to draw conclusions about the emergence of social norms in a dynamic environment, which we do by first investigating the emergence of such norms in a static network and then employing a dynamic network. While it is true that many social conventions are stipulated by law, it is often the case that these conventions emerge out of everybody's willingness to maximize their own utility. Our research contributes to multi-agent learning by showing how agents in a network can come to work together in order to maximize their utility. For our research we examine the Highest Rewarding Neighborhood (HRN) rule, put forth by Jason Leezer in [1]. The basic idea behind the rule is that agents prefer to engage in the interactions that benefit them the most: if an agent is engaged in a relationship that is not benefiting it, it breaks the relationship and finds another agent to connect with, as sketched below. The Highest Rewarding Neighborhood rule is still loosely defined. The main goal of this research is to develop a better model for [1] so that the agents become more intelligent. One approach is to modify the greedy algorithm he used so that agents make decisions with long-term benefits in mind, rather than choosing an action that gives an immediate payoff but may be a poor decision in the future.
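To make the idea concrete, the sketch below illustrates one greedy network-update step of the kind the rule describes. It is a minimal illustration: the names (Agent, rewardFrom, breakConnection, connectToRandomStranger) and the tolerance parameter are our own and are not the identifiers used in the code for [1].

import java.util.List;

// Minimal sketch of the greedy network-update idea: drop the least
// rewarding neighbor if it falls below a tolerance, then reconnect.
// All identifiers here are hypothetical illustrations, not code from [1].
public class GreedyNetworkUpdate {

    interface Agent {
        double rewardFrom(Agent neighbor);     // reward earned from this neighbor so far
        void breakConnection(Agent neighbor);  // sever the relationship
        void connectToRandomStranger();        // let the network chooser pick a new neighbor
    }

    static void update(Agent self, List<Agent> neighbors, double tolerance) {
        Agent worst = null;
        double worstReward = Double.POSITIVE_INFINITY;
        for (Agent n : neighbors) {
            double r = self.rewardFrom(n);
            if (r < worstReward) {
                worstReward = r;
                worst = n;
            }
        }
        if (worst != null && worstReward < tolerance) {
            self.breakConnection(worst);
            self.connectToRandomStranger();
        }
    }
}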
2. Related Work
One advantage of the Highest Rewarding Neighborhood rule is that it implements Q-learning. The rules defined in [2], [3], and [5] all update their strategy via some method of imitation. While it may be the case that humans do learn through imitation to some extent, almost everybody will acknowledge that they are capable of making their own decisions about what is best for them independent of what others are doing. Agents using the Highest Rewarding Neighborhood rule choose their strategies in a way that encompasses this observation, making them a better model of human behavior.
Other works have been able to show convergence of the Generalized Simple Majority rule and of the Highest Cumulative Reward rule ([3] and [4] respectively), but these proofs assume a static network. In the case of Generalized Simple Majority, the not-so-generalized assumption is made that the ratio of agents using a particular strategy in each agent's neighborhood is approximately equal to the ratio of agents using that strategy in the entire network. While this is certainly true in complete networks, it is not generally true, even in graphs where the number of connections is fairly close to that of a complete graph.
Not much research has been done on network update functions, so finding models to compare our results against is difficult. One model we found, described in [6], is defined rigidly only for the Prisoner's Dilemma, and so it is a potential "opponent" for the Highest Rewarding Neighborhood model only in that game. It also assumes that the agent has knowledge of its neighbors' payouts. These issues of domain and of restrictions on the agent's knowledge of the environment will likely be an issue for any other model we try to compare against, but our hope is to find more models for comparison as the research continues. With dynamic networks, maintaining the network structure is important. In [1] and [6] the point is made that a network update function cannot determine whom to connect to, only whom to disconnect from.
3. Mathematical Definition
In this section we include an insert with a rigid definition of Q-learning and the Highest Rewarding Neighborhood rule. This is the model that will be analyzed in the future, and the model on which simulations are run.
[The formal definitions appear as an insert in the original PDF and are not reproduced here.]
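Although the insert is not reproduced here, the strategy learning used by HRN agents is Q-learning (see Section 2). For reference, the standard one-step Q-learning update is

\[
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
\]

where \alpha is the learning rate, \gamma is the discount factor, r is the immediate payoff received from the interaction, and s' is the resulting state. The exact state and action formulation used in [1] appears in the original insert.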
4. Generalized Highest Rewarding Neighborhood Rule
5. Games
Prisoner’s Dilemma
For the purposes of this research we will define the Prisoner's Dilemma to be a game with the payout matrix:

                 B: Cooperate    B: Defect
A: Cooperate         3,3             0,5
A: Defect            5,0             1,1
Notice that in the Prisoner's Dilemma mutual defection is a Nash equilibrium, yet it is the only Pareto-inferior outcome. Also note that mutual cooperation is the optimal outcome for the pair, but it does not produce the greatest possible reward for the individual. The game is a good analogy for many social interactions where cooperation provides the most benefit for the group as a whole but where defection is the safest bet and offers the greatest potential reward.
Stag Hunt
For the purposes of this research we will define Stag Hunt to be a game with the payout matrix:

                 B: Cooperate    B: Defect
A: Cooperate         4,4             1,3
A: Defect            3,1             3,3

Notice that in the Stag Hunt both mutual defection and mutual cooperation are Nash equilibria, but mutual cooperation is the optimal interaction. Defection, however, is the risk-dominant strategy, leading players who do not trust each other into a Pareto-inferior interaction. Also note that mutual cooperation is the only Pareto optimal solution. Stag Hunt is a good analogy for interactions where trust is necessary for the best outcome.
Coordination Game
For the purposes of this research we will define the Coordination Game to be a game with the payout matrix:

                 B: Cooperate    B: Defect
A: Cooperate         1,1            -1,-1
A: Defect           -1,-1            1,1
In the coordination game agents benefit from playing the same strategy. On a system-wide level this leads the agents to form coalitions, and so it is often used to model the spread of conflicting philosophies, such as the spread of democracy and communism during the Cold War.
Chicken
For the purposes of this research we will define Chicken to be a game with the payout matrix:

                 B: Cooperate    B: Defect
A: Cooperate         2,2             1,3
A: Defect            3,1             0,0
Notice that in this game there are two Nash equilibria, DC and CD. While CC, CD, and DC are all Pareto optimal, two risk-seeking agents trying to maximize their own utility would end up both defecting, leading them into a Pareto-inferior interaction; two risk-averse agents, on the other hand, would end up cooperating. Thus a system correctly modeling the behavior of the risk-averse human being should see the emergence of some degree of mutual cooperation as a social convention.
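To make the game definitions concrete, the four payout matrices above can be written directly in code. The sketch below uses our own indexing convention and names; it is illustrative only and is not taken from the simulation code in [1].

// Payoff tables for the four games defined above.
// Index 0 = Cooperate, 1 = Defect; entry [rowAction][colAction] holds
// {row player's payoff, column player's payoff}.
public class Payoffs {

    static final int COOPERATE = 0, DEFECT = 1;

    static final int[][][] PRISONERS_DILEMMA = {
        { {3, 3}, {0, 5} },    // row cooperates
        { {5, 0}, {1, 1} }     // row defects
    };

    static final int[][][] STAG_HUNT = {
        { {4, 4}, {1, 3} },
        { {3, 1}, {3, 3} }
    };

    static final int[][][] COORDINATION = {
        { {1, 1}, {-1, -1} },
        { {-1, -1}, {1, 1} }
    };

    static final int[][][] CHICKEN = {
        { {2, 2}, {1, 3} },
        { {3, 1}, {0, 0} }
    };

    // Payoff to the row player for a joint action,
    // e.g. rowPayoff(PRISONERS_DILEMMA, DEFECT, COOPERATE) == 5.
    static int rowPayoff(int[][][] game, int rowAction, int colAction) {
        return game[rowAction][colAction][0];
    }
}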
6. Interface
An interface creates a protocol that classes may implement. One can extend an interface just as one can extend a class, and one can actually extend several interfaces; interfaces thus enjoy the benefits of multiple inheritance. An interface is a collection of method definitions (without implementations) and constant values. We use interfaces to define a protocol of behavior that can be implemented by any class anywhere in the class hierarchy.
Interfaces are useful for:
- capturing similarities between unrelated classes without forcing a class relationship
- declaring methods that one or more classes are expected to implement
- revealing an object's programming interface without revealing its class (objects such as these are called anonymous objects and can be useful when shipping a package of classes to other developers)
The interface makes it possible for a method in one class to invoke methods on objects of other classes without needing to know the true class of those objects, provided that those objects are instantiated from classes that implement one or more specified interfaces. In other words, objects of classes that implement specified interfaces can be passed into the methods of other objects as the generic Object type, and the methods of the other object can invoke methods on the incoming objects by first casting them to the interface type. This provides a significant degree of generality, which matters when other members of the team might be working on the same problem. We implemented an interface for the code used in [1] so that users can choose the learning algorithm. Depending on the input on the command line, the user can make the agent use a learning algorithm such as Q-learning, neural networks, or a genetic algorithm. However, at the moment only Q-learning has been implemented.
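A minimal sketch of the kind of interface we mean is given below. The identifiers (LearningAlgorithm, QLearning, AlgorithmFactory) and the Q-table layout are our own illustration rather than the exact names used in the code for [1]; only Q-learning is shown as implemented, matching the current state of the code.

// A pluggable learning-algorithm interface (illustrative names only).
public interface LearningAlgorithm {
    int chooseAction(int state);
    void update(int state, int action, double reward, int nextState);
}

// Tabular Q-learning implementation of the interface.
class QLearning implements LearningAlgorithm {
    private final double[][] q;        // Q-table: states x actions
    private final double alpha = 0.1;  // learning rate
    private final double gamma = 0.9;  // discount factor

    QLearning(int states, int actions) { q = new double[states][actions]; }

    public int chooseAction(int state) {
        int best = 0;
        for (int a = 1; a < q[state].length; a++) {
            if (q[state][a] > q[state][best]) best = a;
        }
        return best;  // greedy choice; exploration is omitted for brevity
    }

    public void update(int state, int action, double reward, int nextState) {
        double maxNext = q[nextState][0];
        for (double v : q[nextState]) maxNext = Math.max(maxNext, v);
        q[state][action] += alpha * (reward + gamma * maxNext - q[state][action]);
    }
}

// The command-line argument picks the implementation; other algorithms
// (neural networks, genetic algorithms) would be added here later.
class AlgorithmFactory {
    static LearningAlgorithm fromArg(String name, int states, int actions) {
        if (name.equals("q-learning")) return new QLearning(states, actions);
        throw new IllegalArgumentException("only q-learning is implemented so far");
    }
}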
7. Memory
In Shoham and Tennenholtz's experiment, they modified the memory parameters of each agent in the environment [4]. In our project, we have included parameters for memory in the agents (update frequency and restart frequency). For example, the update frequency could be constrained by internal limitations of the system, or could be chosen voluntarily to impose greater stability on the system, making it an important part of an agent's functionality. Regarding restart frequency, Shoham and Tennenholtz ran an experiment in which agents restarted their memory always and only after changing their strategy [4]. Interestingly, this convention was even more efficient than the case of full memory. Therefore, I added the memory parameters in order to mimic the experiment and compare the results. Although I do not believe this will be the case for our HRN rule, as the HRN rule functions differently than Shoham and Tennenholtz's Simple Majority rule, it is still interesting to test the convergence of an environment of HRN agents against an environment of SM agents; we are using memory as the adjustable variable because of how important memory is to an agent. I have finished inserting and modifying the code to include the memory parameters, but I have yet to test for results. Shoham and Tennenholtz ran their experiment using the coordination game with 4000 trials of 800 time steps each. I will accumulate results to compare with Shoham and Tennenholtz over the next couple of weeks.
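The sketch below shows how the two memory parameters could sit inside an agent. The class and field names are hypothetical; the restart behavior follows the Shoham and Tennenholtz variant described above, clearing memory only when the strategy changes [4].

import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical agent with the two memory parameters discussed above.
public class MemoryAgent {
    private final int updateFrequency;             // how often (in time steps) to re-evaluate
    private final boolean restartOnStrategyChange; // clear memory after a strategy change
    private final Deque<Double> rewardMemory = new ArrayDeque<>();
    private int strategy;

    MemoryAgent(int updateFrequency, boolean restartOnStrategyChange) {
        this.updateFrequency = updateFrequency;
        this.restartOnStrategyChange = restartOnStrategyChange;
    }

    void observe(double reward) {
        rewardMemory.addLast(reward);              // remember the payoff of each interaction
    }

    void step(int timeStep) {
        if (timeStep % updateFrequency != 0) return;  // only update on schedule
        int newStrategy = chooseStrategyFromMemory();
        if (newStrategy != strategy && restartOnStrategyChange) {
            rewardMemory.clear();                  // restart memory, as in [4]
        }
        strategy = newStrategy;
    }

    private int chooseStrategyFromMemory() {
        // Placeholder: the real agent applies its strategy-update rule here.
        return strategy;
    }
}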
8. Optimization
Most of the optimization done on the Highest Rewarding Neighborhood package is in the Population class, in particular in each network chooser. We improved the methods that help each agent collect data faster. For example, in the Small World chooser, we noticed that each agent would generate a list of connections for all agents in the environment. This is unnecessary, because only mutual neighbors matter for the list of connections; there is no need to include a neighbor on the opposite side of the small world when the agent is unlikely to connect to an agent there. By modifying the list to include only connections to agents with which the agent shares mutual neighbors, we cut the list down to roughly the size of the agent's local cluster. This leads to faster iterations when generating the tables describing the agent's connections.
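The idea behind the change is sketched below; the class and method names are illustrative and do not correspond to the actual Population class code.

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative filter: keep only candidate agents that share at least one
// neighbor with the given agent, instead of listing every agent.
public class MutualNeighborFilter {

    static List<Integer> candidates(int agent,
                                    List<Set<Integer>> neighborsOf,  // adjacency: agent id -> neighbor ids
                                    List<Integer> allAgents) {
        Set<Integer> myNeighbors = neighborsOf.get(agent);
        List<Integer> result = new ArrayList<>();
        for (int other : allAgents) {
            if (other == agent || myNeighbors.contains(other)) continue;
            Set<Integer> theirNeighbors = neighborsOf.get(other);
            for (int n : myNeighbors) {
                if (theirNeighbors.contains(n)) {   // shared neighbor found
                    result.add(other);
                    break;
                }
            }
        }
        return result;
    }
}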
Future Work:
The HRN rule uses a greedy algorithm to determine the connections it wants to make or break. We will make changes to the existing code so that the user has an option to change the tolerance of the interacting agents. This will affect the way agents decide about severing or making connections when neighboring agents change their strategy. We will also make changes to the way the exploratory function is used in Q-learning: after the changes, Q-learning will have options to choose among several action-selection functions, including the exploratory function; a sketch of one such option follows this paragraph.
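As an example of the kind of option we have in mind, an epsilon-greedy action selection could be offered alongside the purely greedy choice; the sketch below is illustrative only and uses names of our own choosing.

import java.util.Random;

// Epsilon-greedy selection: with probability epsilon explore a random
// action, otherwise exploit the action with the highest Q-value.
public class EpsilonGreedy {
    private final Random rng = new Random();
    private final double epsilon;

    EpsilonGreedy(double epsilon) { this.epsilon = epsilon; }

    int chooseAction(double[] qValuesForState) {
        if (rng.nextDouble() < epsilon) {
            return rng.nextInt(qValuesForState.length);   // explore
        }
        int best = 0;
        for (int a = 1; a < qValuesForState.length; a++) {
            if (qValuesForState[a] > qValuesForState[best]) best = a;
        }
        return best;                                      // exploit
    }
}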
As written above, we will continue to accumulate results on the memory experiment to compare with Shoham and Tennenholtz's results. Also, I will produce visual models of the social networks for every time step that the user chooses. Currently, once a test has completed its run, all of the data from the test run (number of cooperating/defecting agents, average reward, etc.) goes into an Excel file. However, since an Excel sheet can only crunch numbers, the Excel program cannot visually model the connections between agents in the system. Therefore, I will incorporate new software, such as the Organization Risk Analyzer (ORA) and Pajek, to visually reproduce the social network used. However, ORA works only on Windows machines, so it would be difficult to transfer between Linux and Windows; I need to find a way to run ORA on Linux, and a good reference to start with is the creators of ORA [10]. Another task I will try to accomplish is continued progress on the Small World and Scale-free choosers. Small World is now compatible with the new features included in the XML files (mainly agent symmetry), but it still contains a minor bug regarding agent connection probability. I also need to make Scale-free compatible with the XML files.
9. Bibliography
[1] Leezer, Jason. "Simulating Realistic Social and Individual Behavior in Agent Societies." Thesis. Trinity University, 2009.
[2] Walker, Adam, and Michael Wooldridge. "Understanding the Emergence of Conventions in Multi-Agent Systems."
[3] Delgado, Jordi. "Emergence of Social Conventions in Complex Networks." Elsevier (2002).
[4] Shoham, Yoav, and Moshe Tennenholtz. "On the Emergence of Social Conventions: Modeling, Analysis, and Simulations." Elsevier (1997).
[5] Zimmermann, Martin G., and Victor M. Eguiluz. "Cooperation, Social Networks and the Emergence of Leadership in a Prisoner's Dilemma with Adaptive Local Interactions."
[6] Poundstone, W. Prisoner's Dilemma. Doubleday, New York, NY, 1992.
[7] Binmore, Ken. Game Theory: A Very Short Introduction (Very Short Introductions). New York: Oxford University Press, 2007.
[8] Russell, Stuart, and Peter Norvig. Artificial Intelligence. New Jersey: Pearson Education, 2003.