Action-Selection Biased by Pleasure Regulated Simulated Interaction

Computational Aspects of Emotion in Adaptive Behavior
Joost Broekens, Walter Kosters, Fons Verbeek
LIACS, Leiden University, The Netherlands.

Joost Broekens, LIACS, Leiden University, The Netherlands.

Overview
• Emotion & Information Processing.
• Adaptive agents:
  – reactive,
  – cognitive,
  – emotion-modulated cognitive agents.
• Experiment: Pleasure regulates information processing.
• Future work.

Emotion: communication medium, decision heuristic and modulator.
• Common emotions: fear, anger, happiness, sadness, surprise, disgust.
• Short episode triggered by an (internal/external) event, composed of
  – subjective feelings,
  – inclinations to act (action preparation, action tendency (Frijda)),
  – facial expressions,
  – cognitive evaluation, and
  – physiological arousal (heartbeat, alertness).
• Emotion: communication medium.
  – Communicates internal state (biological & sociological evidence: Darwin, Ekman).
• Emotion: decision heuristic relating events to the goals, needs, desires and beliefs of an agent.
  – Result of an evaluation of personal relevance; helps decision-making (neurological & cognitive evidence: Damasio, appraisal theory).
• Emotion: influences information processing.
  – Neurocomputational & cognitive evidence: Doya and Frijda, Manstead and Bem.

Emotion & Information Processing
• Biology → Emotion; internal drives, homeostasis, hardwired reactions.
• Cognition → Emotion; cognitive emotion elicitation:
  – Emotions result from the interpretation of our world in relation to our goals, needs, desires, beliefs, etc. (Appraisal Theory: Frijda, Lazarus, Arnold, etc.).
• Emotion → behavior; emotion influences adaptive behavior:
  – emotion as drive,
  – emotion as source of information,
  – emotion as modulator of cognitive processes.
  – Relates to different types of (views on aspects of) adaptive agents:
    • reactive,
    • cognitive,
    • emotion-modulated cognitive agents.
Emotions and reactive agents
• Reactive agents:
  – have predefined behaviors,
  – learn new behavior based on instrumental conditioning, and
  – select behaviors based on this learned model and on internal drives (motivations).
• Emotion influences behavior:
  – can be such an internal drive, and
  – can trigger typical behaviors (fight / flight).
• Computational models that study emotion within this context (drive/motivation) (Avila-Garcia and Cañamero, 2004; Cañamero, 1997; Velasquez, 1998).

Emotion and cognitive agents
• Cognitive agents are reactive agents plus:
  – internally represented knowledge used in planning and reasoning, and
  – an attention mechanism guiding perception and action, etc.
• Emotion influences behavior:
  – is a source of (explicit) information used in reasoning (knowledge), and
  – can (implicitly) modulate information processing (systemic influence).
• Computational models in which emotion is used as information (e.g. Botelho and Coelho).

Thinking: Internal Simulation of Behavior
• Internal simulation of behavior
  – Covertly execute and evaluate potential interaction using sensory-motor substrates (Hesslow, 2002; Damasio; Cotterill, 2001), but see also
  – “interaction potentialities” (Bickhard), and
  – “state anticipation” (Butz, Sigaud, Gérard, 2003).
  – Existing mechanisms are the basis for simulation.
  – Evolutionary continuity!
• Our basis for information processing.

Emotion modulates information processing
• Emotion influences thinking and behavior at multiple levels of cognitive complexity (Frijda, Manstead and Bem, 2000; Damasio, 1994; Davidson, 2000; Berridge, 2003; Rolls, 2000).
• Emotion is integrated at multiple levels of processing, and higher levels of processing (conscious, reflective reasoning) have not always existed → an evolutionary advantage to integrating emotion at lower levels can be expected: levels close to reward systems and behavioral control.
  – If thinking is internal simulation of behavior, these low-level integration mechanisms should also teach us about the influence of emotion on higher-level cognitive mechanisms, e.g., on attention.
• In this research we focus on the low-level influence of emotion on information processing in simulated adaptive agents.
• We use emotion as a metalearning parameter (Doya, 2000).
• Emotion: pleasure and arousal (Russell, 2003).

Experiment:
Can pleasure regulate information processing such that this provides an adaptive advantage for the agent?

Pleasure regulates information processing
[Figure: agent architecture. Reactive behavior: Perception (percept), RL model, Action-selection (action); stimulus and reinforcement come from the ENVIRONMENT. Cognitive influence: Interaction-selection (predicted interactions, simulated interaction, simulated reinforcement) and the Emotion process (pleasure).]

Learning
• The agent learns to interact with the environment through Reinforcement Learning (instrumental conditioning).
  – The agent’s actions are rewarded or punished.
  – It learns value-state predictions of potential next states.
  – It uses these predictions to determine which action to take next.
  – The basics of the model follow (Sutton and Barto, 1998).
• Learns through continuous interaction.
• Learns based on perception-action pairs.

Learning: reinforcement example
Reward: propagated back to the beginning, using a mechanism that solves the temporal credit assignment problem (i.e., finds the actions responsible for the reward).
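As an illustration of how a reward is propagated back toward the beginning, here is a minimal sketch using tabular TD(0) in the spirit of Sutton and Barto. It is not the agent model from the slides: the five-state chain, the single reward at the end, and the parameters `alpha` and `gamma` are assumptions chosen only to make the back-propagation of value visible.

```python
def td0_chain(n_states=5, episodes=200, alpha=0.5, gamma=0.9):
    """Learn state values on a deterministic chain with one reward at the end.

    Hypothetical setup: states 0..n_states-1 are visited left to right,
    and entering the terminal state yields a reward of 1.
    """
    # One extra entry for the terminal state, whose value stays 0.
    values = [0.0] * (n_states + 1)
    for _ in range(episodes):
        for state in range(n_states):
            next_state = state + 1
            reward = 1.0 if next_state == n_states else 0.0
            # TD error: difference between the one-step target and the
            # current prediction for this state.
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
    return values[:n_states]


values = td0_chain()
for state, value in enumerate(values):
    print(f"state {state}: predicted value {value:.3f}")
```

States closer to the reward end up with higher predicted values, and each earlier state inherits a discounted share. That is one simple way to assign credit to the steps that led to the reward.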
Action-Selection
[Figure: agent architecture as before, with a distributed-state RL model; the Action-selection component is the focus.]

Action-Selection
• Value-state predictions are transformed into action values.
• Action-selection is based on these action values.
  – Choose an action from the set of action-value pairs stochastically (e.g. using a Boltzmann distribution).
• Action-selection is responsible for exploration vs. exploitation behavior.

Our agent’s cognitive part (based on internal simulation of behavior)
[Figure: agent architecture as before, with a distributed-state RL model; the Cognitive influence part (Interaction-selection, simulated interaction, simulated reinforcement) is the focus.]

Simulation: action-selection bias
At every step, instead of going straight to action-selection, first select a subset of predicted interactions from the reinforcement learning model → feed them back to the RL model.
1. Interaction-selection: select a subset of predicted interactions.
2. Simulate-and-bias-predicted-benefit: feed them back to the model as if they were real interactions.
3. Action-selection: select the next action using the action-selection mechanism explained earlier, based on the now biased action values.
[Figure: agent architecture as before, with a hierarchical-state RL model.]
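The three steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names, the `temperature` parameter, and the `fraction` argument are hypothetical, and step 2 is reduced to overwriting an action value with a look-ahead estimate instead of feeding a simulated interaction through a full RL model. The numbers are the hypothetical action values used in the example slide.

```python
import math
import random


def boltzmann_probabilities(action_values, temperature=0.5):
    """Turn action values into selection probabilities (softmax)."""
    highest = max(action_values.values())
    # Subtracting the maximum keeps the exponentials numerically stable.
    weights = {a: math.exp((v - highest) / temperature)
               for a, v in action_values.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def select_action(action_values, temperature=0.5, rng=random):
    """Step 3: stochastic action-selection over the (biased) action values."""
    probs = boltzmann_probabilities(action_values, temperature)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]


def bias_by_simulation(action_values, simulated_values, fraction=1.0):
    """Steps 1 and 2, simplified.

    Step 1: keep the best `fraction` of predicted interactions.
    Step 2: replace their values with the simulated (look-ahead) estimate.
    `simulated_values` stands in for what the RL model would return after
    a covert interaction; that mapping is an assumption of this sketch.
    """
    ranked = sorted(action_values, key=action_values.get, reverse=True)
    count = max(1, math.ceil(fraction * len(ranked)))
    biased = dict(action_values)
    for action in ranked[:count]:
        biased[action] = simulated_values[action]
    return biased


before = {"up": 0.2, "down": -0.5, "right": -1.0, "left": -1.0}
simulated = {"up": 0.1, "down": 0.5, "right": -1.0, "left": -1.0}

biased = bias_by_simulation(before, simulated, fraction=1.0)
print("probabilities before:", boltzmann_probabilities(before))
print("probabilities after: ", boltzmann_probabilities(biased))
print("selected action:", select_action(biased))
```

With `fraction=1.0` every predicted interaction is simulated. A smaller fraction simulates only the interactions that currently look best, which is the knob the later slides tie to pleasure.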
Simulation: example
• Action list before simulation (hypothetical example!):
  – {up=0.2, down=-0.5, right=-1, left=-1}
• Action-selection would have selected “up”.
  – With Boltzmann selection: high probability for “up”.
• Simulate all interactions. (Figure label: roadblock, r = -0.5.)
  – Propagate back the predicted values by simulating interaction with the environment.
  – The effect is a “value look-ahead” of 1 step.
• Action list after simulation:
  – {up=0.1, down=0.5, right=-1, left=-1}
• Action-selection selects “down”.
• In this example, simulating all predicted interactions helps.

But: Simulating Everything is not Always Best
• Even apart from the fact that simulating everything costs mental effort.
• Earlier experiments (Broekens, 2005) showed that
  – simulation has a benefit, especially when many interactions are simulated. This is not surprising (better heuristic). However,
  – in some cases less simulation resulted in better learning.
→ Dynamic relation between the environment and the simulation “strategy” (i.e. the simulation threshold: the percentage of all predicted interactions to be simulated).
→ Emotion as metalearning to adapt the amount of internal simulation? (Doya, 2002)
  – Pleasure is an indication of the current performance of the agent (Clore and Gasper, 2000). Also,
  – high pleasure → top-down thinking, and low pleasure → bottom-up thinking (Fiedler and Bless, 2000).

Pleasure Modulates Simulation
[Figure: agent architecture as before, with a distributed-state RL model; the Emotion process feeds pleasure to Interaction-selection.]

Pleasure Modulates Simulation
• Many theories of emotion.
• We use the core-affect (or activation-valence) theory of emotion as basis.
  – Two fundamental factors, pleasure and arousal (Russell, 2003).
  – Pleasure relates to emotional valence, and
  – arousal relates to action-readiness, or activity.
• In this study we model pleasure as the simulation threshold.
  – We use pleasure to dynamically adapt the number of interactions that are simulated. It is thus used as a dynamic simulation threshold.
  – We study the indirect effect of emotion as a metalearning parameter affecting information processing, which in turn influences action-selection.

Pleasure Modulates Simulation
• Pleasure quantification: an indication of current performance relative to what the agent is used to.
  – We tried to capture this by the normalized difference between the short-term average reinforcement signal ($r_{star}$) and the long-term average reinforcement signal ($r_{ltar}$).
• Continuous pleasure feedback:
  $e_p = \frac{r_{star} - (r_{ltar} - f\,\sigma_{ltar})}{2 f\,\sigma_{ltar}}$
  This is the only formula in the presentation!
  – High pleasure, going well? Continue the strategy, goal-directed thinking.
    • High $e_p$: high threshold, simulate only the predicted best interactions.
  – Low pleasure? Look broader, pay more attention to all predicted interactions.
    • Low $e_p$: low threshold, simulate many interactions.
[Figure: agent architecture as before, with a hierarchical-state RL model; the Emotion process feeds pleasure, $e_p$, to Interaction-selection.]

Experimental setup
• To measure the adaptive effect of pleasure-modulated simulation: force the agent to adapt to a new task.
  – First the agent has 128 trials to learn task 1, then
  – the environment is switched to a new task, with 128 trials to learn task 2.
  – Repeat for many different parameter settings (e.g.
the window of the long- and short-term average reinforcement signals, the learning rate, etc.).
• Pleasure predictions:
  – Pleasure increases to a value near 1 (agent gets better at the task),
  – then slowly converges down to 0.5 (agent gets used to the task).
  – At the switch: pleasure drops (new task, drop in performance),
  – then increases to a value near 1 and converges down to 0.5 (agent gets used to the new task).

Results
• Performance of pleasure-modulated simulation is comparable with simulating ALL / the best 50% of predicted interactions (static simulation threshold), but uses only 30% / 70% of the mental resources.

Results
• Some settings even have a significantly better performance at lower mental cost.
• The predicted pleasure curve was confirmed.

Some conclusions
• Can pleasure regulate information processing such that this provides an adaptive advantage for the agent?
  – Yes.
• Simple pleasure feedback can be used to determine how broadly an agent should internally simulate potential behavior.
  – The agent’s performance is comparable and mental effort decreases.
  – Since we introduce few new mechanisms for simulation, the results are relevant to understanding the evolutionary plausibility of the simulation hypothesis, as increased individual adaptation at lower cost is an evolutionarily advantageous feature.
• Our results provide clues of a relation between the simulation hypothesis and emotion theory.

Future work.
• Use emotion to modulate:
  – the action-selection distribution (Doya, 2002), and
  – the interaction-selection distribution (e.g. temperature of Boltzmann, threshold of our AS mechanism).
• Interplay between covert interaction (simulation) and overt interaction (action-selection).
  – Simulate the best interaction, but choose an action stochastically, see also (Gadanho, 2003):
    → Gives extra “drive” to certain actions.
  – The inverse? Seems rational too:
    → Simulate bad actions for “mental (covert) exploration”, choose the best actions for “overt exploitation”.
    → Early experiments do not (yet) show a clear benefit.
• Use the arousal factor as feedback.
• Could arousal modify the amount of energy available for information processing, and thereby provide a bound for the amount of simulation?
• Arousal resulting from low-level evaluation of familiarity and suddenness (e.g. Scherer).

Questions?
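For readers who want to experiment with the pleasure feedback from the “Pleasure Modulates Simulation” slides, here is a minimal sketch. It assumes the formula reads e_p = (r_star - (r_ltar - f * sigma_ltar)) / (2 * f * sigma_ltar), with sigma_ltar taken as the standard deviation of the reinforcements in the long-term window. The window sizes, the factor `f`, the clamping to [0, 1], the neutral value 0.5 when there is no variation, and the linear mapping from pleasure to the number of simulated interactions are all assumptions of this sketch, not details given in the slides.

```python
import math
from collections import deque
from statistics import mean, pstdev


class PleasureSignal:
    """Running pleasure estimate from short- and long-term reinforcement."""

    def __init__(self, short_window=5, long_window=50, f=1.0):
        self.short = deque(maxlen=short_window)  # recent reinforcements
        self.long = deque(maxlen=long_window)    # what the agent is used to
        self.f = f

    def update(self, reinforcement):
        self.short.append(reinforcement)
        self.long.append(reinforcement)
        r_star = mean(self.short)
        r_ltar = mean(self.long)
        sigma_ltar = pstdev(self.long)
        if sigma_ltar < 1e-12:
            # No variation yet: treat performance as "as usual" (assumption).
            return 0.5
        e_p = (r_star - (r_ltar - self.f * sigma_ltar)) / (2 * self.f * sigma_ltar)
        return min(1.0, max(0.0, e_p))  # clamp to [0, 1] (assumption)


def interactions_to_simulate(predicted_values, e_p):
    """High pleasure: simulate only the best; low pleasure: simulate many.

    The linear mapping (1 - e_p) to a fraction is a hypothetical choice.
    """
    ranked = sorted(predicted_values, key=predicted_values.get, reverse=True)
    count = max(1, math.ceil((1.0 - e_p) * len(ranked)))
    return ranked[:count]


pleasure = PleasureSignal()
predicted = {"up": 0.1, "down": 0.5, "right": -1.0, "left": -1.0}
for step, r in enumerate([0.0] * 50 + [1.0] * 100 + [0.0] * 5):
    e_p = pleasure.update(r)
    if step in (49, 50, 149, 150):
        print(step, round(e_p, 2), interactions_to_simulate(predicted, e_p))
```

In this toy run the estimate sits at 0.5 while reinforcement is constant, rises when reinforcement improves, settles back once the improvement becomes the norm, and drops when reinforcement falls. This mirrors the shape of the pleasure predictions on the experimental setup slide.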
