The Open Cybernetics and Systemics Journal, 2007, 1, 28-46
EMOTION-I Model: A Biologically-Based Theoretical Framework for
Deriving Emotional Context of Sensation in Autonomous Control Systems
David Tam*
Department of Biological Sciences, University of North Texas, Denton, Texas 76203, USA
Abstract: A theoretical model for deriving the origin of emotional functions from first principles is introduced. The model, called the "Emotional Model Of the Theoretical Interpretations Of Neuroprocessing" and abbreviated as "EMOTION", derives how emotional context can evolve from innate responses. It is based on a biological framework for autonomous systems with minimal assumptions about the system or about what emotion is. The first phase of the model (EMOTION-I) addresses the progressive abstraction of the sensory input signals within the relevant context of the environment to produce the appropriate output actions for survival. It uses a probabilistic feedforward and feedback neural network with multiple adaptable gains, a self-adaptive learning rate and modifiable connection weights to produce a self-organizing, self-adaptive system. The network incorporates associative reinforcement learning rules for conditioning and for fixation of circuitry into hardwired innate responses, such that the contextual feel of sensation evolves as an emergent property known as emotional feel.
Keywords: Origin of emotions, autonomous control system, model of emotion, neural processing, contextual representation,
sensation.
INTRODUCTION
Emotion is one of the most studied subjects in many disciplines of science, including psychology, physiology, philosophy, anthropology, etc., and recently in robotics. Yet it is one of the most controversial subjects because of differences in definition, in perception, and in perspective, among other things such as introspection and retrospection by human cognition. Rather than engaging in the debate over whether emotion is unique to humans, or whether other animals or robots could have emotions, this paper focuses on deriving the emergent property called emotion from the basic principles required for processing autonomous control functions.
We will use an inter-disciplinary approach that includes mathematics, neuroscience, physiology, psychology and engineering control science in the derivation of emotion. Due to the volume of research on the topic, we will limit our discussion to the classical literature relevant to the derivation of emotions.
EVOLUTIONARY APPROACH
This approach to studying the emergence of emotion in a self-actuating autonomous system is analogous to studying how emotions evolved in biological systems at the theoretical level, rather than studying them at the phenomenological level. It is an evolutionary approach that derives the necessity for the emergence of an entity called "emotion" for processing sensory signals and internal functions in order to survive in the real world. It is based on the computational principles needed for an autonomous robot to function in the real world without any external guidance or control.
*Address correspondence to this author at the Department of Biological
Sciences, University of North Texas, Denton, Texas 76203, USA;
E-mail: [email protected]
We consider a self-actuating autonomous robot as an
animal (or organism) without any pre-programmed ability
(i.e., a priori knowledge) to interact with the physical world.
The task of this autonomous robot (or animal) is to derive its own working principles for interacting with the real world under a minimal set of assumptions; we then observe how emotions evolve in this process as a necessary condition for survival in the real world, without assuming what emotions are or should be.
OPERATIONAL APPROACH
By using this approach of inquiry, we will bypass the
unavoidable debate on the human perception of what emotions are, what they are used for, whether they are unique to
humans, whether they exist in animals or robots, and any
other subjective perception of what emotions are, including
the debate of the subjective definitions of different emotions.
In other words, we will use an objective approach to study
this phenomenon (called emotion) without assuming its functional role in animals, humans or robots. Rather, we will
study the phenomenon of autonomous control in animals, and
observe what principles of operation are required to survive
in the real world. From these operational principles needed
for survival, we will identify which of these governing principles happen to correspond to the entity that people identify
as “emotion”.
ROBOTIC EMOTIONS
Emotions in autonomous robots have recently been implemented in various systems, primarily as studies of autonomous behavior augmented by emotional controls [1-3]. Most often, these autonomous robots mimic human emotions as an "add-on" to unemotional cognition by introducing emotion as part of the process-control function, rather than deriving emotional functions or exploring the origins of emotions in self-adaptive behavior. Although comparisons between the human cortical system and robots have been made to characterize whether robots have emotions [4], such comparisons do not address the functional role played by emotions in the self-adaptive autonomous control of an independent agent such as an animal.
CYBERNETIC APPROACH
We will derive a theoretical model of emotional functions from first principles for autonomous control. A cybernetic system is an autonomous system that captures the essence of the most basic biological and higher cognitive functions
(including intellectual, emotional and mental functions). The
derivation of this emotional model is based on capturing the
minimal set of conditions that are fundamental to the survival
(and/or appropriate interactions with the environment) of
such autonomous systems.
REAL WORLD INTERACTIONS
It will be shown that within the framework of autonomous control, emotions emerge as natural phenomena in order for autonomous systems (animals or autonomous robots)
to function appropriately within context in a real world environment. This foundation is based on the thesis that autonomous systems are independent agents (organisms or robots)
that rely on an internal representation model of the external
world to function accordingly.
ROLE OF CENTRAL NERVOUS SYSTEM
The role of the nervous system (either central nervous
system (CNS) in animals or neural control system in robots)
is to provide an accurate abstract representation of the external environment internally. Most importantly, this internal
representation is not necessarily an exact replica of the external world, but an accurate contextual representation such that
the autonomous system can respond appropriately under any
given circumstances for survival and other operating functions.
One of the many schemes for capturing this abstract, contextual representation of the external environment is creating
an internal model of the external world by the nervous system.
INTERNAL MODEL OF EXTERNAL WORLD
One of the advantages of creating an internal model of the external world is that it not only provides a contextual representation of the outside world, but also provides a prediction of what effects the system's future actions may have on the environment.
It is this predictive power of the internal model that provides
for what is known as “cognition” or “higher intelligence”.
We will show that emotions, within this framework, correspond to the feedforward and feedback variables of the
internal model used for assessing the accuracy of the model
and its actions. Thus, emotions, in this perspective, are not
necessarily unique to humans or animals, nor are they introspective constructs labeled/constructed by human to explain
some psychological phenomena.
EMOTION-I Model
This model is called the "Emotional Model Of the Theoretical Interpretations Of Neuroprocessing", abbreviated by the acronym "EMOTION". This paper focuses on the first phase of development of the biological framework for this model: EMOTION-I. It addresses the emergence of the "feel" of sensation for increasing the chance of survival as the first step in the internal pre-processing of emotions. The subsequent paper [5] will focus on the second phase, developing the minimal set of basic emotions for this model: EMOTION-II. It addresses the emergence of a metric for assessing the accuracy of the internal model. This internal-model consistency check is represented by "emotion".
AUTONOMOUS CONTROL SYSTEMS
An autonomous control system is a self-actuating system
capable of performing sensorimotor functions based on its
internal controls. Most often, it is capable of decision-making without external guidance or control. Examples of autonomous systems are animals and autonomous robots.
Biological organisms (animals, in particular) can be considered autonomous control systems because they are capable of performing sensory and motor functions independently of an external agent. Autonomous robots can also be considered autonomous systems, since their sensorimotor functions are controlled by their internal processors without relying on any external control.
SOCIAL SYSTEMS
Self-actuating autonomous systems are self-contained entities that operate independently. Although the individuals in social systems may depend on each other through social interdependency (such as a school of fish, an army of ants or a swarm of robots), the analysis of social interdependency is beyond the scope of this paper.
COMPONENTS OF AN AUTONOMOUS SYSTEM
Autonomous systems are self-contained entities that are
composed of systems of many interacting parts, including
sensory units, motor units and processing/controlling units.
Together, they form a system exhibited as an animal (in biological systems) or a robot (in robotic systems).
SURVIVABILITY AND APPROPRIATENESS OF ACTIONS
The task of the system is to integrate the sensory inputs
by the internal processing units to produce output actions that
are appropriate in the environment it lives in. The appropriateness of these output actions is determined by the accuracy
of the internal model that produces actions for the organism
to survive in its environment.
REFLEX AS A SIMPLE AUTONOMOUS SYSTEM
One of the simplest autonomous systems is the reflex system, which is endowed with sensory input, motor output, and processing elements that associate the sensory input with the motor output. This sensorimotor function is the minimal set of functions required to be considered an autonomous system.
STIMULUS-RESPONSE FUNCTION
Mathematically, the sensory stimulus is considered as the input, x, which often encodes the intensity of the stimulus. The motor response is considered as the output, y, which often encodes the magnitude of the response. This sensorimotor function is sometimes called the "stimulus-response function" (S-R function) in physiology.
INPUT/OUTPUT (I/O) FUNCTION
This stimulus-response function also corresponds to the mathematical input/output (I/O) function, f:
y = f(x)    (1)
The input, x(t), and output, y(t), are often functions of time, t, i.e., time-varying functions; thus, the I/O function, f, becomes:
y(t) = f(x(t))    (2)
Although, in general, the input and output can take on any real number (x, y ∈ ℝ), it is advantageous to simplify the subsequent derivation using the range of x and y that is positive (i.e., x ≥ 0 and y ≥ 0), since negative values can unintentionally reverse the direction (sign) of the computed I/O function.
Therefore, the task of an animal is to produce an appropriate I/O function such that the resulting action (motor output) will be an appropriate response in the given environment (encoded by the sensory inputs).
REFLEX ACTION
Reflex is one of the simplest (most basic) sets of sensorimotor functions found in animals. In a reflex-action, given a sensory input stimulus, the animal is able to respond with a motor output independently, without any external control. The response in a reflex-action is usually stereotypical for a given stimulus. It usually provides a physiologically appropriate response for the given stimulus that enables the animal to respond rapidly without needing higher-level processing. This usually increases the survivability of an animal by decreasing its response time.
I/O MAPPING
Mathematically, a reflex is essentially an I/O mapping function that maps the input space into the output space. This mapping function (Eq. 1) is usually a simple function for reflexes. It can be a linear function or a nonlinear function, depending on the specific reflex.
PHYSIOLOGICAL RANGE
Most physiological reflexes are nonlinear functions (often sigmoidal), in which there is a linear portion in the middle called the "physiological range": below this range, the sensory signal is too small to elicit a response, and above it, the response is maxed out due to the physical limitations of the response system. The response within the physiological range is thus often approximated as a linear function, and most reflexes operate in this linear region, although the physiological range of some reflexes may be exponential or logarithmic instead of linear when signal compression is required for efficient scaling; the pupil constriction reflex is an example.
For a simple linear reflex system, the I/O function is given by:
y(t) = a x(t) + b    (3)
where a and b are constants.
For a nonlinear reflex system, such as a sigmoidal response, the I/O function can be represented by:
y(t) = 1 / (1 + e^{-a x(t)})    (4)
with an approximately linear response at the physiological range.
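To make Eqs. 1-4 concrete, the following sketch (added here for illustration; it is not from the original paper, and the parameter values a and b are arbitrary) implements the linear and sigmoidal reflex I/O functions:

```python
import numpy as np

# Illustrative sketch (not from the original paper): the linear reflex of
# Eq. 3, y(t) = a*x(t) + b, and the sigmoidal reflex of Eq. 4, whose middle,
# roughly linear portion corresponds to the "physiological range".
def linear_reflex(x, a=2.0, b=0.1):
    return a * x + b

def sigmoid_reflex(x, a=4.0):
    return 1.0 / (1.0 + np.exp(-a * x))

stimulus = np.linspace(0.0, 1.0, 5)   # stimulus intensities, x >= 0
print(linear_reflex(stimulus))        # response grows linearly with intensity
print(sigmoid_reflex(stimulus))       # response saturates at the extremes
```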
LOOK-UP TABLE
Because these I/O functions are rather simple, mapping the input space into the output space by some straightforward mapping function (or look-up table), these basic reflexes are usually not considered as representing any "higher functions" such as emotion, cognition, perception or intelligence. Higher-level processing often requires a much more complex I/O function, and it is often dependent on additional parameters and conditions.
HIGHER-LEVEL MAPPING
We will derive the I/O function that leads to the emergence
of emotions from the additional parameters and conditions that
are beyond this basic set of simplistic reflex functions. Thus,
emotions and higher-cognition are an expansion of this basic
reflex I/O function that maps the input space into the output
space depending on other additional factors. The higher-level
processing/controlling functions, such as perception, cognition,
emotion and intellectual functions, are the additional attributes
of the system that allow them to perform above and beyond the
basic sensorimotor reflex functions.
PROBABILISTIC (STOCHASTIC) I/O FUNCTIONS
The I/O function for a reflex can be either deterministic or non-deterministic (i.e., probabilistic). If it is deterministic, the exact response can be determined by the I/O function, as in Eqs. 3 and 4. If the reflex is non-deterministic, Eq. 3 can be re-represented by a probabilistic function:
y(t) = Prob(a x(t) + b)    (5)
which takes on the normalized value of [0, 1].
PHYSIOLOGICAL NOISE
In the real world, any physical system is inherently noisy, which provides the basis for a probabilistic system. Eq. 5 can be implemented equivalently by the addition of a random noise function:
y(t) = a x(t) + b + Noise(σ)    (6)
where Noise(σ) can be any real-valued random number drawn from any random distribution with a variance of σ, according to the implementation details. Similarly, the non-deterministic form of Eq. 4 can be represented by:
y(t) = Prob(1 / (1 + e^{-a x(t)}))    (7)
or implemented by the addition of a random noise function:
y(t) = 1 / (1 + e^{-a x(t)}) + Noise(σ)    (8)
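A minimal sketch of the noise-based stochastic reflexes of Eqs. 6 and 8 (an illustration added here, not the paper's implementation; the Gaussian noise distribution and parameter values are assumptions) could be:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sketch (not from the original paper): stochastic reflexes of
# Eqs. 6 and 8, where Noise(sigma) is drawn here from a Gaussian distribution
# (any distribution with the chosen variance would do).
def noisy_linear_reflex(x, a=2.0, b=0.1, sigma=0.05):
    return a * x + b + rng.normal(0.0, sigma)                       # Eq. 6

def noisy_sigmoid_reflex(x, a=4.0, sigma=0.05):
    return 1.0 / (1.0 + np.exp(-a * x)) + rng.normal(0.0, sigma)    # Eq. 8

# Repeated trials with the same stimulus now give variable responses.
print([round(noisy_sigmoid_reflex(0.5), 3) for _ in range(5)])
```

Repeated presentation of the same stimulus now yields a distribution of responses, which is precisely the trial-to-trial variability that the following sections exploit for exploration and self-adaptation.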
SOURCE OF NOISE
The noise may come from the sensory signal, the transfer function or the output element. For simplicity, and without loss of generality, since most noise sources are additive, we will collapse them into a single noise term in the output element, y(t), in Eqs. 6 and 8. For non-additive noise, a separate noise term can be added to each of the sensory input function, the transfer function, and the motor output function.
EXPLORATION IN LEARNING
The advantage of using a probabilistic function instead of a deterministic function is that it allows for variations in the output response to the same stimulus. This variability is important in both learning and evolution, which require a trial-and-error approach to explore the unknown parameter space.
The probabilistic response essentially provides the variations (or randomizations) needed for exploration in learning and in evolution. If the response is deterministic, no variation will result, and the animal will always produce the same response, as in a typical reflex, which is always the same for a given stimulus intensity.
SELF-EXPLORATION
Self-exploratory adaptation/learning and evolution may not occur without variability. Thus, a probabilistic response function is essential in self-adaptive systems, whereby the variability in output can be used as a feedback signal for evaluating the adaptability of the system in response to different exploratory actions. In other words, it enables the system to explore the parameter space autonomously, similar to the Monte Carlo simulation method, but applied in the real world in this case.
MODIFICATION OF REFLEX ACTIONS
Advanced behaviors (behavioral responses) often require
atypical responses instead of stereotypical responses. This
often requires modification of the response found in learning
and in evolution. Modification of the response implies changing the I/O function.
SENSITIZATION
A reflex can be modified to adapt to the environment, such as by increasing or decreasing the response amplitude. When the response is increased over time (over repeated trials), it is called "sensitization" in physiology. When a reflex is sensitized to a stimulus, it leads to a larger response amplitude. This is essentially an amplification of the response by increasing the scale of the I/O function.
That is, Eq. 1 can be modified by a scaling factor, c (or gain), such that the output becomes:
y = c f(x)    (9a)
where c > 1 for sensitization.
HABITUATION
When the response is decreased over time (over repeated trials), it is called "habituation"; the reflex is habituated to the stimulus, i.e., it becomes less sensitive. When the reflex is habituated to the stimulus, it leads to a smaller response amplitude. Thus, habituation is sometimes called "desensitization" in physiology. In other words, the output becomes smaller:
y = c f(x)    (9b)
where c < 1 for habituation.
GAIN CONTROL
The increase or decrease in the scale factor, c , can also
be considered as changing the “gain” of a control system
such that the output is amplified or reduced for a given input.
Thus, this physiological adaptation is essentially a gain control for the system to respond. In other words, the I/O function of the reflex can be altered rather than fixed. Thus, the
system is an adaptive system in which the reflex can be altered by either sensitization or habituation.
ADAPTATION – SINGLE-STIMULUS LEARNING
Sensitization and habituation form the class of physiological learning called “adaptation”. It is the simplest form of
learning in which the response output of the reflex is either
increased or decreased when the stimulus is repeated over
many trials. Whether the response will increase (sensitize) or
decrease (desensitize) is dependent on the context of the
stimulus.
Note that adaptation is a type of learning that requires
only a single stimulus. We will show that other types of
learning will require two or three events to occur.
CONDITIONS FOR ADAPTATION
When the stimulus is potentially harmful (noxious) to an animal, and this stimulus is repeated over time, the adaptation often results in sensitization. When the stimulus is potentially harmless (non-noxious) to an animal, and this stimulus is repeated over time, the adaptation often results in desensitization, or habituation.
A noxious stimulus is often derived from a painful sensory source, where pain serves as an alarm signal for the animal to respond to and become sensitized to. (Although we have not yet defined the emotional quantity called pain in our derivation, we include it in the discussion here to convey the contextual meaning of a sensation, i.e., how this sensory signal could be used in the I/O stimulus-response function. We will derive the emotion pain later in the discussion.)
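As a hypothetical illustration of single-stimulus adaptation under Eqs. 9a/9b (not from the paper; the multiplicative update and its 5% per-trial rate are assumptions), the reflex gain c could be adjusted per trial according to the stimulus context:

```python
# Illustrative sketch (not from the original paper): single-stimulus
# adaptation of the reflex gain c in Eqs. 9a/9b. A noxious stimulus
# sensitizes the reflex (c grows above 1); a harmless one habituates it
# (c shrinks below 1). The 5% update rate per trial is arbitrary.
def adapt_gain(c, noxious, rate=0.05):
    return c * (1.0 + rate) if noxious else c * (1.0 - rate)

c = 1.0
for _ in range(20):                  # twenty repeated harmless stimuli
    c = adapt_gain(c, noxious=False)
print(f"habituated gain: {c:.2f}")   # c < 1: smaller response y = c*f(x)
```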
PHYSIOLOGICAL APPROPRIATENESS AND CONTEXTUAL MEANING
These changes in response amplitude are physiologically
appropriate. Sensitization to the noxious stimulus allows the
animal to respond more readily to prevent potential harm. In
other words, this simple amplification of the reflex-response
allows the animal to predict the future better by making the
implicit assumption that amplifying the stimulus-response
I/O function could prevent future harm. Although this implicit assumption may not always apply in every unforeseen
situation, it is a pragmatic solution in most physiological circumstances.
Conversely, when the repeated stimulus is potentially
harmless (non-noxious), the animal would habituate to the
stimulus. This is also physiologically appropriate because
when the stimulus is non-noxious, the animal does not need
to respond as intensely to the same stimulus to save energy.
PHYSIOLOGICAL ADVANTAGES
There are many physiological advantages of habituation to non-noxious stimuli. It minimizes the energy cost of producing the reflex-action. When an animal is confronted with multiple stimuli that require coordination, competition and interference among different stimulus-responses can occur in a complex system. Reducing the response by habituation can reduce the chance of interference among reflex-actions when the animal responds to multiple stimuli simultaneously.
GILL-WITHDRAWAL REFLEX AS AN EXAMPLE
The gill-withdrawal reflex studied in Aplysia is a classic example of the habituation (desensitization) of a reflex in response to repeated non-noxious stimuli [6-8]. Intuitively, this adaptation corresponds to the interpretation that, since the sea slug is constantly bombarded with stimuli from sea waves, those harmless stimuli can be ignored if they do not represent potential threats to the animal.
FEEDFORWARD CONTROL
In order to evaluate the "appropriateness" (or "survivability") of the above self-adaptive responses, feedback and/or feedforward control is often needed. The implicit assumption that amplifying or reducing (sensitizing or habituating) the reflex-response increases survivability is a feedforward prediction. In other words, it does not rely on feedback of the response to correct its action. It merely produces an output action that "projects" that the outcome will be appropriate, with an implicit assumption that it will be, regardless of whether it actually is.
FEEDBACK CONTROL
In contrast, a feedback system takes its current output response (and other environmental factors) into account as part of the input to evaluate the next response action, whereas a feedforward system does not. So although a feedback control system may seem more appropriate for self-adaptation, a feedforward control system does have its usefulness in autonomous systems. It provides fast responses without the extra computational processing time overhead needed in a feedback system. Furthermore, a feedback control system can be susceptible to instability, such as oscillations, when the feedback signal is time-delayed. Thus, both feedback and feedforward systems have their own advantages and disadvantages; they serve different purposes in the design and evolution of an autonomous self-actuating, self-adaptive system.
CONTEXTUAL "FEEL" IN SENSATION
The alteration of reflex-action by sensitization or habituation is based on the implicit assumption that the animal is able to project (predict) whether the stimulus is potentially noxious or not. Although the animal may not be considered as having any high-level conscious perception cognitively at this stage, the prediction of potential harm is crucial to the emergence of the contextual "feel" in sensation (i.e., the emotional content of a sensation – whether it feels "pleasant" or not).
EMOTIONAL "QUALITY" IN SENSATION
In other words, encoding merely the intensity of the stimulus in the input function, x(t), is not sufficient to recognize whether the stimulus is potentially harmful or harmless, which in turn translates into whether it is emotionally pleasant or not. The abstraction of the stimulus "quantity" into stimulus "quality" is the first step in the emergence of the emotional "feel" in sensation (sensory input).
Thus, the abstraction of the stimulus in terms of potential harm (harmfulness) requires the emergence of the contextual feeling in sensation. Although cognition may not exist at this low level of processing, the sensory stimulus is nonetheless no longer interpreted in isolation by the system. Rather, it is interpreted based on the context in which the sensation is received, relative to the projected/predicted survivability of the animal.
PLEASANT/UNPLEASANT SENSATION
A signal is hypothetically interpreted as unpleasant if it is potentially noxious (or harmful) to the integrity/survivability of the animal. The most unpleasant form of sensation would become pain. Conversely, a sensation is interpreted as pleasant if it preserves the survivability of the animal.
EMOTIONAL CONTEXT AND SURVIVABILITY
This pleasantness of sensation becomes one of the most elemental forms of "contextual" sensation in the emotional context. The "context" is the survivability of the animal relative to the stimuli and the environment with which it is interacting.
PHYSICAL SENSATION AND EMOTIONAL SENSATION
In higher animals, such as mammals, emotional sensation is interpreted and processed at the thalamic level [9]. On the other hand, physical sensation that encodes the stimulus intensity (stimulus quantity) is encoded at the sensory receptor-cell level.
PHYSICAL PAIN (HURT) AND EMOTIONAL PAIN (SUFFERING)
Pain is a good example for describing the distinction between emotional sensation and physical sensation. There are two distinct components of pain as perceived in higher animals – the emotional pain called "suffering" and the physical pain called "hurt" in sensation.
Hurt is the physical damage to the tissue, i.e., the stimulus intensity encoded by action potentials in the first-order neurons (pain fibers). Suffering is the emotional context in which the hurt is felt, i.e., how bad it feels – the quality of the sensation rather than the quantity of the sensation. The emotional component of sensation is processed by the thalamic nuclei.
NEUROPHARMACOLOGICAL DISSOCIATION OF
PHYSICAL PAIN FROM EMOTIONAL PAIN
Physiologically, the emotional aspect and the physical aspect of the same nociception (pain sensation) can be separated by dissociative anesthetics, such as ketamine and PCP (phencyclidine), under which physical hurt can be felt without the emotional suffering in higher animals. In other words, when an animal is under dissociative anesthetics, it can feel the physical pain (hurt sensation), but that hurt does not bother it emotionally and is totally tolerable, without any emotional suffering associated with that hurt.
Thus, physical pain and emotional pain are two distinct
components of the same sensation for pain. The emotional
component can be separated from the physical component
physiologically by drugs, which demonstrates the existence
of the emotional context of a sensation (sensory stimulus) in
animals. This distinction between hurt and suffering (physical and emotional pain) can be separated physiologically and
neurologically; thus suffering is not merely a psychological
construct or subjective perception.
FEEDFORWARD MODEL PREDICTION
The emergence of contextual sensation is essential for the survival of the animal, which can then predict the potential outcomes of a sensory stimulus with respect to its survivability. This context takes into account the environment and the integrity (survivability) of the animal involved – a feedback control. It also presumes the ability to predict (or at least project) the outcomes so that the animal can produce the physiologically appropriate output – a feedforward control.
CONTEXTUAL SENSATION
An animal can be considered as forming a conceptual
model of the world (or the environment in which it lives) and
itself to produce this prediction (projection) accurately. This
also implies forming an implicit “model” of the external
world and a “model” of its own internal world where the sensorimotor processing is done.
IMPLICIT MODEL
Although this contextual model may not be formed explicitly, the animal can nonetheless be considered as having an implicit conceptual framework for a rudimentary model of the world around it. This may not be a concrete or explicit model, but a conceptual one, such that the behavior (motor output) is produced appropriately for a given sensory input in the physiological context, even though the animal may not be considered as having any "concepts", "perception" or "model" of its own at this elementary stage of development, because all these responses are merely reflexive responses without any higher-level processing or cognition.
MECHANISTIC STEREOTYPICAL RESPONSES
It is important to note that these reflexive actions are
mechanistic responses (because they are very stereotypical)
rather than cognitive emotional responses with higher-level
processing or recognition at the awareness level. Yet, this
implicit representation of contextual information provides the
conceptual framework for the derivation of emotional response from first principles in relation to the survivability of
an animal or any autonomous being.
ABSTRACTING SIGNIFICANCE OF SENSORY SIGNALS BY CONTEXT
The above analysis of the emergence of the "emotional feel" in sensation forms the basis for the abstraction of sensory inputs by context. In other words, sensory inputs are no longer merely encodings of stimulus intensity, which represent the physical sensation. Rather, sensory inputs are processed in such a way that context is taken into account to form an abstraction of the "significance" of the stimuli.
The significance of the sensory inputs is evaluated based on the context in which the inputs are received and interpreted. In the above example, the significance of the sensory inputs is evaluated based on the survivability of the animal. Thus, the emotional feel takes on the significance of that sensation, instructing the animal how to respond appropriately if it is to increase its likelihood of survival in its environment.
OPERATIONAL DEFINITION OF EMOTIONAL SENSATION
The above analysis provides the theoretical basis for the derivation of elementary emotions in sensation, called the "emotional feel", based on first principles rather than on retrospection in psychology or the fact of evolution in biology. Although these responses may be hardwired with some degree of modifiability, they are still simple reflexes without any complex processing or cognition. They are merely simple reflexive responses governed by simple equations represented by I/O functions similar to Eqs. 1-9.
Note that the emotional components in these I/O functions are implicit rather than explicitly represented. The implicit representation is the context that the stimulus takes on in altering the response. This re-representation of the stimulus-response can be considered an elementary (first-level) emotional response as far as sensation is concerned. It provides the qualitative emotional feel even though the animal does not necessarily have any explicit emotions with respect to these reflexes. Based on this framework, we will explore the neural mechanisms for establishing this contextual sensation in an autonomous system.
CONDITIONED REFLEX
It is well known that the stimulus of a reflex can also be "switched" to a different one from the original stimulus. In such a case, the reflex is called a "conditioned reflex" because the response is altered by a conditioned stimulus.
For example, the eye-blink reflex is one of the classical
experiments in which the air-puff stimulus that induces the
eye-blink response can be switched over to a tone if a tone
stimulus is paired with the air-puff stimulus [10]. A rabbit
can learn (or be conditioned) to blink the eyes when the tone
is presented instead of an air-puff to the eye.
CLASSICAL CONDITIONING – TWO-STIMULI ASSOCIATIVE LEARNING
Classical (Pavlovian) conditioning (a well-known psychological phenomenon) is a mechanism in which two stimuli
are paired to establish the association between the stimuli and
response that were not established before. It requires two
stimuli instead of one stimulus as in adaptation discussed
earlier. The stimulus-response function of the original pair
(the innate unconditioned stimulus (US) and unconditioned
response (UR) pair) is transferred to the novel pair (conditioned stimulus (CS) and conditioned response (CR) pair).
Note that this type of learning requires two stimuli (US
and CS) to form the association. The end-result is that the
original I/O function between US and UR is changed such
that the new I/O function is established between CS and CR.
TRANSFER OF CONDITIONING STIMULUS
This transfer is established by pairing the presentation of
the unconditioned stimulus with the conditioned stimulus.
Thus, the difference between self-adaptation and conditioning is that adaptation requires only one stimulus, whereas conditioning requires two stimuli for the association. In other words, the animal is able to establish a new association between the novel stimulus and the response.
ASSOCIATIVE LEARNING
This type of learning is often called “associative learning”
since it establishes association between stimuli and responses. In the above classical conditioned eye-blink reflex,
the US is the air-puff and the CS is the tone. They are paired
together to establish the subsequent association between CS
and CR. That is, presentation of tone will elicit an eye-blink
response after repeated pairing whereas such association did
not exist prior to the conditioning experiment (i.e., presentation of tone would not elicit an eye-blink prior to the training
phase).
TRANSFER OF ASSOCIATION FROM INNATE STIMULUS TO A NOVEL STIMULUS
Fig. (1) illustrates the transfer from US to CS to establish the CS-CR stimulus-response function, using a block diagram and the corresponding simplified neural circuitry. In the neural circuitry, the synaptic efficacy (connection weight strength) for the CS-CR pair is zero before training, whereas the synaptic weight for the CS-CR pair is increased subsequent to training (repeated associative conditioning). The strengthening of the connection weight is induced by the activation of the US-UR pair, thus transferring the original US-UR stimulus-response function to the novel CS-CR stimulus-response function.
Fig. (1). Schematic diagram of a simplified neural circuitry for transferring the unconditioned stimulus (US) to the conditioned stimulus (CS) via the modification of the connection weight, w′, at the CS-CR synapse, induced by the US-UR synapse (fixed connection weight, w = 1).
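A minimal numerical sketch of the transfer in Fig. (1) (added here for illustration; the Hebbian-style pairing rule, the learning rate and the saturation cap are assumptions rather than the paper's implementation) could look like:

```python
# Illustrative sketch (not the paper's implementation): Hebbian-style
# strengthening of the CS-CR connection weight w' by repeated CS-US pairing.
# The US-UR pathway has a fixed weight of 1, so the US alone drives the
# response; pairing lets the CS weight grow until the CS alone suffices.
w_cs = 0.0          # modifiable CS-CR weight, zero before training
w_us = 1.0          # fixed, innate US-UR weight
lam = 0.1           # assumed learning coefficient

for trial in range(30):
    cs, us = 1.0, 1.0                    # paired presentation of CS and US
    response = w_us * us + w_cs * cs     # the UR is driven by the US pathway
    w_cs += lam * response * cs          # weight change gated by CS activity
    w_cs = min(w_cs, 1.0)                # crude saturation for stability

print(f"CS-CR weight after training: {w_cs:.2f}")
print(f"response to CS alone: {w_cs * 1.0:.2f}")  # the CS now elicits the CR
```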
MODIFIABLE REFLEX
This illustrates that reflexes are modifiable rather than fixed or strictly hardwired. The original (innate) stimulus-response I/O function can be altered such that the original stimulus does not need to be presented to elicit a response. The transferred stimulus can be very different in quality from the original innate form (i.e., from air pressure to sound frequency in the above example). It can also be different in energy form, e.g., from air-puff to light (from mechanical energy to photic energy) if the conditioning pairs air-puff and light-onset stimuli, in which case the animal will subsequently blink whenever the light is turned on.
POSITIVE AND NEGATIVE REINFORCEMENT
Central to the neural mechanism of conditioning is the
reinforcement signal in which the response is reinforced.
There are two major classes of reinforcement – positive reinforcement and negative reinforcement.
Reward is considered as positive reinforcement whereas
punishment is considered as negative reinforcement. Positive
reinforcement often leads to affiliative (seeking/attractive)
behaviors whereas negative reinforcement often leads to
avoidance (repulsive) behaviors.
REINFORCEMENT LEARNING
Because of these characteristic responses, the conditioning paradigm is often called behavioral shaping. The end-goal to be shaped is either seeking behavior (for positive reinforcement) or avoidance behavior (for negative reinforcement). In either case, the neural mechanism for establishing such a stimulus-response function is associative learning (or conditioning). Thus, this type of reinforced associative learning is called "reinforcement learning" in the neural network community, whereas it is called "conditioning" in psychology.
REINFORCER
The transfer of the US-UR association to the CS-CR stimulus-response function in the classical conditioning paradigm is shaped by the reinforcement signal, called the "reinforcer". Pairing with a positive reinforcer (reward) tends to promote affiliation, whereas pairing with a negative reinforcer (punishment) tends to promote avoidance.
GOAL-DIRECTED LEARNING
Reinforcement learning requires a reinforcer to establish the goal-directed behavior. The reinforcer is often the US, but it can also be derived from an alternate source, as explained below. The direction of the end-goal depends on whether the reinforcer is a positive or negative reinforcement. The behavioral outcome of the animal (or autonomous being) in this conditioning paradigm can be directed toward seeking behavior or avoidance behavior. Thus, this type of reinforcement learning is sometimes called "goal-directed learning" [11-13].
Although many other neural network models have implemented conditioning as the learning paradigm to solve problems (such as [14]), those models often do not address problems related to emotions, whereas the model introduced in this paper focuses on the basic principles of operation for establishing the emotional context of sensation.
NEURAL NETWORK IMPLEMENTATION
The theoretical foundation of neural networks has been established extensively to explain many high-level cognitive functions, such as learning and pattern recognition [15]. In brief, the brain of an animal is essentially composed of many networks of neurons. By definition, a neural network is essentially a set of interconnected neurons (neural elements). The function of a neural net is to process information collectively through its neurons.
NETWORK CHARACTERISTICS
One of the characteristics of a neural net is that the overall I/O function processed by the net is performed by the collective properties of many subsets of neurons rather than strictly by each individual neuron. Although each neuron does have its own I/O function, the I/O function of a network is often very different from the individual I/O functions. In fact, the resulting properties exhibited by a neural network's processing may not be found in its components (i.e., the neurons).
EMERGENT PROPERTY
This property exhibited by a neural network that cannot
be found in its component neurons is often called “emergent
property”. Examples of emergent properties of neural networks are learning and pattern recognition. Since most of the
neural I/O functions are nonlinear, the overall I/O function of
the network cannot be described by the linear sum of the I/O
function of each neuron; thus this allows the emergence of
processing properties from the network that are not found in
individual neurons.
MANY-TO-MANY MAPPING IN NEURONS
For neurons in a network with multiple connections, the I/O function of Eq. 9 is not merely a one-to-one mapping, but mathematically a many-to-many mapping. In other words, the mapping is not merely scalar, but vectorial. This many-to-many mapping for a network can be represented in vector or matrix form:
Y = c f(X)    (10)
CONNECTION WEIGHTS
One of the characteristics of neural networks is that the
neurons are interconnected with a connection weight, w ,
such that the individual inputs are scaled by the connection
weight. Furthermore, the connection weight, w , is modifiable such that it is adaptive over time, governed by a set of
learning rules.
SYNAPTIC STRENGTH
Biologically, the connection weights correspond to the synaptic strengths of neurons. The synaptic strength can be positive for an excitatory synapse, negative for an inhibitory synapse, and zero for a non-functioning synapse (or no connection between two neurons). The synaptic strength of biological neurons can also be modified, which is called synaptic plasticity in neurobiology.
MULTIPLE-GAIN CONTROL SYSTEM
This connection weight can also be considered as the
“gain” function in feedback control systems. Thus, the connection weight is essentially the gain in an adaptive control
system even though each input has its individual gain function rather than a single gain function in a typical adaptive
feedback control system. This multiple-gain adaptive control
system provides the essential mechanism for learning in neural network.
NEURAL NETWORK
The generalized Eq. 10 can be implemented as a function of time, t, specifically by including the connection weight (gain) matrix, W(t), as follows:
Y(t) = f(W(t), X(t))    (11)
or
y_j(t) = f(w_ij(t), x_i(t))    (12)
where x_i represents the i-th input of the neuron, y_j represents the j-th output of the neuron, and w_ij(t) the connection weight between the i-th input and the j-th output of the neuron (see also Fig. (2)).
WEIGHTED-SUM
In most neural networks, a weighted-sum function is used, such that the output of a neuron is given by:
y_j(t) = f( Σ_{i=1}^{n} w_ij(t) x_i(t) )    (13)
for a total of n inputs.
Thus, the output of a neuron is the weighted sum of its inputs, adjusted by the individual gains, w_ij (the connection weights between the i-th input and the j-th output).
THRESHOLDING FUNCTION
In most neural networks, the nonlinear I/O function, f(), is a thresholding function, implemented either as a hard threshold (step function) or a soft threshold (sigmoidal function). For a hard threshold, a step function is often used:
y_j(t) = y_max  if Σ_{i=1}^{n} w_ij(t) x_i(t) ≥ θ
y_j(t) = y_min  otherwise    (14)
where θ denotes the threshold for a neuron with n inputs, and y_max and y_min are the corresponding high and low output values, respectively. For a soft threshold, a sigmoidal function is often used:
y_j(t) = 1 / (1 + e^{-Σ_{i=1}^{n} w_ij(t) x_i(t)})    (15)
The use of the sigmoidal function makes the above I/O function differentiable in the mathematical minimization process.
These nonlinear functions essentially provide the threshold for activating the output given the weighted sum of the inputs. In other words, individual gains are applied to each of the inputs, which are then summed together to produce the output set by the threshold.
MULTI-GAIN ADAPTIVE CONTROL SYSTEM
In order for a control system to be adaptive, variable gains can be applied instead of fixed gains. In other words, the connection weights can be adjustable. Furthermore, the connection weights are essentially the individual gains applied to each input of the system, such that the input signals are biased (amplified or attenuated) by the gain, or connection weight. The difference is that there is an individual gain for each input, instead of the conventional single gain signal that applies to the entire system in most conventional control systems.
MODIFIABLE MULTI-GAIN SYSTEM AND LEARNING
Since the connection weights are modifiable (adaptable gains), this phenomenon of modifiable synaptic efficacy in biological neurons forms the biological substrate for learning and memory [16].
The rules for modifying these connection weights become the "learning rules" in neural networks, since by applying these rules to modify the connection weights, the neural net system as a whole can exhibit the emergent property of "learned behavior".
HEBBIAN ASSOCIATIVE LEARNING RULE
There are many learning rules commonly used in neural networks [15], such as the associative learning rule (Hebbian learning rule) [17], the back-propagation learning rule [18], etc. The most relevant learning rule in this context is the associative Hebbian learning rule. Not only does this Hebbian rule form auto-association naturally, but the association mechanism is also the most biologically plausible.
Hebb [17] in 1949 proposed that if the pre-synaptic and post-synaptic neurons are activated together, then the synaptic strength could be changed. This is essentially the associative learning rule, whereby the input and output are correlated together to change the connection weight. The Hebbian associative learning rule has been applied in numerous neural network systems and in neurobiology. We summarize the Hebbian learning rule briefly below.
The Hebbian associative learning rule is given by:
Δw_ij(t) = λ y_j(t) x_i(t),  ∀i, j    (16a)
and
λ = λ′ Δt    (16b)
where Δw_ij is the incremental weight change at time t, λ is the learning coefficient (corresponding to the scale-factor parameter for the incremental weight change), and λ′ is the learning rate.
The above equation satisfies the Hebbian rule because the connection weight changes only when the input, x_i(t), and the output, y_j(t), are both activated (i.e., non-zero). If either one is zero (i.e., if either the input or the output is not activated), no weight change occurs.
If the above functions are expressed in discrete time-steps, Δt, the weight at the next time step, t + Δt, is given by:
w_ij(t + Δt) = w_ij(t) + Δw_ij(t),  ∀i, j    (17)
Alternatively, from an engineering perspective, the connection weights essentially provide the adjustable/adaptable gain changes needed for associative learning, such that a specific input is correlated with the corresponding output through the biases provided by the gains. Thus, Eqs. 14-17 form the set of equations for Hebbian associative learning for individual neurons.
NETWORK LAYERS
Neurons can be interconnected to form a network. Without loss of generality, a network can be considered as neurons forming layers, from input layers to output layers via some intermediate layers. Thus, the I/O function of a generic neuron at any given k-th layer is given by:
Y^k(t) = f(W^k(t) X^k(t))    (18a)
or
y_j^k(t) = f( Σ_{i=1}^{n} w_ij^k(t) x_i^k(t) )    (18b)
(see Fig. (2)), and the associative weight-change learning rule is given by:
Δw_ij^k(t) = λ y_j^k(t) x_i^k(t),  ∀i, j, k    (19a)
w_ij^k(t + Δt) = w_ij^k(t) + Δw_ij^k(t),  ∀i, j, k    (19b)
Thus, Eqs. 18-19 form the set of equations for Hebbian associative learning for individual neurons at the k-th layer in a network.
Note that the above equations specify the I/O function of a neuron at the k-th layer from its i-th input to its j-th output only (without explicitly specifying which layer the input comes from or which layer the output goes to). Different connectivity will provide different network architectures.
Fig. (2). Schematic diagram showing the neural input and output at the k-th layer.
NETWORK ARCHITECTURE
In general, the most extensive network is a fully connected network with all-to-all connections. Alternatively, a network can form layers, with feedforward connections, feedback connections, or both. Furthermore, connections can be formed not just to the adjacent layer, but can also bypass adjacent layers. Thus, different network architectures exist that provide different processing properties. In this paper, we provide a generalized theoretical foundation for the derivation of emotional context without restricting the network architecture to any specific type.
CONNECTIONS BETWEEN LAYERS
Taking the specific connectivity into account, we will use the superscript notation mk to denote the connection from the m-th to the k-th layer, and kn to denote the connection from the k-th to the n-th layer. The I/O function of a neuron at the k-th layer connecting from the m-th layer to the n-th layer is given by:
Y^kn(t) = f(W^mk(t) X^mk(t))    (20a)
or, expanding it:
y_j^kn(t) = f( Σ_i w_ij^mk(t) x_i^mk(t) )    (20b)
(see also Fig. (3)), and the corresponding associative weight-change learning rule is given by:
Δw_ij^mk(t) = λ y_j^kn(t) x_i^mk(t),  ∀i, j, k, m, n    (21a)
w_ij^mk(t + Δt) = w_ij^mk(t) + Δw_ij^mk(t),  ∀i, j, k, m, n    (21b)
Note that input and output are relative as far as any neuron is concerned, so the input from the m-th to the k-th layer for a neuron at the k-th layer is the output from the m-th to the k-th layer for a neuron at the m-th layer (see also Fig. (3)):
x_i^mk(t) = y_j^mk(t),  ∀mk pairs    (22)
Fig. (3). Schematic diagram showing the network connectivity at the k-th layer, connecting from the m-th layer to the n-th layer.
NORMALIZATION OF CONNECTION WEIGHTS
Since the connection weights can increase indefinitely with each incremental time step, Δt, normalization of these weights can resolve this dilemma. One of the normalization schemes is given by:
w̄_ij^k(t) = w_ij^k(t) / Σ_ij w_ij^k(t),  ∀i, j, k    (23)
where w̄_ij^k(t) is the normalized weight to be substituted in the above equations. (For simplicity, and without loss of generality, we will use the notation for neurons at the k-th layer without specifying the notation between layers from here on.)
CROSS-CORRELATION FUNCTION AND ASSOCIATIVE HEBBIAN LEARNING
The associative learning rule of Eq. 19 or 21 provides a mechanism for correlating the input with the output. In fact, with a time-delayed network architecture, it can be proven that a time-delayed associative Hebbian learning network essentially computes a mathematical cross-correlation function between the input and output streams [19, 20].
Given that the significance of associative learning is that it performs a cross-correlation, associative learning can be interpreted as correlating the sensory inputs with the system's own output actions to establish some significance (contextual meaning) of the input-output pairs. When multiple inputs and multiple outputs are included in this cross-correlation with a nonlinear, multi-layered neural network architecture, the emergent I/O relationship of the network can become contextual relative to the environmental context, i.e., the sensory input with respect to the system's output.
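As a rough numerical illustration of this interpretation (added here; it is not the construction of [19, 20], and the signals are synthetic), a bank of delayed Hebbian "synapses" accumulates the empirical cross-correlation between the input and output streams:

```python
import numpy as np

# Illustrative sketch (not from the original paper): a time-delayed Hebbian
# update accumulates, for each delay d, the product y(t) * x(t - d), which is
# (up to scaling) the empirical cross-correlation of the input and output
# streams -- the interpretation given in the text [19, 20].
rng = np.random.default_rng(3)
x = rng.standard_normal(500)
y = np.roll(x, 5)                       # output echoes the input 5 steps later

delays = np.arange(10)
w = np.zeros(delays.size)               # one delayed "synapse" per lag d
lam = 1.0 / x.size
for t in range(10, x.size):
    w += lam * y[t] * x[t - delays]     # Hebbian increment at each delay

print(delays[np.argmax(w)])             # the weights peak at the true lag: 5
```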
ASSOCIATION BY CROSS-CORRELATION
The exact correlated context (extracted or abstracted by the network) depends very much on the neural net architecture and interconnectivity, such as feedforward or feedback networks, which we will discuss further later. We will continue our discussion with how this mechanism for association (or correlation) can be used to guide (or shape) behavior, i.e., reinforce the animal (or autonomous robot) in such a way that the system will either seek or avoid the associated stimuli with its actions.
Because association between the input and output alone does not necessarily provide the cue needed for behavioral shaping of either seeking behavior or avoidance behavior, an additional stimulus (input) is required. This additional signal is the reinforcement signal, i.e., positive reinforcement for reward and negative reinforcement for punishment.
ASSOCIATIVE REINFORCEMENT LEARNING
In order for an animal (or autonomous robot) to seek or avoid certain stimuli for behavioral guidance, an additional signal, z(t), can be used as a reinforcer to change the connection weight in the associative learning rule of Eq. 19. The associative reinforcement-learning rule that includes a reinforcer is given by:
Δw_ij^k(t) = λ z(t) y_j^k(t) x_i^k(t),  ∀i, j, k    (24)
where z(t) denotes the reinforcement signal at time t associated with the input x_i^k(t).
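A minimal sketch of the three-factor rule of Eq. 24 (added here for illustration; the thresholded output and the parameter values are assumptions) shows how the sign of z(t) steers the weight change:

```python
import numpy as np

# Illustrative sketch (not from the original paper): the associative
# reinforcement-learning rule of Eq. 24. The reinforcer z(t) acts as an
# extra gain on the Hebbian product: positive z strengthens the active
# connections, negative z weakens them, and z = 0 leaves them unchanged.
def reinforced_hebbian_step(w, x, y, z, lam=0.1):
    return w + lam * z * y * x             # Eq. 24, for a single output neuron

w = np.full(3, 0.2)
x = np.array([1.0, 0.0, 1.0])              # only inputs 0 and 2 are active
y = float(w @ x > 0.3)                     # simple thresholded output

w_reward = reinforced_hebbian_step(w, x, y, z=+1.0)   # positive reinforcer
w_punish = reinforced_hebbian_step(w, x, y, z=-1.0)   # negative reinforcer
print(np.round(w_reward, 2))   # [0.3 0.2 0.3] -> approach-promoting weights
print(np.round(w_punish, 2))   # [0.1 0.2 0.1] -> avoidance-promoting weights
```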
THREE-WISE CORRELATION
Note that the above learning rule for weight change is essentially a three-wise correlation, correlating the input, the output and the reinforcer. In comparison, the classical Hebbian learning rule is a two-wise cross-correlation between input and output.
SENSORY INPUT AS REINFORCER
Note also that, in classical conditioning, the reinforcer, z(t), is derived from one of the sensory inputs, x_i^k(t). In other words, one of these inputs has special significance as far as the processing is concerned. In this case, one of the inputs is treated as the reinforcer that shapes the behavior. Thus, this special input has profound implications in altering the course of action for the network. (We will discuss later how an alternate reinforcer can be derived from other sources.)
REINFORCER AS GAIN CONTROL
Note that the reinforcer, z(t), is essentially another "gain control" signal used for auto-adaptation (modification of the connection weights). In other words, the weights are changed depending not only on the learning coefficient, λ, but also on the size of the reinforcement (i.e., the gain), in addition to the activation of both the input and the output of that neuron. The larger the reinforcer signal, the bigger the gain applied to the weight change.
SIGNIFICANCE OF REINFORCER AND GAIN
If the reinforcement signal z(t) is positive, it will lead to an increase in connection weight. This means the output action is more likely to be positive as a result, which means acting toward the stimuli, i.e., reinforced positively. The animal (or autonomous robot) will consequently be more likely to move toward the stimuli. The gain for the weight change is positive in this case.
Conversely, if the reinforcement signal is negative, it will lead to a decrease in connection weight, resulting in negative reinforcement, making the animal more likely to avoid taking actions toward the stimuli. Thus, the gain for the weight change is negative in this case.
NEUTRAL ENVIRONMENT
If the reinforcement signal z(t) is zero, the learning trial is not reinforced. In such a case, from Eq. 24, Δw_ij^k(t) = 0, and no learning occurs. This means the system remains constant as is, unchanged when it is not reinforced. In other words, since the system is neither positively nor negatively reinforced, the environment is considered neutral, and the system will remain neutral in its learning.
LEARNING FROM LEARNING
Although it may seem intuitive to assume that some baseline learning occurs in a neutral environment (when the reinforcer signal z(t) is zero), the fact is that if learning were to occur (by our definition), it would have to bias the system toward or away from the end-target, which would imply either positive or negative reinforcement. Nonetheless, learning about neutrality can occur at a higher level of processing.
Learning about the neutrality of the setting can be obtained from higher-level learning, which can be derived from another super-set network on top of the current network, such that the inputs of this super-set network are derived from the current front-end network (rather than from the sensory input of the environment).
SUPER-SET NETWORK AND PRE-PROCESSOR
In many ways, abstraction of learning can be accomplished by generalizing this framework, forming networks of networks to process higher-level abstractions of the output from the pre-processor networks. This abstraction of processed output by super-set networks of networks essentially forms the basis of emotional processing for guiding the system's learning or behavioral path.
SUPERVISED REINFORCEMENT LEARNING
The reinforcer, z(t), can be delivered by an external source, such as a "teacher", in which case this type of reinforcement corresponds to classical conditioning. In neural networks, this type of learning is called "supervised learning".
UNSUPERVISED REINFORCEMENT LEARNING
If the reinforcer is generated by the action of the system (the animal or autonomous robot) itself, then this type of reinforcement corresponds to operant conditioning in psychology. The animal thus learns without a "teacher", which can be considered self-learning or auto-associative learning. This class of learning is also called "unsupervised learning".
GOAL-DIRECTED LEARNING
Independent of whether the reinforcement signal is delivered by an external “teacher” or not, this class of learning is
often considered as “goal-directed” reinforcement learning
[11-13] since the reinforcer provides the learning cue in the
EMOTION-I Model A Biologically-Based Theoretical Framework
The Open Cybernetics and Systemics Journal, 2007, Volume 1
direction of the reinforced behavior, i.e., positively or negatively reinforced.
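To make the super-set (networks-of-networks) idea above concrete, here is a minimal sketch with an arbitrary two-level architecture of our own choosing, not one prescribed by the model, in which a second-level network receives its inputs from the front-end network rather than from the environment:

import numpy as np

def layer(x, w):
    # A generic squashing layer: weighted sum through a nonlinearity.
    return np.tanh(x @ w)

rng = np.random.default_rng(1)
sensory = rng.random(8)               # environmental input (first layer)
w_front = rng.normal(size=(8, 4))     # front-end (pre-processor) network
w_super = rng.normal(size=(4, 2))     # super-set network stacked on top

front_out = layer(sensory, w_front)   # abstraction of the sensory input
super_out = layer(front_out, w_super) # higher-level abstraction: its inputs
                                      # come from the front-end network,
                                      # not from the environment directly

The same reinforcement learning rule can then be applied at the super-set level to learn about properties of the front-end representation, such as neutrality.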
SOURCE OF REINFORCEMENT SIGNAL
The reinforcement signal, z(t), in Eq. 24 can come from many different sources. It can come from the sensory input, such as food (for animals) or a battery-recharging signal (for autonomous robots) as positive reinforcers, and a pain signal as a negative reinforcer. These basic reinforcers are often innate signals that are hardwired into the system's circuitry for basic survival. These signals are also referred to as unconditioned stimulus (US) signals. (The mechanism for the formation of these innate responses will be addressed later in this paper, after the mechanisms for the transfer of internal reinforcements within the environmental context are introduced with respect to the meta-system.)
POSITIVE AND NEGATIVE REINFORCERS
Since the reinforcer (such as food or pain) is derived from one of the sensory inputs in the first layer, a positive reinforcer can be represented by:

z(t) = x_{1h}(t)    (25)

if the reinforcer is the h-th input of the system representing the US signal, and a negative reinforcer, z(t), can be represented by:

z(t) = -x_{1h}(t)    (26)

Eqs. 25 and 26 can be combined into a single equation for both positive and negative reinforcements. The reinforcer is given by the generic form:

z(t) = r \, x_{1h}(t)    (27)

where a reinforcement gain coefficient, r, is used to encapsulate either positive or negative reinforcement, such that r > 0 represents positive reinforcement and r < 0 represents negative reinforcement.

If the reinforcer is exactly the same as the h-th input of the neuron, without amplification or attenuation, then r = 1 for positive reinforcement and r = -1 for negative reinforcement.
ASSOCIATIVE REINFORCEMENT LEARNING RULE

Applying this generalized reinforcer gain, r, to Eq. 24, the learning rule for the weight-change equation becomes:

\Delta w_{ijk}(t) = \lambda \, r \, x_{1h}(t) \, y_{kj}(t) \, x_{ik}(t), \quad \forall i, j, k, h    (28)

where the incremental size and direction of the connection weight-change depend on the learning coefficient, λ, and the reinforcer gain, r.

REINFORCER AS AN IMPLICIT GUIDE

The above analysis illustrates that the reinforcer signal plays a crucial role in determining the direction of the end-goal for a goal-directed behavior. In many ways, the reinforcement signal can be considered the "switch" that guides the behavior toward or away from the desired goal. Although the "desired goal" may or may not be defined (or known) as far as the animal is concerned, we use the term desired goal in the theoretical sense to indicate what the system will eventually arrive at, given the conditioning paradigm.

TRANSFER OF REINFORCER

Other reinforcement signals can be derived from other sensory inputs, such as the light signal paired with the innate reinforcer in classical or operant conditioning. These signals are often referred to as conditioned stimulus (CS) signals, in which a secondary sensory signal is used to derive the reinforcement signal for associative learning.

The transfer from the innate reinforcer (US), x_{1h}(t) (at the h-th input in the first layer), to the acquired reinforcer (CS), x_{1m}(t) (at the m-th input in the first layer), can be established by associative correlation using the same associative learning mechanism.

Once conditioning is established, the new CS reinforcer, x_{1m}(t), becomes the substituted reinforcer, giving the new reinforcement learning equation for the CS signal:

\Delta w_{ijk}(t) = \lambda \, r \, x_{1m}(t) \, y_{kj}(t) \, x_{ik}(t), \quad \forall i, j, k, m    (29)

INTERNAL REINFORCER

By the same token, applying the same associative learning paradigm for the transfer of the reinforcer signal from one input to another, internal neural signals can be used as reinforcement signals. In other words, the reinforcer does not need to originate from external sensory signals (first-layer inputs); an input in an internal layer can be used as the reinforcer.

Thus, the original innate reinforcer, x_{1h}(t), at the h-th input of the first layer can be substituted (or replaced) by an acquired reinforcer, x_{mk}(t), at the m-th input of the k-th layer. Eq. 29 can now be rewritten as:

\Delta w_{ijk}(t) = \lambda \, r \, x_{mk}(t) \, y_{kj}(t) \, x_{ik}(t), \quad \forall i, j, k, m    (30)

In other words, the system is able to derive its own reinforcer internally for associative reinforcement learning, rather than deriving it from the external sensory source.

VIRTUAL REINFORCER

This internally derived reinforcer can be considered a virtual reinforcer (virtual reward or virtual punishment). The virtual reinforcer can become a powerful mechanism for self-guided learning, motivating the animal (or autonomous robot) to seek or avoid certain environmental conditions represented (encoded) by the set of sensory stimuli without an external reinforcer. In other words, contextual abstraction of the sensory stimuli can be derived by such internal associative representation for a given context.

CONDITIONED FEEDBACK REINFORCEMENT

The reinforcement-guiding signal can be derived from either a feedforward or a feedback signal. Which signal it uses depends on whether the system's output (the animal's response) is incorporated as the feedback reinforcement. If the action of the animal results in an alteration of the reinforcement signal, then it is automatically a form of "feedback reinforcement".
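A sketch of the substitution embodied in Eqs. 28-30 (the indices and values here are hypothetical; the point is only that the same rule runs with whichever signal currently serves as the reinforcer):

import numpy as np

def delta_w(lam, r, reinforcer, y, x):
    # Generalized weight change: lam * r * reinforcer * y[j] * x[i].
    # Eq. 28: reinforcer = first-layer input h (innate US)
    # Eq. 29: reinforcer = first-layer input m (acquired CS)
    # Eq. 30: reinforcer = internal-layer input m (virtual reinforcer)
    return lam * r * reinforcer * np.outer(x, y)

rng = np.random.default_rng(2)
x_first = rng.random(5)       # first-layer inputs x_{1i}(t)
y = rng.random(3)             # layer outputs y_j(t)
lam, r = 0.01, 1.0            # learning coefficient and reinforcer gain

dw_us = delta_w(lam, r, x_first[0], y, x_first)  # US as reinforcer (Eq. 28)
dw_cs = delta_w(lam, r, x_first[2], y, x_first)  # CS substituted (Eq. 29)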
For example, if an animal moves toward the food (reward) as a result of the conditioning, it increases the positive
reinforcement signal by its motor response. This would
strengthen the synaptic weight in the neural circuitry, resulting in a further increase in subsequent response size.
This is essentially self-learning: associative learning without a "teacher" (i.e., unsupervised learning, or operant conditioning), where the behavior is shaped by the animal's self-exploration instead of being guided by an outsider.
CONDITIONED FEEDFORWARD REINFORCEMENT
If the reinforcement signal is a feedforward signal in
which the response of the animal does not affect the presentation of the reward or punishment, it is a form of “feedforward
reinforcement". For example, in Pavlovian conditioning, Pavlov [21, 22] presented the food reward independent of the behavioral response of the dog.
This type of feedforward reinforcement corresponds to classical conditioning (or Pavlovian conditioning), learning with a "teacher", or supervised learning. In both cases, the behavior is shaped by the reinforcement signal, whether it is a feedforward or a feedback signal.
DERIVED REINFORCER
When a reinforcement system receives its reinforcer signal from an external source, such as food reward, then it is a
feedforward system in which the reinforcer is directing the
system’s adaptation without relying on any internal feedback
signal for adjustment. But when the system derives an alternate reinforcement signal other than the original reinforcer, such as in a conditioning paradigm, then it becomes a feedback system, since the reinforcer no longer comes from an independent external source, but is derived from a dependent internal source.
CONDITIONED FEAR AS AN EXAMPLE
Conditioned fear is an example where a reinforcer can be
substituted by an alternate source. The original fear response
is triggered by a noxious stimulus. But when it is paired with
another stimulus, the alternate signal acts as a reinforcer triggering the response. For instance, when a shock is paired
with a tone signal, the original reinforcer (US shock) is no
longer needed to elicit the fearful response while the alternate
reinforcer (CS tone) is sufficient to elicit the conditioned fear
response.
CONDITIONING BY BENIGN REINFORCER
Subsequent to this fear conditioning, another CS (such as pairing the tone with a yellow warning light) could be used to establish an alternate stimulus-response function using the secondary reinforcer (tone). Thus, the transfer of the reinforcer from the original source (noxious stimulus) to an alternate source (non-noxious stimulus) forms a feedback loop.
Note that, in this case, even though the reinforcer is no longer noxious (i.e., benign or neutral), the CR response is still avoidance behavior rather than habituation (or extinction), because the response is associated with the original noxious stimulus (cascaded from the transfer of reinforcers).
ENVIRONMENT AS THE FEEDBACK LOOP
In most circumstances, the output action is a feedforward
action in which any errors in the output are not directly (or
explicitly) fed back into the system for correction as far as
the learning rule for the neural circuitry is concerned. But
that is not to say that the system does not receive any feedback from its action.
In reality, because of reinforcement, a change in the course of action occurs. As a result, the subsequent behavioral action alters the environment (such as moving away from the aversive stimulus) in such a way that it also changes the reinforcement intensity (diminishing the intensity of the reinforcement signal as it moves away from that source). Thus, a feedback loop is still maintained, even though the loop includes the external environment, not just the autonomous system itself.
COMBINING AUTONOMOUS BEING AND ITS ENVIRONMENT AS A META-SYSTEM
The above discussion leads to the expansion of the scope
included in the components of a system. Although most of
the time, the autonomous robot (or animal) is considered as a
standalone system with its own self-actuating and self-adaptive components, we may include the environment in
which it operates as a meta-system.
Because the animal (or autonomous robot) is no longer
operating in isolation independent of the environment, the
environment becomes an integral part of the meta-system it
operates in. Every action it takes may have an impact on the
environment, thus the resulting action provides a feedback to
the organism indirectly through the alteration of the environment it is exposed to.
Thus, the autonomous system cannot be viewed in isolation without the environment considered as part of the meta-system. In other words, the context of the sensory signals is
only meaningful in relation to how they affect the organism,
and is often meaningless without the organism in place.
ENVIRONMENTAL CONTEXT IN EMOTION FORMATION
By the same token, the emotional context of the sensation
is only meaningful when the environment is present; it is
meaningless without the environment. Thus, it is essential to
include the environment as part of the meta-system when
autonomous control and emotional context are considered
together.
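A toy sketch of such a meta-system loop (the one-dimensional world, the decay of the aversive signal with distance, and all constants are assumptions of this illustration): the agent's action changes its position, the position changes the reinforcement intensity, and that intensity feeds back into the weight update, so the loop closes through the environment:

import numpy as np

rng = np.random.default_rng(3)
position = 0.0                 # agent state; the aversive source sits at 0
weight = 0.5                   # tendency to act in the positive direction
for t in range(50):
    action = np.tanh(weight) + 0.1 * rng.normal()  # noisy motor output
    position += action                              # action alters the environment
    pain = -1.0 / (1.0 + position ** 2)             # aversive signal fades with distance
    weight += 0.1 * pain * action                   # environment feeds back into learning
print(position, weight)

No explicit error signal is fed back into the learning rule here; the only feedback path is the diminishing pain intensity as the agent drifts away from the source.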
SELF-EXPLORATION AS PARAMETER SPACE SEARCH
For feedback reinforcement, self-learning can be acquired
by exploration, i.e., variation (randomization) of the response
output. Small variations in the motor output allow the animal
(or autonomous robot) to explore different regions of the parameter space in search of the final I/O function. Thus, a probabilistic stimulus-response I/O function (such as Eqs. 5-8) can enable the system to "explore" the parameter space, whereas a deterministic I/O function (such as Eqs. 3-4) often does not.
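A sketch of the contrast (the threshold and sigmoid forms below are stand-ins for the paper's Eqs. 3-8, which are not reproduced in this section): a deterministic unit returns the same action for the same input, while a probabilistic unit samples its action, which is what allows exploration:

import numpy as np

rng = np.random.default_rng(4)
x, w = np.array([0.2, -0.4]), np.array([0.7, 0.3])
u = x @ w                                # net activation

def deterministic_io(u):
    return 1 if u > 0 else 0             # identical output on every trial

def probabilistic_io(u):
    p = 1.0 / (1.0 + np.exp(-u))         # probability of firing
    return int(rng.random() < p)         # sampled output varies trial to trial

print([deterministic_io(u) for _ in range(5)])   # e.g., [1, 1, 1, 1, 1]
print([probabilistic_io(u) for _ in range(5)])   # e.g., [1, 0, 1, 1, 0]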
SOLUTION WITHOUT A PRIORI KNOWLEDGE
The exploration of the parameter space can be done by an
animal itself (without a “teacher”) to establish the final probabilistic I/O function by incorporating a feedback reinforcement signal together with a stochastic I/O function without
any presumed knowledge of the outcome of the system.
In contrast, exploration of the parameter space for deterministic I/O functions often requires feedforward reinforcement signal provided by the “teacher” who predicts (projects)
what the final outcome (desired goal) would be, with presumed knowledge of the outcome of the system.
ROLE OF PROBABILISTIC FUNCTION IN SELF-EXPLORATION
The mathematical mechanism for self-exploration lies in
the probabilistic I/O function. Without the variations in output, the animal (or autonomous robot) will always repeat the
same action unless its action is altered by environmental perturbations or by an external “teacher”.
Environmental perturbations do occur in the real world
due to unforeseen circumstances, or failure of moving parts.
Thus variability in sampling the parameter space can occur in a deterministic system, but it is incidental rather than intrinsic to the system's operation.
This is consistent with the fact that most biological systems are variable in their output actions rather than rigidly stereotypical or identical. In fact, variability in output actions is the hallmark of animals, whereas precise output is the hallmark of machine actions.
ESTABLISHING EMOTIONAL CONTEXT INTERNALLY BY REINFORCEMENT LEARNING

The next phase in deriving the emotional context of sensation is to establish the neural mechanisms for the abstraction of emotional feel. Conditioning and associative reinforcement learning are well suited as mechanisms for establishing the emotional context associated with a sensory stimulus.

MECHANISMS FOR TRANSFER OF DERIVED REINFORCER

As illustrated before, when a reflex response is conditioned to a stimulus other than the original stimulus, the transfer from one stimulus to another can be established. The successive transfer of other stimuli that are associated with the original stimulus can form the framework for establishing the context in which the original stimulus-response function operates. In other words, the original sensory input is no longer interpreted (processed) in isolation; rather, it is processed in reference to the other associated stimuli.

INTERNAL AFFILIATION OR AVOIDANCE

Although, traditionally, positive or negative reinforcement leads to behavioral output, i.e., a motor response of either seeking or avoidance, similar responses can be established internally within the neural system rather than externally by interacting with the external world. The internal representation of the affiliation or avoidance behavior prior to the motor output can be considered the emotional representation of such behavior.

NEURAL CORRELATE OF INTERNAL REWARD CENTER

The nucleus accumbens is the brain structure in the mesolimbic system that is well known for its behavioral reinforcement property in mammals [23]. Activation of the nucleus accumbens often produces highly reinforced behavior, especially in reward activation. It is also known that the nucleus accumbens is activated by different types of reward signals, including water [24], food [25], dopaminergic drugs [26, 27] such as cocaine, and even visual stimuli such as beautiful faces [28]. Cocaine is known as a powerful internal reinforcer in behavioral activation [29]. Thus, internal loci of reinforcement exist in animals.

FEEDBACK GAIN BIAS IN ESTABLISHING EMOTIONAL CONTEXT

The above discussion shows that associative reinforcement learning can be one of the mechanisms for establishing the context of a sensation. This reinforcement paradigm is essentially a feedback control system whereby the sensory stimuli are integrated into the neural processing, not only as sensory inputs per se, but also as internal multiple-gain feedback control signals. The gain can be positive or negative, which can be used to automatically set "biases" in the system in such a way that certain sensory inputs are amplified (or attenuated).

ESTABLISHING EMOTIONAL SIGNIFICANCE BY INCREASING THE GAIN

The amplification of these specific sets of inputs signifies the "importance" of the signals in establishing association, i.e., correlation among these input signals. That is to say, the signals are self-selected in the processing such that they acquire special "significance" in determining the final output. Conceptually, this process establishes the "context" from the environment by integrating the sensory signals that have significant importance for determining the output.

SELF-SELECTIVE BIAS IN EMOTION FORMATION

This "self-selective" process for biasing the system to put more importance on the selected signals in producing its output relies on the feedback information derived from the sensory signals themselves. It also assumes that the initial starting point relies on the existence of a presumed innate reinforcer that sets the direction of behavioral motivation, i.e., a positive reinforcer (such as food) will increase the probability of motor activation for affiliation (seeking-behavior), while a negative reinforcer (such as pain) will tend to increase the likelihood of motor activation for avoidance output. (We will show in subsequent sections how this innate reinforcer can be established.)

INNATE AND VIRTUAL REINFORCER AS INTERNAL TEACHER
The innate reinforcer essentially serves as the internal
“teacher” that guides the autonomous system’s behavior.
Subsequent acquisition of other signals as the reinforcer can
be established by feedback reinforcement by correlating the
initial innate reinforcer with other potential candidate reinforcers. The new reinforcer established internally can be considered the virtual reinforcer, as discussed earlier. This transfer of one reinforcer to another can be self-propagating using the conditioning paradigm.
MECHANISMS FOR ESTABLISHING INNATE REINFORCER
Given that the innate reinforcer can initiate the subsequent conditionings, the question may be raised as to how this initial innate reinforcer is established in the first place. In biology, innate properties often refer to in-born, genetically programmed properties. In engineering, these properties are often hardwired (or pre-programmed). The question then becomes how such genetic programming comes about without a designer or a programmer who has a priori knowledge of the system or its desired outcome.
ROLE OF FEEDFORWARD IN EVOLUTION
One of the possible solutions to this problem of solving a
problem without even knowing what the problem is lies in
feedforward control. When a system uses a feedforward control for its operation, it does not require feedback to determine whether its actions are appropriate or not. It simply
produces its output based on the feedforward signal independent of the outcome.
FEEDFORWARD AS A PREDICTION
This feedforward control often presumes a (wild-guess) prediction of its output (even though that presumption may or may not be correct in actuality). Because of this presumption (predictive property), feedforward control has the advantage of producing an action that may have a "chance" of success.
Metaphorically speaking, feedforward control allows the autonomous system to take chances at finding a solution, even if by a "shotgun approach"; that is precisely the principle underlying the process of evolution.
TRIAL-AND-ERROR APPROACH
As in evolution, feedforward is used as the initial mechanism in the trial-and-error process to explore the parameter
space. “Survival of the fittest” is the second step in the evolutionary process that is feedback in nature. That is, it reinforces (keeps) the trials that work, and minimizes (eliminates) trials that don’t work. Without the feedforward
mechanism, evolution would not proceed.
By the same token, using the feedforward evolutionary
process, by trial-and-error, the innate reinforcer can be established.
RANDOMIZATION IN PARAMETER SPACE SEARCH
Central to this principle of feedforward evolutionary
process is the trial-and-error exploratory process in sampling
the parameter space. This trial-and-error process relies on the
variations of the output.
The variations can come from many sources, both internal
and external. External sources may include perturbations to the system, such as mutation by radiation in genetics, or perturbations from the environment in autonomous robots. The
autonomous system usually does not have much influence or
control over these unforeseen external perturbation sources.
ROLE OF INTRINSIC VARIATIONS
In contrast to external perturbations that are beyond the
control of the system, internal variations of its components
can become part of the intrinsic properties of the autonomous
system. Most often, these internal variations may take the
form of a probabilistic function or internal noise. The probabilistic output can provide the variations needed in the trial-and-error process for feedforward signal production. Thus, using the evolutionary approach, the innate reinforcer can be established when it is followed by the "fitness test" of survival-of-the-fittest feedback.
FITNESS TEST FOR SURVIVAL
Using a feedforward control function for exploratory solutions, together with feedback control for the fitness test, many of the innate properties found in animals or autonomous systems can be formed by successive iterations. It is analogous to the principles used by "artificial life" or "genetic algorithms" [30] to explore the parameter space and find solutions to complex problems by random mutation and recombination of sub-solution spaces.
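A minimal sketch of this two-step loop, under stated toy assumptions: a single scalar "innate gain" stands in for the genome, and the fitness function (whose optimum the agents have no a priori knowledge of) stands in for survival in the meta-system:

import numpy as np

rng = np.random.default_rng(5)

def fitness(gain):
    # Toy survival test: gain = 1 corresponds to an agent calibrated to
    # approach food and avoid pain; the agents never see this definition.
    return -(gain - 1.0) ** 2

population = rng.normal(size=20)          # feedforward step: random innate gains
for generation in range(30):
    scores = np.array([fitness(g) for g in population])
    survivors = population[np.argsort(scores)[-10:]]   # feedback step: keep the fittest
    offspring = survivors + 0.1 * rng.normal(size=10)  # mutated feedforward trials
    population = np.concatenate([survivors, offspring])
print(population.mean())                  # settles near the surviving innate gain (~1.0)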
ESTABLISHING INNATE REINFORCER WITHOUT
A PRIORI KNOWLEDGE
Using these feedforward and feedback computational
mechanisms, the aforementioned innate reflexes and innate
reinforcers can be established without any assumptions in the
design (or a priori knowledge) of the appropriateness of the
outcome of such a system. No arbitrary, artificial, retrospective assignment of the roles/functions of these reflexes or reinforcers (i.e., what the reflexes or reinforcers are used for) is
needed. Their physiological (and psychological) functional
roles are merely emergent properties of the system after extensive computational iterations.
ESTABLISHING EMOTIONAL CONTEXT WITHOUT
A PRIORI ASSUMPTIONS
Similarly, the emotional context in sensation for an
autonomous system can now be derived from first principles
without any a priori knowledge or assumptions about what
emotions are for. That is, it becomes an emergent property of
the autonomous system as it goes through the feedforward and feedback cycles of iterations, consolidating (and reinforcing) the "relevant" signals through the self-selective process of adjusting (adapting) the internal gains of the system (connection weights and reinforcers), integrating the input and output to establish its probabilistic many-to-many I/O mapping functions.
EMOTION AS AN EMERGENT PROPERTY
Thus, emotional sensation is an emergent property rather
than a retrospective property that fits an artificial construct in
psychology. Furthermore, emotions in autonomous robots are
not necessarily pre-programmed or add-on entities introduced
into the robot, but rather they are intrinsic parts of the sensory parameter space integrated with its internal processing
for the production of appropriate output actions.
CONDITIONED FEAR AS A CONTEXTUAL EMOTIONAL RESPONSE EXAMPLE
The conditioned reflex forms the background theoretical
basis for the transfer of emotional responses from one type of
stimulus to another. Fear conditioning [31], as illustrated
before, is a classical example in neurophysiology [32]
whereby the original stimulus that elicits the fear response is
transferred to another stimulus, such as light-onset or tone-onset, when they are paired in the stimulus presentation to produce the conditioned fear reflex.
More complex responses than reflexes can be evoked by conditioned fear to produce a contextual emotional response. This can be used to illustrate how environmental context can be incorporated into the emotional response that transfers from an innate response to a conditioned response.
INTRINSIC FEAR
Fear is one of the innate emotions experienced by animals to protect themselves from predation and other potential dangers that may lead to death or self-destruction. Intrinsic fear is the in-born fear that is genetically programmed in animals when encountering predators or heights (fear of falling), for example.
For instance, when an animal approaches a cave and then discovers a bear in the cave, then subsequent to this pairing, the animal will be conditioned to be fearful when approaching a cave. Thus, the cave becomes the environmental context of this fearful experience.
SENSORY EXPERIENCE AND EMOTIONAL RESPONSE
By the same token, an autonomous robot can acquire such
fearful response when approaching a dark cave (sensory
darkness) if it were conditioned by an aversive stimulus in
the process using the same computational mechanisms. Thus,
contextual meaning of the sensory inputs can be acquired
from the environment in which the sensory experience is
consolidated.
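For illustration, a toy run of the pairing described above (the learning rate, the tanh response function and the trial count are arbitrary choices of this sketch): a tone (CS) repeatedly paired with a shock (US) acquires the power to drive the response on its own:

import numpy as np

w_tone = 0.0                    # associative weight from tone to fear response
eta = 0.2                       # learning rate (arbitrary)
for trial in range(10):         # pairing phase: tone and shock co-occur
    tone, shock = 1.0, 1.0
    y = np.tanh(w_tone * tone + shock)   # response driven by the innate US pathway
    w_tone += eta * shock * y * tone     # reinforced Hebbian update (cf. Eq. 24)

# Test phase: tone alone, no shock.
print(np.tanh(w_tone * 1.0))    # conditioned fear response elicited by the CS alone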
BOOTSTRAPPING REFLEX
Finally, we will address the mechanism for innate response
formation, such as the establishment of reflex action. The
above discussion focuses on the set of proposed mechanisms
for producing the self-adaptability phenomena in autonomous
systems such that emotional context can be established from
the environment. Yet the derivation also depends on the existence of the presumed innate property (hardwired circuitry)
within the system that it relies on for bootstrapping the subsequent associative reinforcement learning mechanisms. We will
propose a mechanism whereby the innate characteristics (instinct) can be established.
MECHANISMS FOR INNATE REFLEX FORMATION
The mechanisms for innate response formation (such as reflex) depend on two theoretical principles:
1. the evolutionary mechanism of trial-and-error and the survival-of-the-fittest test (i.e., initial feedforward exploration and subsequent feedback fitness-testing in the meta-system) to sample the solution space; and
2. the consolidation of the circuitry (hardwiring) once a likely solution is found by the above evolutionary principle.
PRINCIPLES OF EVOLUTION
The evolutionary principle implicitly requires the inclusion of both the autonomous agent and the environment as a meta-system for the evaluation of the fitness test of survivability. The active exploration of the solution space using the feedforward approach implies that the nonlinear function, f(), in all of the above equations must be a probabilistic function rather than a deterministic one. The specific probabilistic function used for the autonomous system is an implementation-specific (species-specific) issue, which can be used to optimize the system's performance.
ESTABLISHING INNATE RESPONSE BY FIXATING
THE CIRCUITRY
The consolidation of the internal neural circuitry (i.e., hardwiring) is opposite to the principle of learning (connection modifiability or synaptic plasticity) discussed above. In order to fixate the circuitry once the solution is approached, the learning-rate, λ, can be decreased to zero.
FREEZING THE LEARNING
To freeze learning, the learning-rate needs to be a function of time, λ(t), rather than a constant (pre-assigned as a parameter of the system, as in most neural network learning equations). This learning-rate can then self-adapt in much the same way as in the aforementioned paradigm, where the rate is high initially for exploration and decreases as the system arrives at a stable solution. The fitness test for survival is used as a criterion for evaluating the stability of the solution.
CRITERION FOR FIXATION
Without a priori knowledge of what the solution is, the
autonomous system is still able to use the stability criterion
for fixating the circuitry. The stability-test can be accomplished by evaluating the system’s response over successive
time-iterations. For instance, if the connection weights do not
change significantly over multiple iterations of time-steps,
the system can be considered as approaching a stable state.
MOVING-AVERAGE AS A STABILITY MEASURE

Many different stability criteria can be used; for illustrative purposes, we provide one such stability criterion for evaluation using a moving-average function, Δw̄(t), of the total weight-changes of the system at time t:

\overline{\Delta w}(t) = \frac{1}{s \, \Delta t} \sum_{q=0}^{s-1} \sum_{i,j,k} \Delta w_{ijk}(t - q \, \Delta t)    (31)

averaging over s time-increments of Δt. The length of the time-increments averaged over is related to the time-scale of interest, a parameter similar to the length of the period considered stationary in any systems analysis.

STATIONARY PERIOD

In biology, sΔt can be a short time period that corresponds to a physiological time-scale (which spans a stimulus-response episode), a longer time period (which spans multiple trials of stimulus-response cycles), or even the lifetime of an animal.

ITERATIVE SEARCH FOR STABILITY

In simulations, this parameter s can be derived adaptively using iterative methods, starting with a small number and then computing the moving-average with increasing s until the system reaches a stable state as established by the stability criterion.

STABILITY CRITERION

If this moving-average is approximately a small constant over time, then the system can be considered as approaching a stable state, since the average weight-change is small:

\overline{\Delta w}(t) \le \varepsilon    (32)

where ε is a small constant. The stability criterion, ε, is also an implementation-specific parameter on the limits of fluctuation of the system.

SYSTEM RESTART CRITERION

When the system reaches a steady state, whether this stable state is a candidate solution for the system depends on the survivability fitness-test for the system in the evolutionary process. That is because a system can be stuck at a stable state (a local minimum) that may not correspond to a real-world solution to the problem (the global minimum).

If it happens to be an inappropriate (or invalid) solution, the system will fail to interact with the environment appropriately, and it will be eliminated in the survivability test. When this happens, the system dies. A new initial condition will be used to restart the system for another round of evolution.

FIXATION OF CIRCUITRY BY FREEZING THE LEARNING RATE

Once the system arrives at a stable state, the circuitry can be fixated into hardwire rather than being allowed to continue to change and be modified. In other words, the learning coefficient, λ, described in Eq. 30, representing the weight-change learning rule, is no longer a constant, but approaches zero when the system reaches a stable state.

So the learning parameter, λ, can now change with time such that λ(t) is dependent on the overall weight-change average:

\lambda(t) = f\left( \overline{\Delta w}(t) \right)    (33a)

and from Eq. 16b:

\lambda(t) = \Delta\lambda(t) \, \Delta t    (33b)

where f() is a function that can be a simple proportional linear function or another nonlinear function, depending on how the system is designed to approach this stable state.

CRITERION FOR FIXATION RULE OF LEARNING

Eq. 33 shows that, as the moving-average Δw̄(t) approaches zero, the learning-rate λ(t) will also approach zero. This satisfies the condition for fixation of the circuitry into hardwire, without the ability to be modified.

Thus, the final generalized equation for the weight-change is given by:

\Delta w_{ijk}(t) = \lambda(t) \, r \, x_{mk}(t) \, y_{kj}(t) \, x_{ik}(t), \quad \forall i, j, k, m    (34a)

or

\Delta w_{ijk}(t) = f\left( \overline{\Delta w}(t) \right) r \, x_{mk}(t) \, y_{kj}(t) \, x_{ik}(t), \quad \forall i, j, k, m    (34b)

which encapsulates both the connection weight-change learning rule and the circuitry fixation rule simultaneously.
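A sketch of the fixation machinery of Eqs. 31-34 under illustrative assumptions (a single scalar weight, a proportional f(), and an exponentially decaying drive standing in for the approach to a solution):

import numpy as np
from collections import deque

s = 10                             # moving-average window (Eq. 31)
history = deque(maxlen=s)          # recent |delta_w| values
w, r, x, y = 0.0, 1.0, 1.0, 1.0
for t in range(500):
    drive = np.exp(-0.05 * t)      # stand-in: weight changes shrink over time
    avg = sum(history) / s if len(history) == s else 1.0   # moving average (Eq. 31)
    lam = 0.1 * avg                # Eq. 33a with a proportional f()
    dw = lam * r * drive * y * x   # weight change in the style of Eq. 34
    w += dw
    history.append(abs(dw))
    if len(history) == s and avg < 1e-6:
        break                      # Eq. 32: average change below epsilon; freeze
print(t, w)                        # learning has self-extinguished; circuitry fixed

Because λ(t) is itself proportional to the moving average of recent weight changes, the updates shut themselves off once the weights stop moving, which is the fixation-into-hardwire condition described above.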
CONCLUSION
The above derivations provide the theoretical framework for establishing the principles of emotional context formation in sensation, with minimal assumptions, based on evolutionary principles and without any a priori knowledge of the environment, of the system, or of what emotion is used for. It uses a self-bootstrapping feedforward approach to establish the innate responses and reinforcers, and then consolidates the hardwiring of the circuitry by the fitness survival-test feedback.
With the innate reflex circuitry established, associative reinforcement learning is used to transfer the relevant sensory signals into derived reinforcers, forming the significance of these inputs by adjusting the individual gains (connection weights, reinforcers and learning rate) of the system.
As a result of the self-selected biases produced by the multiple adaptive gains, the system can respond to the environment within a context that selectively enhances its responses to those stimuli.
Thus, the contextual significance of the environmental conditions forms the emotional context in which the system (or animal) responds. The emotional feel of these sensory stimuli emerges as a result.
SUMMARY
A comprehensive theoretical framework based on an autonomous control system is introduced in this paper to derive the basic set of principles that encapsulate emotions as emergent properties for increasing the chance of survival in an environment, with a minimal set of assumptions.
The theoretical framework does not rely on retrospective (or introspective) accounts, experience, or artificial constructs of what emotions are for, or what the roles of emotions are. It
does not rely on any innate, hardwired or pre-programmed a
priori knowledge of what the system is attempting to accomplish.
The only basic assumption of this autonomous system is
the existence of the sensory-motor I/O processing elements
that form a neural network. The other assumption is the existence of feedforward and feedback control. No explicit neural network architecture is assumed either; in fact, the neural network architecture can be self-organized by the feedforward and feedback mechanisms, eliminating (or consolidating) connections between neurons by decreasing (or increasing) their connection weights, reinforcers and/or learning rate.
The reinforcers needed for the associative learning are
also self-selected and self-generated, with the initial innate
reinforcer formed by the feedforward evolutionary process,
and subsequent reinforcers established by feedback reinforcement correlated with the contextual environmental input. The innate responses are derived from self-bootstrap
methods based on evolution principles such that these innate
responses can serve as reinforcing signals without any a priori assumptions on what these reinforcers should be.
Thus, this theoretical framework for contextual emotion
formation is self-organizing and self-selecting within an
autonomous control system without any external “teacher”,
without any presumed a priori knowledge of the environment, or of how the autonomous system is expected to behave or "feel".
The sensory input, the output actions and the internal representation of the external environment are encapsulated by
the implicit model, which is created by the neural network
processing circuitry. The emotional context of the sensorimotor actions (including reflex actions, conditioned reflexes,
and conditioned emotional responses) is the emergent property of this self-organizing neural network.
This represents the first phase of an emotional model
called the “EMOTION-I” model, which focuses on the minimal sets of principles for establishing emotional context in
sensation. The next phase of the “EMOTION-II” model presented in the next paper [5] will establish the “internal
model” needed for the representation of the external world
for an autonomous control system to function, and provide
the derivation of principles for the emergence of
happy/unhappy emotions for self-assessment and consistency-check when comparing the discrepancy between the
expectancy in objective and subject realities (not just sensation, as in EMOTION-I).
REFERENCES

[1] S. C. Gadanho, "Learning behavior-selection by emotions and cognition in a multi-goal robot task," J. Mach. Learn. Res., vol. 4, pp. 385-412, Dec 2003.
[2] S. C. Gadanho and J. Hallam, "Emotion-triggered learning in autonomous robot control," Cybernetics and Systems, vol. 32, pp. 531-559, Jul 2001.
[3] A. Takanishi, K. Sato, K. Segawa, H. Takanobu, and H. Miwa, "An anthropomorphic head-eye robot expressing emotions based on equations of emotion," Proc. IEEE Int. Conf. Robot. Automat., vol. 3, pp. 2243-2249, Nov-Dec 2000.
[4] M. A. Arbib, "Evolving emotions in animal and robot," Int. J. Computat. Intel. Applicat., vol. 4, pp. 225-236, Sept 2004.
[5] D. Tam, "EMOTION-II model: a theoretical framework for happy emotion as a self-assessment measure indicating the degree-of-fit (congruency) between the expectancy in subjective and objective realities in autonomous control systems," The Open Cybernetics & Systemics Journal, vol. 1, pp. 47-60, Dec 2007. [Online] Available: http://www.bentham.org/open/tocsj/
[6] V. Castellucci, H. Pinsker, I. Kupfermann, and E. R. Kandel, "Neuronal mechanisms of habituation and dishabituation of the gill-withdrawal reflex in Aplysia," Science, vol. 167, pp. 1745-8, Mar 1970.
[7] I. Kupfermann, V. Castellucci, H. Pinsker, and E. Kandel, "Neuronal correlates of habituation and dishabituation of the gill-withdrawal reflex in Aplysia," Science, vol. 167, pp. 1743-5, Mar 1970.
[8] H. Pinsker, I. Kupfermann, V. Castellucci, and E. Kandel, "Habituation and dishabituation of the gill-withdrawal reflex in Aplysia," Science, vol. 167, pp. 1740-2, Mar 1970.
[9] M. B. Arnold, "Emotion, motivation, and the limbic system," Ann. N. Y. Acad. Sci., vol. 159, pp. 1041-1058, Jul 1969.
[10] D. A. McCormick, D. G. Lavond, and R. F. Thompson, "Neuronal responses of the rabbit brainstem during performance of the classically conditioned nictitating membrane (NM)/eyelid response," Brain Res., vol. 271, pp. 73-88, Jul 1983.
[11] A. G. Barto and R. S. Sutton, "Landmark learning: an illustration of associative search," Biol. Cybern., vol. 42, pp. 1-8, Nov 1981.
[12] A. G. Barto, C. W. Anderson, and R. S. Sutton, "Synthesis of nonlinear control surfaces by a layered associative search network," Biol. Cybern., vol. 43, pp. 175-185, Apr 1982.
[13] R. S. Sutton and A. G. Barto, "Toward a modern theory of adaptive networks: expectation and prediction," Psychol. Rev., vol. 88, pp. 135-70, Mar 1981.
[14] M. L. Commons, S. Grossberg, and J. E. R. Staddon, Neural Network Models of Conditioning and Action, Quantitative Analyses of Behavior. Hillsdale, N.J.: L. Erlbaum Associates, 1991.
[15] J. A. Anderson, An Introduction to Neural Networks. Cambridge, Mass.: MIT Press, 1995.
[16] P. R. Montague and T. J. Sejnowski, "The predictive brain: temporal coincidence and temporal order in synaptic learning mechanisms," Learning & Memory (Cold Spring Harbor, N.Y.), vol. 1, May-Jun 1994.
[17] D. O. Hebb, The Organization of Behavior: A Neuropsychological Theory. New York: Wiley, 1949.
[18] D. E. Rumelhart, J. McClelland, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume I, Foundations. Cambridge, Mass.; London: MIT Press, 1986.
[19] D. C. Tam, "Computation of cross-correlation function by a time-delayed neural network," in Intelligent Engineering Systems through Artificial Neural Networks, C. H. Dagli, L. I. Burke, B. R. Fernández, and J. Ghosh, Eds. New York, NY: American Society of Mechanical Engineers Press, vol. 3, pp. 51-55, 1993.
[20] D. Tam, "Theoretical analysis of cross-correlation of time-series signals computed by a time-delayed Hebbian associative learning neural network," The Open Cybernetics & Systemics Journal, vol. 1, pp. 1-4, Jul 2007. [Online] Available: http://www.bentham.org/open/tocsj/
[21] I. P. Pavlov, "Conditioned reflex," Feldsher Akush, vol. 10, pp. 3-10, Oct 1951.
[22] I. P. Pavlov, "Conditioned reflex," Feldsher Akush, vol. 11, pp. 6-12, Nov 1951.
[23] J. M. van Rossum, C. L. Broekkamp, and A. J. Pijnenburg, "Behavioral correlates of dopaminergic function in the nucleus accumbens," Adv. Biochem. Psychopharmacol., vol. 16, pp. 201-207, 1977.
[24] D. J. Woodward, J.-Y. Chang, P. Janak, A. Azarov, and K. Anstrom, "Part I. Functional organization of the ventral striatopallidal system - mesolimbic neuronal activity across behavioral states," Ann. N. Y. Acad. Sci., vol. 877, p. 91, Jun 1999.
[25] B. G. Hoebel, "Brain neurotransmitters in food and drug reward," Am. J. Clin. Nutr., vol. 42, pp. 1133-50, Nov 1985.
[26] R. A. Wise, "The role of reward pathways in the development of drug dependence," Pharmacol. Ther., vol. 35, pp. 227-263, 1987.
[27] R. A. Wise and M. A. Bozarth, "Brain mechanisms of drug reward and euphoria," Psychiatr. Med., vol. 3, pp. 445-460, 1985.
[28] I. Aharon, N. Etcoff, D. Ariely, C. F. Chabris, E. O'Connor, and H. C. Breiter, "Beautiful faces have variable reward value: fMRI and behavioral evidence," Neuron, vol. 32, pp. 537-551, Nov 2001.
[29] R. A. Wise, "Neural mechanisms of the reinforcing action of cocaine," NIDA Res. Monogr., vol. 50, pp. 15-33, 1984.
[30] C. G. Langton, Artificial Life. Redwood City, Calif.: Addison-Wesley, 1987.
[31] R. G. Phillips and J. E. LeDoux, "Differential contribution of amygdala and hippocampus to cued and contextual fear conditioning," Behav. Neurosci., vol. 106, pp. 274-85, Apr 1992.
[32] J. LeDoux, "The emotional brain, fear, and the amygdala," Cell. Mol. Neurobiol., vol. 23, no. 4-5, Oct 2003.
[33] S. C. Gadanho and J. Hallam, "The role of emotions: exploring autonomy mechanisms in mobile robots," D.A.I. Research Paper No. 851. Edinburgh: University of Edinburgh, Dept. of Artificial Intelligence, 1997.
[34] S. C. Gadanho and J. Hallam, "Emotion-driven learning for animat control," D.A.I. Research Paper No. 881. Edinburgh: University of Edinburgh, Dept. of Artificial Intelligence, 1998.

Received: November 5, 2007    Revised: November 19, 2007    Accepted: December 10, 2007