Proceedings of the 2008 International Conference on Cognitive Systems
University of Karlsruhe, Karlsruhe, Germany, April 2-4, 2008
Physiologically-inspired model for the visual tuning properties of mirror
neurons
Falk Fleischer, Antonino Casile, Martin A. Giese
This work was supported by the DFG, HFSP, the EC FP6 project COBOL, and the Hermann Lilly Schilling Stiftung. All authors are with the Hertie Institute for Clinical Brain Research, Tübingen, Germany ([email protected], {antonino.casile,martin.giese}@uni-tuebingen.de). M. Giese is also with the School of Psychology, University of Bangor, UK.
Abstract— Mirror neurons are a class of neurons found in the premotor cortex of monkeys that are active both during motor planning and during the visual observation of actions. These neurons have recently received a vast amount of interest in cognitive neuroscience and robotics, and have been discussed as a potential basis for imitation learning and the understanding of actions. However, their visual tuning properties are only poorly understood. Most existing models assume that the tuning properties of mirror neurons might be based on a reconstruction of the three-dimensional structure of action and object, a computationally difficult problem. In line with a broad body of work on object recognition, we present a model that explains visual properties of mirror neurons without this requirement. The proposed model is based on a small number of physiologically well-established principles. In addition, it postulates novel neural mechanisms for the integration of information about object and effector movement, which can be tested in electrophysiological experiments.
I. INTRODUCTION

Mirror neurons are a class of neurons that were first described in the premotor cortex of monkeys. These neurons respond both when the animal prepares motor actions and when it perceives motor actions executed by other monkeys or humans [1]. Recently, mirror neurons have received a vast amount of interest in cognitive neuroscience, and also in robotics, since they have been discussed as a physiological basis of the imitation learning of actions and potentially also of action understanding [2], [3]. Beyond the fact that they are active during motor planning, mirror neurons have a number of interesting visual tuning properties. They are selective for subtle differences between actions, like power vs. precision grip. At the same time, they are highly invariant against the position of the action in the visual field and partially even against the view of the action. However, their response depends critically on the correct spatial arrangement of the effector and the goal object, and usually they respond only to functionally effective actions. In addition, mirror neurons typically fail to respond to mimicked actions without a goal object [4], [5].
Many existing models for the mirror neuron system assume a reconstruction of the three-dimensional structure
of goal object and effector motion by the visual system.
Recognition within such three-dimensional representations
has been explained by dynamic predictive motor models that simulate the action in synchrony with the visual stimulus [6], [7], [8], [9], [10], [11], [3]. However, the reconstruction of three-dimensional structure, in particular from monocular image sequences, is a difficult computational problem. A large body of results on the recognition of static shapes suggests that the visual system might not reconstruct the full 3D structure of recognized objects. Instead, it seems to base recognition on an integration of information extracted from two-dimensional views of objects [12], [13], [14], [15]. This raises the question whether the visual tuning properties of mirror neurons can be explained within a similar framework, without an accurate reconstruction of the three-dimensional scene geometry.

The aims of this paper are twofold: First, we try to develop a model for the visual tuning properties of mirror neurons that is physiologically plausible and which, at a later stage, can be compared to electrophysiological data. This makes it necessary that the model operates on real video sequences, which can also be presented to monkeys in electrophysiological experiments. Second, we try to devise a model that explains the visual tuning properties of mirror neurons without the need for a 3D reconstruction of effector and object geometry, in order to test the computational feasibility of the recognition of goal-directed actions within a view-based framework.

In the following, we first present the model and its components. We then show some example simulations that reproduce typical properties of mirror neurons. Finally, we discuss implications and further extensions of this work.
II. MODEL
The developed model is based on principles that have
been successfully applied before for the modeling of object
recognition [16], [17], [18], [19] and movement recognition
[20]. An overview of the model architecture is shown in
Figure 1. In the following, the individual components and
principles will be described in more detail.
Fig. 1. Overview of the model.
The architecture is based on three main components: (1)
A hierarchical neural model for the recognition of goal
objects and effector (hand) shapes from video frames, where
the middle levels of this hierarchical model are optimized
by feature learning; (2) a simple recurrent neural circuit
for the realization of temporal sequence selectivity for the
recognition of effector movements, and (3) a physiologically
plausible mechanism that combines the spatial information
about the goal object and the posture, position and orientation
of the effector. The highest level of this mechanism is formed
by the model ‘mirror neurons’.
A. Hierarchical recognition network for objects and effector
shapes
The recognition of the shapes of the goal object and the
effector (hand) is based on a hierarchical neural recognition
model. Very similar models have been proposed to account
for a variety of experimental results in object recognition [17]
and motion recognition [20]. Each video frame is analyzed
by a hierarchy of neural feature detectors. The complexity
of the extracted features increases along the hierarchy. At
the same time, the size of the receptive fields and the
invariance of the detectors against position and scale changes
increase along the hierarchy. The individual computational
steps are further outlined below, and we refer to [19] and
[21] for further details. An overview of the model neurons
is given in Table I.
1) Local orientation filters (areas V1/V2): Local orientations are extracted from the video frames using Gabor filters with 12 orientations and 3 different scales that are selective for different spatial positions. Signifying by $G_k(u,v)$ a normalized Gabor filter with zero mean and sum of squares 1, the response $x_k$ of a Gabor filter $G_k$ to a patch of pixels $P(u,v)$ from the input image is given by:

$$x_k = \frac{\langle P, G_k \rangle}{\beta + \sqrt{\langle P, P \rangle}} \qquad (1)$$

In this expression the scalar product is defined as $\langle f, g \rangle = \sum_{u,v} f(u,v)\, g^*(u,v)$, and the positive constant $\beta$ avoids division by zero. Consistent with [21], we assume a mutual inhibition between the filters with the same spatial position and scale, but with different orientations: Let $x_{\min}$ and $x_{\max}$ be the minimum and the maximum of the responses over all orientations. Then a local threshold was defined by the expression $\eta = x_{\min} + b(x_{\max} - x_{\min})$, where $b$ is a positive constant that defines the strength of the inhibition. The effective output of the orientation filters was given by $\tilde{x}_k = [[x_k - \eta]_+ - T]_+$, where $[x]_+ = \max(x, 0)$ and $T > 0$ is a global threshold.
Responses of these orientation filters from the first layer
of the model, with selectivity for the same orientation and
scale within a limited spatial neighborhood, were pooled by
a maximum operation in order to achieve partial position
invariance [22], [17]. Model neurons that realize this pooling
are similar to complex cells in primary visual cortex that
are characterized by a limited degree of position and scale
invariance.
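As an illustration, a minimal numpy sketch of this first stage follows. It assumes a precomputed normalized Gabor filter and grayscale patches; the constants beta, b, and T and the pooling size are placeholders, not the values used in the paper.

```python
import numpy as np

def gabor_response(patch, gabor, beta=1e-3):
    """Eq. (1): contrast-normalized response of a normalized Gabor
    filter to an image patch; beta avoids division by zero."""
    return np.sum(patch * gabor) / (beta + np.sqrt(np.sum(patch * patch)))

def inhibit_orientations(x, b=0.5, T=0.05):
    """Mutual inhibition among filters with the same position and scale:
    responses x over all orientations are thresholded at
    eta = x_min + b (x_max - x_min) and then at the global threshold T."""
    eta = x.min() + b * (x.max() - x.min())
    return np.maximum(np.maximum(x - eta, 0.0) - T, 0.0)

def max_pool(resp, size=2):
    """Maximum pooling over a spatial neighborhood: the partial position
    invariance of the complex-cell-like pooling units."""
    h, w = resp.shape[0] // size * size, resp.shape[1] // size * size
    return resp[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))
```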
2) Detectors for intermediate form features (area V4): The output signals from the position-invariant orientation detectors in the previous layer within a limited spatial region were used to construct form features with increased complexity. The selectivity of the detectors for such intermediate-level features was established by learning (see below). Such intermediate features correspond, for example, to fragments of hands or objects (Figure 1). The tuning properties of these
detectors are given by radial basis functions of the form

$$y_m = \exp\left( -\,\frac{\| Y - U_m \|_F^2}{2\sigma^2 N_m} \right) \qquad (2)$$
where the matrix $Y$ signifies the responses from a patch (of the position grid) from the previous layer, and where the matrix $U_m$ is a template pattern that is established by learning. The integer $N_m$ signifies the number of non-zero elements in the matrix $U_m$.
Detectors for each particular feature, defined by $U_m$, are replicated for different spatial positions. As for the orientation detectors, the responses of detectors with the same feature selectivity but different position selectivity within a
spatial neighborhood are pooled using a maximum operation.
This defines a hierarchy layer with model neurons that detect
optimized mid-level features with partial invariance against
position changes.
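As a sketch, the response of a single learned template under Eq. (2) can be computed as follows; sigma is a free parameter here, and evaluating the distance only on the non-zero support of U_m is our reading of the N_m normalization.

```python
import numpy as np

def midlevel_response(Y, U_m, sigma=0.5):
    """Eq. (2): radial basis function response of a mid-level detector.
    N_m is the number of non-zero entries of the template U_m; the
    squared Frobenius distance is taken on that support and normalized
    by N_m."""
    support = (U_m != 0)
    N_m = max(int(support.sum()), 1)
    d2 = np.sum((Y[support] - U_m[support]) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2 * N_m))
```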
3) Detectors for complete object forms and hand shapes
(area IT): The next hierarchy layer implements exactly
the same computational functions as the layers described
in the last section. In this case, the receptive field sizes
are large enough to encompass whole objects and effector
configurations. The resulting feature detectors respond selectively to views of objects and hands, being sensitive to
size and orientation. This selectivity provides information
that is critical to determine whether a grip is functional or
dysfunctional for a particular object. The object and hand
detectors were again modeled by radial basis functions which
were optimized by training of Support Vector Machines [23]
that classify one pattern (object or hand view) against all
others. This step is not physiologically plausible and will be
replaced later by physiologically-inspired learning rules.
Again, detectors with the same selectivity were realized for different spatial positions and their responses were pooled with a maximum operation within a spatial neighborhood to achieve position invariance. However, in contrast to many object recognition models, the neurons on the highest level of our recognition hierarchy still have a coarse tuning for the positions of the object and the effector. This
is consistent with neurophysiological data [13]. In addition,
this property is necessary for extracting the relative positions
of effector and object, which is crucial for distinguishing
functional and dysfunctional actions.
TABLE I
PARAMETERS OF THE HIERARCHICAL MODEL FOR OBJECT AND EFFECTOR RECOGNITION

layer   # filter types   receptive field size (deg)   total # of neurons
1       36               0.63–1.09                    2332800
2       36               0.74–1.20                    259200
3       226              2.12–2.57                    1627200
4       226              2.29–2.75                    101700
5       17               > 4.0                        7650
6       17               > 4.0                        850
B. Learning of mid-level features

The templates $U_m$ on the middle levels of the hierarchy are established by learning. As training data set, we used images that contain the relevant object or effector view. The first step of the learning process is a selection of the most strongly activated features, passed on from the previous hierarchy level, for each spatial position of the given training image. The result is a single dominant feature per position. This selection procedure could be implemented neurally by a winner-takes-all inhibition between the neurons encoding different features at the same position.

New templates defining novel intermediate features are created using the following iterative procedure, which makes the creation of novel templates dependent on the performance of the existing ones for the given training image:
1) Create an initial 2D template feature $U_k$, which is derived from the response of the previous hierarchy level for a randomly chosen region and a random training example.
2) Get a new input patch $X$ from a random example image.
3) Compute the responses $y_m$ of all existing mid-level feature detectors according to Equation (2).
4) Only if the maximum of these responses is below a given threshold ($T_L > 0.95$), add a new template to the already existing features.
5) Repeat from step 2) until a sufficient number of features have been learned.

This algorithm implements a form of competitive online feature learning, which could potentially also be implemented by an appropriate online learning rule that recruits additional neurons for features that are not yet sufficiently well represented.
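The procedure above can be summarized in a compact sketch. Here `sample_patch` is a hypothetical callback that returns a response patch from the previous layer after the winner-takes-all selection, `rbf_response` repeats Eq. (2), and all constants besides the threshold T_L are illustrative.

```python
import numpy as np

def rbf_response(X, U, sigma=0.5):
    """Eq. (2) response of one template (distance on the non-zero support)."""
    m = (U != 0)
    return np.exp(-np.sum((X[m] - U[m]) ** 2) / (2 * sigma ** 2 * max(int(m.sum()), 1)))

def learn_midlevel_templates(sample_patch, n_features=226, T_L=0.95,
                             max_iter=100000):
    """Competitive online feature learning, steps 1)-5): recruit a new
    template whenever no existing detector represents the patch well."""
    templates = [sample_patch()]                           # step 1
    for _ in range(max_iter):
        X = sample_patch()                                 # step 2
        best = max(rbf_response(X, U) for U in templates)  # step 3
        if best < T_L:                                     # step 4
            templates.append(X)
        if len(templates) >= n_features:                   # step 5
            break
    return templates
```

With the layer-3 filter count from Table I, n_features would be set to 226.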
C. Temporal sequence selectivity (area STS)
To recognize effector movements it is not sufficient to detect individual keyframes; the action should be recognized only if these keyframes arise in the correct temporal order. To implement sequence-selective recognition we exploited a recurrent neural mechanism that has been proposed before in the context of biologically inspired models for movement recognition [20].
We assume that the outputs of the neural detectors for individual effector shapes for a specific action $l$, signified by $z_k^l(t)$, provide input to snapshot neurons that encode the temporal order of individual effector shapes. Selectivity for temporal order is achieved by introducing asymmetric lateral connections between these neurons. The dynamics of the resulting network is given by the equation

$$\tau_r\, \dot{r}_k^l(t) = -r_k^l(t) + \sum_m w(k-m)\, [r_m^l(t)]_+ + z_k^l(t) - h_r \qquad (3)$$
where $h_r$ is a parameter that determines the resting level, and where the parameter $\tau_r$ determines the time constant of the dynamics. The function $w$ is an asymmetric interaction kernel that, in principle, can be learned efficiently by time-dependent Hebbian learning [24].

The responses of all snapshot neurons that encode the same action pattern are integrated by motion pattern neurons, which smooth the activity over time. Their response depends on the maximum of the activities $r_k^l(t)$ of the corresponding snapshot neurons:
$$\tau_s\, \dot{s}^l(t) = -s^l(t) + \max_k\, [r_k^l(t)]_+ - h_s \qquad (4)$$
The motion pattern neurons become active during specific movement sequences, e.g. during grasping with a precision or a power grip. Such neurons have been found in the superior temporal sulcus of the monkey [25].
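A minimal Euler-integration sketch of Eqs. (3) and (4) for a single action l is given below. The shape of the asymmetric kernel w, the time constants, and the resting levels are illustrative assumptions, not values fitted in the paper.

```python
import numpy as np

def simulate_sequence_selectivity(z, dt=10.0, tau_r=150.0, tau_s=150.0,
                                  h_r=0.1, h_s=0.1):
    """z: array (T, K) with the keyframe detector outputs z_k^l(t).
    Returns snapshot activities r (T, K), Eq. (3), and the motion
    pattern output s (T,), Eq. (4)."""
    T_steps, K = z.shape
    # Asymmetric interaction kernel w(k - m): excitation runs forward
    # along the sequence, inhibition elsewhere, which makes the network
    # selective for the correct temporal order of the keyframes.
    offsets = np.arange(K)[:, None] - np.arange(K)[None, :]   # k - m
    w = np.where((offsets > 0) & (offsets <= 3), 0.4, -0.2)
    r = np.zeros((T_steps, K))
    s = np.zeros(T_steps)
    for t in range(1, T_steps):
        rect = np.maximum(r[t - 1], 0.0)                      # [r]_+
        drive = w @ rect + z[t - 1] - h_r
        r[t] = r[t - 1] + dt / tau_r * (-r[t - 1] + drive)    # Eq. (3)
        s[t] = s[t - 1] + dt / tau_s * (-s[t - 1] + rect.max() - h_s)  # Eq. (4)
    return r, s
```

Playing the keyframe inputs z in reversed order leaves the forward excitation unmatched, so the snapshot activity stays low; this is the mechanism behind the reversed-order result in Section III.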
D. Mirror neurons: Integration of information from object
and effector (areas AIP, PF, F5)
The highest levels of the model integrate the following signals about object and effector: (1) the type of the dynamic effector action, which is signalled by the motion pattern neurons, and (2) the spatial relationship between the moving effector and the goal object. The necessary information about
the positions of the object and the effector is extracted from the highest level of the form hierarchy, which is not completely position-invariant and thus encodes these positions coarsely within a retinal frame of reference. In addition, the recognized effector view predicts a range of object positions that are suitable for an effective grip. This makes it possible to derive whether the effector action is likely to be successful, dependent on the object position.
We postulate a simple physiological mechanism for the
integration of these different pieces of information that is
centrally based on a relative position map which is constructed by pooling the output signals from the neurons
encoding effector and object views. By pooling the signals
of all neurons that represent views of objects at similar
spatial positions, one can derive a population vector that provides a coarse estimate of the object position [26], [27]. More precisely, by pooling the activity of all object view neurons that represent objects close to the position $(u, v)$ in the retinal frame of reference, one can derive a field of population activity $a_O(u, v)$ that has a peak at the position $(u_O, v_O)$ of the object. In the same way, one can derive an activity field $a_E(u, v)$ that has a peak at the position $(u_E, v_E)$ of the effector by pooling the signals of all effector-view neurons that have been trained with similar effector positions. From these two activity fields a relative position map is derived, which encodes the position of the object relative to the effector. For this purpose one defines an activity map that integrates the pooled signals defined before in a multiplicative manner:

$$a_{RP}(u, v) = \int a_O(u', v')\, a_E(u' - u, v' - v)\, du'\, dv' \qquad (5)$$
This neural map simply sums the products of the activities of the two neural populations, dependent on the relative positions between effector and goal object that are encoded by the neurons selective for object and effector views. The last equation can be spatially discretized and implemented by summation (pooling) and multiplication of the signals of the appropriately chosen neurons. Due to the multiplication, all neurons of the relative position map will be inactive if either no object or no effector is present in the visual stimulus.

The recognized view of the effector also provides information about the range of positions in which an object has to be positioned to permit effective grasping with grip type $l$. Specifically, the grip will be dysfunctional if the object is, for example, positioned next to the hand instead of between the thumb and the index finger for a precision grip. It is easy to learn the spatial region within the relative position map that corresponds to functional grips. This region is indicated by an orange line in the relative position map in Figure 1. One can define a ‘receptive field’ function $g_l(u, v)$ that corresponds to this region, which has high values within the region and values close to zero outside. We postulate the existence of affordance neurons whose response is constructed by computing the output of the relative position map weighted by this receptive field
function in the form:

$$a^l = \int a_{RP}(u, v)\, g_l(u, v)\, du\, dv \qquad (6)$$
The output of the affordance neurons is only positive if object
and effector are present and positioned correctly relative to
each other. Neurons that are tuned to the relationship between
objects and grips have been found in the parietal cortex of
monkeys, e.g. in area AIP [28], and imaging studies suggest
that such areas might be activated also by purely visual
stimulation [29].
The highest level of the model is given by mirror neurons that multiply the output of the motion pattern neurons with that of the corresponding affordance neurons: $m^l(t) = a^l(t)\, s^l(t)$. Due to this multiplicative interaction, the mirror neurons respond only when the appropriate action is present and the goal object is appropriately positioned relative to the effector.
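To make this integration stage concrete, here is a discretized numpy sketch of Eqs. (5) and (6) and the final multiplication. It is a minimal illustration under our reading of the text: the brute-force correlation loop, the shapes, and the grip-specific field g_l passed in by the caller are assumptions; in the model, g_l is learned from examples of functional grips.

```python
import numpy as np

def relative_position_map(a_O, a_E):
    """Eq. (5): integrate the product of the object field a_O and the
    effector field a_E over all displacements (u, v); the result peaks
    at the position of the object relative to the effector."""
    H, W = a_O.shape
    a_RP = np.zeros((2 * H - 1, 2 * W - 1))
    for i, u in enumerate(range(-(H - 1), H)):
        for j, v in enumerate(range(-(W - 1), W)):
            # shifted[u', v'] = a_E(u' - u, v' - v), zero outside the field
            shifted = np.zeros_like(a_E)
            src = a_E[max(0, -u):H - max(0, u), max(0, -v):W - max(0, v)]
            shifted[max(0, u):H - max(0, -u), max(0, v):W - max(0, -v)] = src
            a_RP[i, j] = np.sum(a_O * shifted)
    return a_RP

def mirror_neuron_output(a_RP, g_l, s_l):
    """Eq. (6) and m^l(t): the affordance response a^l weights the
    relative position map by the grip-specific field g_l (same shape
    as a_RP); the mirror neuron multiplies it with the motion pattern
    output s^l(t)."""
    a_l = np.sum(a_RP * g_l)
    return a_l * s_l
```

Due to the final product, the sketch reproduces the model's key behavior: the output vanishes if either the motion pattern output or the affordance response is zero.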
III. RESULTS
A. Testing procedure
The model was tested with real video sequences showing
a human actor grasping an object with power and precision
grips. The videos (640x480 pixels, 30 frames/sec, RGB
color mode) were recorded in front of a standardized black
background. The object was a simple ball (diameter 8 cm).
The hand of the actor started from a resting position on a
table at a distance of about 30 cm from the object. The
recorded video sequences had a length between 34 and 54
frames.
From the original video a subregion of 360x180 pixels was
extracted that contains the whole effector movement and the
goal object. In addition, images with a size of 120x120 pixels
were extracted that contain only the hand or the goal object
for the training of the recognition hierarchy. The background
was subtracted using a threshold operation, and images were
converted to grayscale for further processing.
B. Discrimination between power and precision grip
After training, the model was tested with video sequences
of a precision and a power grip. The responses of two mirror
neurons at the highest hierarchy level of the model are shown
in Figure 2. The left panel shows the responses of a model
mirror neuron that had been trained with a power grip, and
which was tested with a video sequence of a power and a
precision grip. Since the initial phases of both grip types are very similar, the neuron is initially activated by both grips. However, after some time the preshaping of the hand
leads to different hand configurations for both grips, resulting
in a decay of the response of the neuron for the precision
grip, while the response continues to increase strongly for the
power grip. The right panel shows the equivalent behavior for
a mirror neuron that had been trained with a precision grip.
Initially the neuron responds to both grip types, but after a while the response to a power grip stimulus breaks down, while the response to a precision grip stimulus continues to increase. The simulated mirror neurons in the model are thus selective for the grip type, even in the presence of the goal object.
Fig. 2. Responses of two mirror neurons that are selective for a power grip (left) and a precision grip (right). [Each panel plots the neuron's response over time (−1000 to 0 ms) for a power grip and a precision grip stimulus.]

Fig. 3. Response of a mirror neuron during a) a normal movement towards the object, b) a normal movement not towards the object ('mimicked'), c) a movement in reversed temporal order.
Two additional properties that are typical for real mirror neurons are illustrated in Figure 3. The solid line again indicates the behavior of a mirror neuron that has been trained with a power grip, in a normal situation where the movement is presented with the correct temporal order of the frames in the presence of a goal object. In this situation, the mirror neuron shows a strong response. If, however, the temporal order of the effector movement is reversed (red line), or if the goal object is not present at the correct position (green line), i.e. a mimicked action, the activity breaks down. The model neuron is thus selective for the correct temporal sequence of the action and requires the presence of a goal object.
IV. CONCLUSIONS AND FUTURE WORK
A. Conclusions
We have presented a neurophysiologically plausible model
for the visual tuning properties of mirror neurons. The
proposed architecture provides only a first step towards a
more detailed modeling of physiological data. However, the
model is based on a number of simple neural mechanisms
that, in principle, can be validated in electrophysiological
experiments. In spite of this simplicity, the model works on real video sequences. In its present elementary form, the model qualitatively reproduces a number of key properties of mirror neurons: (1) tuning for the subtle differences between grips; (2) failure to respond to mimicked actions without a goal object; and (3) tuning to the temporal order of the presented action. All these properties were reproduced without an explicit reconstruction of the 3D geometry of the effector or the goal object. It thus seems that at least some of the visual tuning properties of mirror neurons can be reproduced without a precise metric three-dimensional internal model.

Due to the embedded mechanism for sequence selectivity, the proposed model is predictive and can, in principle, account for psychophysical results that show a facilitation of the recognition of future effector configurations from previously observed partial actions [30], [31]. Differing from several other models, which assume prediction in a high-dimensional space of motor patterns [3], our model assumes the existence of prediction also in the domain of visual patterns.

The current version does not include a memory mechanism that would allow coding of the presence and location of occluded objects [32], [33]. However, it has been shown that neural networks similar to the one we used to realize sequence selectivity can also model memory activity [34]. Including such mechanisms, e.g. in the object recognition pathway, would make it possible to model the persistent firing of mirror neurons during occlusions of the goal object [35].

In general, the question arises how a predictive visual representation interacts with representations of motor patterns, which almost certainly reflect the 3D structure of the planned actions. At the level of mirror neurons, however, to our knowledge no conclusive data exist that would allow one to decide whether such neurons represent the 3D structure of motor actions, the 2D structure of learned pattern sequences, or a more abstract, potentially even non-metric representation of actions. Recordings in our own lab show that the responses of a large fraction of mirror neurons in area F5 are view-dependent. This seems to contradict an invariant effector-centered representation as the fundamental coordinate frame for the operation of mirror neurons. Future detailed electrophysiological studies in close interaction with quantitative theoretical approaches will help to clarify how different frames of reference are represented at the level of mirror neurons.

One might ask whether our model is suitable for robotics applications. Most existing robot systems for the imitation learning of movements by visual observation use traditional computer-vision front ends [36]. However, the fact that models strongly inspired by neuroscience have achieved high performance levels in object detection [19] and action recognition [37] makes their application in the context of robotics a feasible alternative. From the viewpoint of computational neuroscience, such applications are an ideal testbed for verifying the computational power of different neural implementations of computational operations. To link the existing architecture to a real robotic system, it needs to be augmented by a learned transformation from 2D into 3D
joint coordinates. In computer vision a variety of algorithms
have been proposed that solve this problem (see [38] for
a review). The major bottleneck for online applications is,
however, the high computation time, in particular on the
earlier levels of the recognition hierarchy. This might be
overcome by implementing parts of the model on a graphics
processing unit [39].
B. Future Work
Future work will focus on refining the individual components of the model and fitting it in detail to available electrophysiological, behavioral and imaging data. In addition,
specific electrophysiological experiments will be devised that
directly test individual postulated neural mechanisms at the
level of mirror neurons in premotor cortex (area F5).
V. ACKNOWLEDGMENTS
We thank L. Omlor for help with the video recordings.
This work was supported by the DFG (SFB 550), the EC FP6 project COBOL, the Volkswagenstiftung, and the Hermann Lilly Schilling Stiftung.
REFERENCES
[1] G. di Pellegrino, L. Fadiga, L. Fogassi, V. Gallese, and G. Rizzolatti,
“Understanding motor events: a neurophysiological study.” Exp Brain
Res, vol. 91, pp. 176–180, 1992.
[2] G. Rizzolatti and L. Craighero, “The mirror-neuron system.” Annu Rev
Neurosci, vol. 27, pp. 169–192, 2004.
[3] E. Oztop, M. Kawato, and M. A. Arbib, “Mirror neurons and imitation:
A computationally guided review,” Neural Networks, vol. 19, pp. 254–
271, 2006.
[4] V. Gallese, L. Fadiga, L. Fogassi, and G. Rizzolatti, “Action recognition in the premotor cortex,” Brain, vol. 119, pp. 593–609, 1996.
[5] G. Rizzolatti, L. Fadiga, V. Gallese, and L. Fogassi, “Premotor cortex and the recognition of motor actions,” Cognitive Brain Research, vol. 3, pp. 131–141, 1996.
[6] A. H. Fagg and M. A. Arbib, “Modeling parietal–premotor interactions
in primate control of grasping,” Neural Networks, vol. 11, pp. 1277–
1303, 1998.
[7] E. Oztop and M. A. Arbib, “Schema design and implementation of the
grasp-related mirror neuron system,” Biological Cybernetics, vol. 87,
pp. 116–140, 2002.
[8] M. Haruno, D. M. Wolpert, and M. M. Kawato, “Mosaic model for
sensorimotor learning and control,” Neural Comput., vol. 13, pp. 2201–
2220, 2001.
[9] G. Metta, G. Sandini, L. Natale, L. Craighero, and L. Fadiga, “Understanding mirror neurons: A bio-robotic approach,” Interaction Studies,
vol. 7, pp. 197–232, 2006.
[10] Y. Demiris and G. Simmons, “Perceiving the unusual: temporal
properties of hierarchical motor representations for action perception,”
Neural Netw., vol. 19, no. 3, pp. 272–284, 2006.
[11] W. Erlhagen, A. Mukovskiy, E. Bicho, G. Panin, C. Kiss, A. Knoll,
H. T. van Schie, and H. Bekkering, “Goal-directed imitation for robots:
A bio-inspired approach to action understanding and skill learning.”
Robotics and Autonomous Systems, vol. 54, pp. 353–360, 2006.
[12] T. Poggio and S. Edelman, “A network that learns to recognize 3d
objects,” Nature, vol. 343, pp. 263–266, 1990.
[13] N. Logothetis, J. Pauls, and T. Poggio, “Shape representation in the
inferior temporal cortex of monkeys.” Current Biology, vol. 5, pp.
552–563, 1995.
[14] S. Edelman, Representation and Recognition in Vision. MIT Press,
1999.
[15] M. J. Tarr and H. H. Bülthoff, Eds., Object recognition in man, monkey,
and machine. Cambridge, MA, USA: MIT Press, 1998.
[16] D. Perrett and M. Oram, “Neurophysiology of shape processing,” Image and Vision Computing, vol. 11, no. 6, pp. 317–333, 1993.
[17] M. Riesenhuber and T. Poggio, “Hierarchical models of object recognition in visual cortex,” Nature Neuroscience, vol. 2, pp. 1019–1025, 1999.
[18] B. W. Mel and J. W. Fiser, “Minimizing binding errors using learned conjunctive features,” Neural Computation, vol. 12, pp. 731–762, 2000.
[19] T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, and T. Poggio, “Robust object recognition with cortex-like mechanisms,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, pp. 411–426, 2007.
[20] M. A. Giese and T. Poggio, “Neural mechanisms for the recognition of biological movements,” Nature Reviews Neuroscience, vol. 4, pp. 179–192, 2003.
[21] J. Mutch and D. G. Lowe, “Multiclass object recognition with sparse, localized features,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, vol. 1, 2006, pp. 11–18.
[22] K. Fukushima, “Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position,” Biological Cybernetics, vol. 36, pp. 193–202, 1980.
[23] V. Vapnik, Statistical Learning Theory. Wiley, 1998.
[24] J. Jastorff, Z. Kourtzi, and M. A. Giese, “Learning to discriminate complex movements: Biological versus artificial trajectories,” Journal of Vision, vol. 6, pp. 791–804, 2006.
[25] D. Perrett, M. Harries, R. Bevan, S. Thomas, P. Benson, A. Mistlin, A. Chitty, J. K. Hietanen, and J. Ortega, “Frameworks of analysis for the neural representation of animate objects and actions,” Journal of Experimental Biology, vol. 146, pp. 87–113, 1989.
[26] A. Georgopoulos, J. Kalaska, R. Caminiti, and J. Massey, “On the relations between the direction of two-dimensional arm movements and cell discharge in primate motor cortex,” Journal of Neuroscience, vol. 2, pp. 1527–1537, 1982.
[27] A. Pouget, P. Dayan, and R. Zemel, “Information processing with population codes,” Nature Reviews Neuroscience, vol. 1, pp. 125–132, 2000.
[28] A. Murata, V. Gallese, G. Luppino, M. Kaseda, and H. Sakata, “Selectivity for the shape, size, and orientation of objects for grasping in neurons of monkey parietal area AIP,” Journal of Neurophysiology, vol. 83, pp. 2580–2601, 2000.
[29] C. S. Konen and S. Kastner, “Two hierarchically organized neural systems for object information in human visual cortex,” Nature Neuroscience, vol. 11, pp. 224–231, 2008.
[30] K. Verfaillie and A. Daems, “Predicting point-light actions in real-time,” Visual Cognition, vol. 9, pp. 217–232, 2002.
[31] M. Graf, B. Reitzner, C. Corves, A. Casile, M. Giese, and W. Prinz, “Predicting point-light actions in real-time,” NeuroImage, vol. 36, Suppl. 2, pp. T22–T32, 2007.
[32] M. Umiltà, E. Kohler, V. Gallese, L. Fogassi, L. Fadiga, C. Keysers, and G. Rizzolatti, “I know what you are doing: a neurophysiological study,” Neuron, vol. 31, pp. 155–165, 2001.
[33] C. Baker, C. Keysers, T. Jellema, B. Wicker, and D. Perrett, “Neuronal representation of disappearing and hidden objects in temporal cortex of the macaque,” Experimental Brain Research, vol. 140, pp. 375–381, 2001.
[34] A. Compte, N. Brunel, P. S. Goldman-Rakic, and X. J. Wang, “Synaptic mechanisms and network dynamics underlying spatial working memory in a cortical network model,” Cerebral Cortex, vol. 10, no. 9, pp. 910–923, 2000.
[35] J. Bonaiuto, E. Rosta, and M. Arbib, “Extending the mirror neuron system model, I: Audible actions and invisible grasps,” Biological Cybernetics, vol. 96, pp. 9–38, 2007.
[36] Y. Demiris and A. Billard, “Special issue on robot learning by observation, demonstration, and imitation,” IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 37, pp. 254–255, 2007.
[37] H. Jhuang, T. Serre, L. Wolf, and T. Poggio, “A biologically inspired system for action recognition,” in Proc. IEEE International Conference on Computer Vision (ICCV), 2007, pp. 1–8.
[38] V. Lepetit and P. Fua, “Monocular model-based 3D tracking of rigid objects: A survey,” Foundations and Trends in Computer Graphics and Vision, vol. 1, pp. 1–89, 2005.
[39] M. Rumpf and R. Strzodka, “Graphics processor units: New prospects for parallel computing,” in Numerical Solution of Partial Differential Equations on Parallel Computers, A. M. Bruaset and A. Tveito, Eds. Springer, 2005, vol. 51, pp. 89–134.