Slide 1: The Mirror Neuron System for Grasping
Michael Arbib, with Erhan Oztop (modeling) and Giacomo Rizzolatti (neurophysiology)
CS 664: Integrating Vision, Action and Language (University of Southern California, Spring 2002)

Slide 2: Part 1. Classic Concepts of Grasp Control

Slide 3: Preshaping While Reaching to Grasp
[Figure: Jeannerod & Biguer 1979.]

Slide 4: A Classic Coordinated Control Program for Reaching and Grasping
[Diagram (Arbib 1981): recognition criteria activate visual search; perceptual schemas (visual location, size recognition, orientation recognition) process visual input to supply target location, size, and orientation; these activate motor schemas for reaching (fast phase movement, then slow phase movement under visual and kinesthetic input) and for grasping (hand preshape and hand rotation, then the actual grasp, under visual, kinesthetic, and tactile input).]

Slide 5: Opposition Spaces and Virtual Fingers
The goal of a successful preshape, reach, and grasp is to match the opposition axis defined by the virtual fingers of the hand with the opposition axis defined by an affordance of the object (Iberall and Arbib 1990).

Slide 6: Hand State
Our current representation of hand state defines a 7-dimensional trajectory F(t) = (d(t), v(t), a(t), o1(t), o2(t), o3(t), o4(t)) with the following components:
d(t): distance to target at time t
v(t): tangential velocity of the wrist
a(t): aperture of the virtual fingers involved in grasping at time t
o1(t): angle between the object axis and the (index finger tip – thumb tip) vector [relevant for pad and palm oppositions]
o2(t): angle between the object axis and the (index finger knuckle – thumb tip) vector [relevant for side oppositions]
o3(t), o4(t): the two angles defining how close the thumb is to the hand, measured relative to the side of the hand and to the inner surface of the palm
(A minimal vector encoding of this hand state is sketched after slide 9.)

Slide 7: Part 2. Basic Neural Mechanisms of Grasp Control

Slide 8: Visual Control of Grasping in Macaque Monkey
A key theme of visuomotor coordination: parietal affordances (AIP) drive frontal motor schemas (F5).
F5: grasp commands in premotor cortex (Giacomo Rizzolatti).
AIP: grasp affordances in parietal cortex (Hideo Sakata).

Slide 9: Grasp Specificity in an F5 Neuron
[Figure: precision pinch (top), power grasp (bottom); data from Rizzolatti et al.]
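The hand state of slide 6 is just seven numbers per time step. As a concrete, illustrative reading of that definition, the sketch below packs samples of F(t) into a NumPy array; the function name and the numeric values are invented for illustration, not taken from the model's code.

    import numpy as np

    # One sample of the 7-dimensional hand state F(t) from slide 6.
    # Component order follows the slide; the values are made up.
    def hand_state(d, v, a, o1, o2, o3, o4):
        """Pack one time sample of F(t) = (d, v, a, o1, o2, o3, o4)."""
        return np.array([d, v, a, o1, o2, o3, o4], dtype=float)

    # A trajectory is then a (T, 7) array of samples over time.
    F = np.stack([
        hand_state(d=0.30, v=0.80, a=0.09, o1=0.40, o2=0.90, o3=0.20, o4=0.30),
        hand_state(d=0.12, v=0.50, a=0.06, o1=0.20, o2=0.70, o3=0.20, o4=0.25),
        hand_state(d=0.01, v=0.05, a=0.03, o1=0.05, o2=0.50, o3=0.15, o4=0.20),
    ])
    print(F.shape)  # (3, 7): three time samples of the 7-D hand state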
Slide 10: A Broad Perspective on F5-AIP Interactions
[Diagram: PP (posterior parietal) tells PM the possible categories, supplying affordances for PM; PM (premotor cortex) codes the overall action fairly abstractly and signals "here's my choice"; the details (action parameters, precise motor coordinates) are handled by motor cortex; the cerebellum tunes and coordinates MPGs.]

Slide 11: FARS (Fagg-Arbib-Rizzolatti-Sakata) Model Overview
AIP extracts affordances: features of the object relevant to physical interaction with it. Prefrontal cortex provides "context" so that F5 may select an appropriate affordance.
[Diagram of the dorsal/ventral streams: the dorsal stream through AIP supplies affordances ("ways to grab this thing") to F5; the ventral stream through IT supplies recognition ("it's a mug") to prefrontal cortex (PFC); task constraints (F6), working memory (area 46?), and instruction stimuli (F2) also bear on F5's selection.]

Slide 12: Part 3. Neural Mechanisms of Grasp Control that Support a "Mirror System"

Slide 13: Mirror Neurons
Rizzolatti, Fadiga, Gallese, and Fogassi, 1995: Premotor cortex and the recognition of motor actions.
Mirror neurons form the subset of grasp-related premotor neurons of F5 which discharge when the monkey observes meaningful hand movements made by the experimenter or another monkey. F5 is endowed with an observation/execution matching system. [The non-mirror grasp neurons of F5 are called F5 canonical neurons.]

Slide 14: What is the mirror system (for grasping) for?
Mirror neurons: the cells that selectively discharge both when the monkey executes particular actions and when the monkey observes another individual executing the same action.
Mirror neuron system (MNS): the mirror neurons plus the brain regions involved in eliciting mirror behavior.
Interpretations:
• Action recognition
• Understanding (assigning meaning to others' actions)
• Associative memory for actions

Slide 15: Computing the Mirror System Response
The FARS model: recognize object affordances and determine the appropriate grasp.
The Mirror Neuron System (MNS) model: we must add recognition of trajectory and hand preshape to recognition of object affordances, and ensure that all three are congruent. There are parietal systems other than AIP adapted to this task.
Slide 16: Further Brain Regions Involved
cIPS (caudal intraparietal sulcus): axis orientation and surface orientation.
STS (superior temporal sulcus): detection of biologically meaningful stimuli (e.g., hand actions); motion-related activity (MT/MST part).
7b (PF, rostral part of the posterior parietal lobule): mainly somatosensory; mirror-like responses.
7a (PG, caudal part of the posterior parietal lobule): spatial coding for objects; analysis of motion during interaction of objects and self-motion.

Slide 17: cIPS Cell Response
[Figure: surface-orientation selectivity of a cIPS cell; Sakata et al. 1997.]

Slide 18: Key Criteria for Mirror Neuron Activation When Observing a Grasp
a) Does the preshape of the hand correspond to the grasp encoded by the mirror neuron?
b) Does this preshape match an affordance of the target object?
c) Do samples of the hand state indicate a trajectory that will bring the hand to grasp the object?
Modeling challenges:
i) To have mirror neurons self-organize to learn to recognize grasps in the monkey's motor repertoire.
ii) To learn to activate mirror neurons from smaller and smaller samples of a trajectory.

Slide 19: Hypothesis on Mirror Neuron Development
The development of the (grasp) mirror neuron system in a healthy infant is driven by the visual stimuli generated by the grasps the infant performs himself. With maturation of visual acuity, the infant gains the ability to map other individuals' actions onto his internal motor representation. [In the MNS model, the hand state provides the key representation for this transfer.] The infant then acquires the ability to create internal representations for novel observed actions. In parallel, the infant develops an action-prediction capability: recognizing an action given a prefix of the action and the target object.

Slide 20: The Mirror Neuron System (MNS) Model
[Diagram: visual cortex supplies object features; cIPS performs object affordance extraction, feeding AIP; STS performs hand motion detection and hand shape recognition; area 7b associates object affordances with hand state; area 7a analyzes the hand-object spatial relation; F5 canonical holds the motor program (grasp); F5 mirror performs action recognition (mirror neurons), integrating the temporal association and sending mirror feedback; F4 holds the motor program (reach); MIP/LIP/VIP supply object location; M1 handles motor execution.]

Slide 21: Part 4. Implementing the Basic Schemas of the Mirror Neuron System (MNS) Model Using Artificial Neural Networks (the work of Erhan Oztop)

Slide 22: Curve Recognition
The general problem: associate N-dimensional space curves with object affordances. A special case: recognizing two- (or three-) dimensional trajectories in physical space.
Simplest solution: map temporal information into the spatial domain, then apply known pattern-recognition techniques.
Problem with the simplest solution: the speed of the moving point. The spatial representation may change drastically with speed; scaling can overcome this, but the scaling must preserve the generalization ability of the pattern-recognition engine.
Solution: fit a cubic spline to the sampled values, then normalize and resample from the spline curve.
Result: very good generalization, with better performance than using Fourier coefficients to recognize curves.
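A minimal sketch of the normalization just described, under stated assumptions: SciPy's CubicSpline stands in for whatever spline routine the original system used, and 30 output samples match the spatial resolution quoted on the next slide. The idea is to re-parameterize the sampled curve by normalized arc length and resample uniformly, so fast and slow tracings of the same shape yield the same vector.

    import numpy as np
    from scipy.interpolate import CubicSpline

    def resample_curve(points, n_samples=30):
        """Speed-invariant resampling of a 2-D trajectory.

        points: (T, 2) array of sampled (x, y) positions.
        Returns an (n_samples, 2) array equally spaced in arc length.
        """
        points = np.asarray(points, dtype=float)
        # Drop repeated samples so the spline parameter is strictly increasing.
        seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
        points = points[np.concatenate([[True], seg > 0])]
        seg = seg[seg > 0]
        # Cumulative chord length, normalized to [0, 1].
        s = np.concatenate([[0.0], np.cumsum(seg)])
        s /= s[-1]
        # One cubic spline through the samples, parameterized by arc length.
        spline = CubicSpline(s, points, axis=0)
        # Resample at equally spaced parameter values.
        return spline(np.linspace(0.0, 1.0, n_samples))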
Slide 23: Curve Recognition (demonstration)
The curve-recognition system is demonstrated on handwritten numeral recognition (successful recognition examples for 2, 8, and 3).
Spatial resolution: 30. Network input size: 30. Hidden layer size: 15. Output size: 5.
Training: back-propagation with momentum and an adaptive learning rate. (A sketch of such a network follows slide 28.)
[Figure: sampled points, the points used for spline interpolation, and the fitted spline.]

Slide 24: STS Hand Shape Recognition
Step 1 (color-coded hand feature extraction): the system processes the color-coded hand image and generates a set of features to be used by the second step.
Step 2 (model matching): the feature vector generated by the first step is used to fit a 3D kinematics model of the hand. The resulting hand configuration is sent to the classification module (e.g., "precision grasp").

Slide 25: STS Hand Shape Recognition 1: Color Segmentation and Feature Extraction
Training phase: a "color expert" is generated by training a feed-forward network to approximate human perception of color.
Actual processing: the hand image is fed to the NN-augmented segmentation system; the color decision during segmentation is made by consulting the color expert.

Slide 26: STS Hand Shape Recognition 2: 3D Hand Model Matching
The hand is modeled with 14 degrees of freedom. The model-matching algorithm minimizes the error between the extracted features and the model hand; the resulting configuration feeds grasp-type classification.
[Figure: feature vector, error-minimization loop, and a realistic drawing of the hand bones.]

Slide 27: Virtual Hand/Arm and Reach/Grasp Simulator
[Figure: a precision pinch, a power grasp, and a side grasp.]

Slide 28: Power Grasp Time-Series Data
[Figure legend: +: aperture; *: angle 1; x: angle 2; 1-axisdisp1; 1-axisdisp2; speed; distance.]
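Slide 23 above pins down the classifier: 30 inputs, 15 hidden units, 5 outputs, trained by back-propagation with momentum and an adaptive learning rate. A hedged sketch using scikit-learn's MLPClassifier follows; the slides do not say how the resampled curve is reduced to a 30-dimensional input, so X below is a placeholder, and the momentum value is assumed.

    from sklearn.neural_network import MLPClassifier

    # 30-15-5 network matching the sizes quoted on slide 23.
    net = MLPClassifier(
        hidden_layer_sizes=(15,),
        solver="sgd",              # gradient-descent back-propagation
        momentum=0.9,              # "with momentum" (0.9 is an assumed value)
        learning_rate="adaptive",  # "and adaptive learning rate"
        max_iter=2000,
    )
    # X: (n_examples, 30) speed-normalized curve encodings, e.g. derived
    # from resample_curve above; y: digit labels. Both are placeholders.
    # net.fit(X, y)
    # print(net.predict(X[:3]))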
Slide 29: Core Mirror Circuit
[Diagram: object affordances (AIP) and the hand state feed the object affordance-hand state association neurons of 7b; hand shape recognition and hand motion detection (STS) and hand-object spatial relation analysis (7a) also feed in; the mirror neurons (F5mirror) integrate the temporal association and perform action recognition; the motor program (F5 canonical) supplies mirror feedback; the mirror neuron output joins the motor program and motor execution path.]

Slide 30: Connectivity Pattern
[Diagram: object affordance (AIP) and STS project to 7b; 7b and 7a project to F5mirror; the motor program (F5 canonical) provides mirror feedback.]

Slide 31: A Single Grasp Trajectory Viewed from Three Different Angles
The wrist trajectory during the grasp is shown by square traces, with the distance between any two consecutive trace marks traveled in equal time intervals. A companion plot shows how the network classifies the action as a power grasp. Empty squares: power grasp output; filled squares: precision grasp output; crosses: side grasp output.

Slide 32: Power and Precision Grasp Resolution
[Figure: panels (a) and (b).]

Slide 33: Part 5. Quo Vadis?

Slide 34: Future Directions
1) Technology: increasing the robustness and learning rates of the schemas using improved learning algorithms for artificial neural nets.
2) Neuroscience: implementing the schemas with biologically plausible neural nets to model neurophysiological data.
3) Learning to recognize variations on, and assemblages of, familiar actions.
4) Imitation.
5) Language!

Slide 35: CS564, Brain Theory and Artificial Intelligence (University of Southern California, Fall 2001), Lecture 10: The Mirror Neuron System Model (MNS) 1
Reading assignment: Erhan Oztop and Michael A. Arbib, Schema Design and Implementation of the Grasp-Related Mirror Neuron System.

Slides 36-37 repeat slides 8 and 13 above (Visual Control of Grasping in Macaque Monkey; Mirror Neurons).
Slide 38: F5 Motor Neurons
F5 motor neurons include all F5 neurons whose firing is related to motor activity. We focus on grasp-related behavior; other F5 motor neurons are related to oro-facial movements.
F5 mirror neurons form the subset of grasp-related F5 motor neurons which discharge when the monkey observes meaningful hand movements.
F5 canonical neurons form the subset of grasp-related F5 motor neurons which fire when the monkey sees an object with an affordance matching the grasp they encode.

Slides 39-44 repeat slides 14-19 above (What is the mirror system for?; Computing the Mirror System Response; Further Brain Regions Involved; cIPS cell response; Key Criteria for Mirror Neuron Activation; and the development hypothesis, here titled "Initial Hypothesis on Mirror Neuron Development").
Slides 45-50 repeat earlier material: the MNS model diagram (slide 20), Implementing the Basic Schemas (slide 21), Opposition Spaces and Virtual Fingers (slide 5), Hand State (slide 6), and the two Curve Recognition slides (22-23).
Slides 51-58 likewise repeat slides 24-31: STS hand shape recognition and its two sub-steps, the virtual hand/arm simulator, the power-grasp time series, the core mirror circuit, the connectivity pattern, and the single grasp trajectory.
Slide 59: Power and Precision Grasp Resolution
Note that the modeling yields novel predictions for the time course of activity across a population of mirror neurons.
[Figure: (a) a precision pinch mirror neuron; (b) a power grasp mirror neuron.]

Slide 60: Research Plan
Development of the mirror system:
• development of grasp specificity in F5 motor and canonical neurons
• visual feedback for grasping: a possible precursor of the mirror property
Recognition of novel and compound actions and their context:
• the pliers experiment: extending the visual vocabulary
• recognition of compounds of known movements
• from action recognition to understanding: context and expectation

Slide 61: CS564, Brain Theory and Artificial Intelligence (University of Southern California, Fall 2001), Lecture 23: MNS Model 2
Michael Arbib and Erhan Oztop, The Mirror Neuron System for Grasping: visual processing for the MNS model; the virtual arm; the core mirror neuron circuit.

Slide 62 repeats the MNS model diagram (slide 20).

Slide 63: Visual Processing for the MNS Model
How much should we attempt to solve? Even though computers are getting more powerful every day, the vision problem in its general form remains unsolved in engineering. Gesture-recognition systems do exist for human-computer interaction and sign-language interpretation (Holden). Our vision system must at least recover the hand configuration needed to compute the hand state.

Slide 64: Simplifying the Problem
We attempt to recognize the hand and its configuration, simplifying the problem by placing color markers on the articulation points of the hand. If we can extract the marker positions reliably, we can then extract some of the features that make up the hand state by estimating the 3D pose of the hand from its 2D image. Thus we have two steps: locate the markers, then estimate the hand pose.

Slide 65: Reminder: Hand State Components
For most components we need to know the (3D) configuration of the hand.

Slide 66: A Simplified Video Input
• The vision task is simplified using colored tapes on the joints and articulation points.
• The first step of hand-configuration analysis is to locate the color patches unambiguously (not easy!). Use color segmentation, but compensate for lighting, reflection, shading, and wrinkling problems: robust color detection.
Slide 67: Robust Detection of the Colors: RGB Space
A color image in a computer is a matrix of pixel triplets (red, green, blue) defining the color of each pixel. We want to label a given pixel color as belonging to one of the color patches used to mark the hand, or to no class. A straightforward way to decide whether a target color (R',G',B') matches a pixel color (R,G,B) is to threshold the squared distance (R-R')^2 + (G-G')^2 + (B-B')^2. This does not work well: shading and changing lighting conditions strongly affect the R,G,B values, and the simple nearest-neighbor method fails. For example, an orange patch under shadow is very close to red in RGB space.
Solution: there are better color spaces for color classification, more robust to shading and lighting effects.

Slide 68: Robust Detection of the Colors: HSV Space
HSV stands for (hue, saturation, value); the hue component, rather than brightness, carries the color identity we want, and the HSV model is better suited to classifying colors by their perceived color. In labeling the pixels we can simply compare each pixel's hue to a representative (template) hue for each marker and assign the pixel to the marker with the closest value, unless the difference exceeds a threshold, in which case we label it a non-marker color. But we can do better: train a neural network to do the labeling for us.

Slide 69: Robust Detection of Colors: The Color Expert
Create a training set from a test image by manually picking colors and specifying their labels. Create a neural network, in our case a one-hidden-layer feed-forward network, that accepts R,G,B values as input and outputs the marker label, or 0 for a non-marker color. Make sure the network is not too "powerful", so that it generalizes rather than memorizing the training set. Train it, then use it: to classify a pixel, apply its RGB values to the trained network and take the output as the marker the pixel belongs to. A segmentation system is then needed to aggregate the pixels into patches with a single color label.
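Slides 67-69 contrast a naive nearest-color rule in RGB with a trained color expert. A sketch of both under stated assumptions: the threshold, the 10-unit hidden layer, and all variable names are illustrative, and scikit-learn's MLPClassifier stands in for the original feed-forward network.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    # Naive rule from slide 67: squared RGB distance to each marker color,
    # with a threshold; label 0 means "not a marker". Fails under shading.
    def naive_label(pixel, marker_colors, threshold=2000.0):
        d2 = np.sum((np.asarray(marker_colors, float) - pixel) ** 2, axis=1)
        best = int(np.argmin(d2))
        return best + 1 if d2[best] < threshold else 0

    # "Color expert" from slide 69: a one-hidden-layer feed-forward network
    # mapping a pixel's (R, G, B) to a marker label (0 = non-marker).
    # Kept deliberately small so it generalizes instead of memorizing.
    expert = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000)
    # train_pixels: (N, 3) colors hand-picked from a test image;
    # train_labels: (N,) marker indices, 0 for background. Placeholders here.
    # expert.fit(train_pixels, train_labels)
    # labels = expert.predict(image.reshape(-1, 3)).reshape(image.shape[:2])

As slide 69 notes, a segmentation stage must still aggregate the per-pixel labels into coherent single-color patches.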
Slide 70 repeats slide 25 (STS hand shape recognition 1: color segmentation and feature extraction).

Slide 71: Hand Configuration Estimation
• Given a color-coded hand image, the first step of hand-configuration extraction is to find the center of each color marker.
• The marker centers are then converted into a feature vector with a corresponding confidence vector: the wrist position is subtracted from all the centers found, and the relative x,y coordinates of each marker are placed in the feature vector.
• The second step is to pose a 3D hand model so that its features match those of the given hand image as closely as possible, with each component's contribution to the distance weighted by its confidence.
• This is an optimization problem, solvable by gradient descent or by hill climbing over the model's pose parameters (simulated gradient descent).

Slides 72-74 repeat slides 24, 26, and 27 (STS hand shape recognition; 3D hand model matching; the virtual hand/arm and reach/grasp simulator).

Slide 75: Virtual Arm
A kinematics model of an arm and hand with 19 DOF: shoulder (3), elbow (1), wrist (3), fingers (4x2), thumb (3).
Implementation requirements:
• Rendering: given the 3D positions of the links' start and end points, generate a 2D representation of the arm/hand (easy).
• Forward kinematics: given the 19 joint angles, compute the position of each link (easy).
• Inverse kinematics: given a desired position in space for a particular link, find joint angles that achieve that position (semi-hard).

Slide 76: A 2D, 3-DOF Arm Example
Forward kinematics: given joint angles A, B, C, compute the end-effector position P(x,y):
X = a*cos(A) + b*cos(B) + c*cos(C)
Y = a*sin(A) + b*sin(B) + c*sin(C)
Inverse kinematics: given a position P(x,y), there are infinitely many joint-angle triplets that achieve P!
[Figure: two circles of radius a and c; the segments connecting them all have equal length b, illustrating the family of solutions.]
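Slide 76's forward-kinematics equations drop directly into code. A minimal sketch, assuming (as the equations imply) that each joint angle is measured against the x-axis and that a, b, c are the link lengths.

    import numpy as np

    # Slide 76's planar 3-DOF forward kinematics: because every angle is
    # measured against the x-axis, P is a plain sum of link vectors.
    def forward_2d(A, B, C, a=1.0, b=1.0, c=1.0):
        x = a * np.cos(A) + b * np.cos(B) + c * np.cos(C)
        y = a * np.sin(A) + b * np.sin(B) + c * np.sin(C)
        return np.array([x, y])

    # Three angles but only two output coordinates: the arm is redundant,
    # so infinitely many (A, B, C) reach the same P, as the slide notes.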
Slide 77: A Simple Inverse Kinematics Solution
Let us restrict ourselves to the arm. Its forward kinematics can be represented as a vector function mapping joint angles to the wrist position: (x,y,z) = F(s1,s2,s3,e), where s1,s2,s3 are the shoulder angles and e is the elbow angle. We can then formulate the inverse kinematics problem as an optimization problem: given the desired position P' = (x',y',z'), introduce the error function J = ||P' - F(s1,s2,s3,e)||, compute its gradient with respect to s1,s2,s3,e, and follow the negative gradient to reach the desired position.
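A minimal numeric version of the scheme on slide 77: define J as the squared error (the squared norm has the same minimizer as the norm on the slide and a smoother gradient) and follow its finite-difference gradient downhill. The learning rate, iteration count, and perturbation size are arbitrary choices, and forward_2d from the previous sketch stands in for the arm's F.

    import numpy as np

    def ik_gradient_descent(fk, theta, target, lr=0.05, iters=500, eps=1e-5):
        """Minimize J(theta) = ||target - fk(theta)||^2 by stepping along
        the negative gradient, estimated here by finite differences."""
        theta = np.asarray(theta, dtype=float)
        for _ in range(iters):
            base = target - fk(theta)
            J = base @ base
            grad = np.empty_like(theta)
            for i in range(theta.size):      # numeric partial dJ/dtheta_i
                t = theta.copy()
                t[i] += eps
                diff = target - fk(t)
                grad[i] = (diff @ diff - J) / eps
            theta = theta - lr * grad        # follow the negative gradient
        return theta

    # Planar example, reusing forward_2d from the previous sketch:
    # theta = ik_gradient_descent(lambda th: forward_2d(*th),
    #                             np.zeros(3), np.array([1.2, 0.8]))

Slide 71's model-matching step follows the same pattern: there the error is between image features and model-hand features, with each term weighted by its confidence.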
Slide 78: Other Issues
• Other inverse-kinematics solution methods?
• Inverse kinematics for multiple targets
• Precision grasp planning: determination of finger contact points, etc.

Slides 79-86 repeat earlier material: the power-grasp time series (slide 28), the two Curve Recognition slides (22-23), the core mirror circuit (29), the connectivity pattern (30), the single grasp trajectory (31), power and precision grasp resolution (32), and Future Directions (34).

Slide 87: Related Research Issues: Grasping and the Mirror System in Monkey
Modeling of monkey brain mechanisms for visually guided behavior:
• mirror neurons
• vocalization and communication
• multi-modal integration
• compound behaviors and social interactions
This will build, for example, on our earlier modeling of interactions of pre-SMA and basal ganglia, and of the role of dopamine in planning, based on behavior, neuroanatomy, and neurophysiology.

Slide 88: Some Specific Subgoals for Mirror System Modeling (repeats the Research Plan of slide 60).

Slide 89: Visual Feedback for Grasping: A Possible Precursor of the Mirror Property
Hypothesis: the F5 mirror neurons develop by selecting, via re-afferent connections, patterns of visual input describing those relations of hand shapes and motions to objects that are effective in the visual guidance of a successful grasp.
The validation here is computational: if the hypothesis is correct, we will be able to show that such a hand-control system indeed exhibits most of the properties needed for a mirror system for grasping.
For a reaching task, the simplest visual feedback is some signal of the distance between object and hand. This may suffice for grabbing a banana, but for peeling a banana, feedback on the shape of the hand relative to the banana, as well as force feedback, becomes crucial.
We predict that the parameters needed for such visual feedback for grasping will look very much like those we specified explicitly for our MNS1 hand state.
Slide 90: Related Research Issues 2
• A simple imitation system for grasping [shared with the common ancestor of human and chimpanzee]
• A complex imitation system for grasping
Research: comparative modeling of primate brain mechanisms based on data from primate behavior, neuroanatomy and neurophysiology, and human brain imaging:
• extending the monkey model to chimpanzee and human
• a comparative/evolutionary model of different types of imitation

Slide 91: Learning in the Mirror System
For both oro-facial and grasp mirror neurons we may have a limited "hard-wired" repertoire that can then be built on through learning:
1) developing a set of basic grasps that are effective;
2) learning to associate the view of one's own hand with grasp and object;
3) matching this to views of others grasping;
4) learning new grasps by imitation of others.
We do not know whether the necessary learning is in F5 or elsewhere.

Slide 92: Beyond Simple Mirroring
Tuning mirror neurons in terms of self-movement: possession of a movement precedes its perception. But in imitation, perception of a novel movement precedes the ability to perform (some approximation to) that movement. Eventually, we perceive many things we cannot do:
• Variation = known y + Δy
• Assemblage
A key question for our empirical study of imitation: how can we objectively compare "imitation capability" across species?