Interpreting Manipulation Actions: a Cognitive Approach
Yezhou Yang, Cornelia Fermüller, Yiannis Aloimonos
Computer Vision Lab, University of Maryland, College Park
What are Manipulation Actions?
Cognitive systems that interact with humans must be able to interpret actions.
Here we are concerned with manipulation actions, that is, actions performed
by agents (humans or robots) on objects,
which result in some physical change of
the object.
Core Reasoning Module: a Manipulation Action Context-free Grammar
The Manipulation Action Context-free Grammar (MACFG) comes with a set of generative rules and a set of parsing algorithms. The parsing algorithms have two main operations: “construction” (Fig. 2(a)) and “destruction” (Fig. 2(b)).
The Cognitive Approach
This paper [1] describes the architecture of
a cognitive system that interprets human
manipulation actions from perceptual information (image and depth data) and that
includes interacting modules for perception
and reasoning. At the high level, actions are
represented with the Manipulation Action
Grammar, a context-free grammar that organizes actions as a sequence of sub-events.
Each sub-event is described by the hand,
movements, objects and tools involved, and
the relevant information about these quantities is obtained from perception modules.
These modules track the hands and objects,
and they recognize the hand grasp, objects
and actions using attention, segmentation,
and feature description.
The Manipulation Action Context-free Grammar
The Manipulation Action Grammar (Table 1)
serves as the core reasoning
module for parsing manipulation actions.
AP → A O | A HP          (1)
HP → H AP | HP AP        (2)
H → h,  A → a,  O → o    (3)
The nonterminals H, A, and O represent
the hand, the action, and the object (the
tools and objects under manipulation), respectively, and the terminals h, a, and o are
the observations. AP is the Action Phrase
and HP is the Hand Phrase.
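Because every rule in (1)-(3) has either two nonterminals or a single terminal on its right-hand side, the grammar is already in Chomsky normal form, so a textbook CYK recognizer can check whether a sequence of observations is derivable. The sketch below is only an illustration under stated assumptions: it is not the construction/destruction parser described on this poster, the function names `cyk_table` and `recognizes` are hypothetical, and treating HP as the start symbol is an assumption.

```python
# Binary rules of the grammar, Rules (1) and (2): head -> (left, right).
BINARY_RULES = {
    "AP": [("A", "O"), ("A", "HP")],
    "HP": [("H", "AP"), ("HP", "AP")],
}
# Terminal rules, Rule (3): observation -> nonterminal.
TERMINAL_RULES = {"h": "H", "a": "A", "o": "O"}


def cyk_table(tokens):
    """Return table[i][l]: the nonterminals that derive tokens[i:i+l]."""
    n = len(tokens)
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    # Spans of length 1 come from the terminal rules.
    for i, tok in enumerate(tokens):
        if tok in TERMINAL_RULES:
            table[i][1].add(TERMINAL_RULES[tok])
    # Longer spans combine two adjacent shorter spans via a binary rule.
    for length in range(2, n + 1):
        for start in range(n - length + 1):
            for split in range(1, length):
                left = table[start][split]
                right = table[start + split][length - split]
                for head, bodies in BINARY_RULES.items():
                    for b, c in bodies:
                        if b in left and c in right:
                            table[start][length].add(head)
    return table


def recognizes(tokens, start_symbol="HP"):
    """True if start_symbol derives the whole token sequence.

    Using HP as the default start symbol is an assumption of this sketch.
    """
    if not tokens:
        return False
    return start_symbol in cyk_table(tokens)[0][len(tokens)]


if __name__ == "__main__":
    print(recognizes(["h", "a", "o"]))            # hand, action, object
    print(recognizes(["h", "a", "o", "a", "o"]))  # a Hand Phrase followed by an Action Phrase
    print(recognizes(["a", "o"]))                 # an Action Phrase alone, no hand
```

CYK is used here only because it is short and needs no grammar transformation; it takes the whole observation sequence at once rather than growing and pruning trees the way the construction and destruction operations do.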
Cognitive MACFG Parsing Algorithms
Visual Modules
Attention with Torque Operator [2]:
Insights behind the MACFG
The design is motivated by the following observations. First, the hands are the main and only driving
force in manipulation actions.
Thus, a specialized nonterminal symbol H
is used for their representation. Second, an
Action (A) can be applied to an Object (O)
directly or to a Hand Phrase (HP), which
in turn contains an Object (O). This is
encoded in Rule (1), which builds up an
Action Phrase (AP). Finally, an Action
Phrase (AP) can be combined either with
the Hand (H) or with a Hand Phrase (HP). This is
encoded in Rule (2), which recursively builds
up the Hand Phrase.
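To make the recursion in Rule (2) concrete, here is one leftmost derivation of the observation sequence h a o a o, using only Rules (1)-(3) as printed. Taking HP as the starting symbol, and reading the string as, say, a hand grasping a knife and then cutting an eggplant, are illustrative assumptions rather than statements from the poster.

```latex
\begin{align*}
HP &\Rightarrow HP\;AP          && \text{Rule (2): } HP \to HP\;AP \\
   &\Rightarrow H\;AP\;AP       && \text{Rule (2): } HP \to H\;AP \\
   &\Rightarrow h\;AP\;AP       && \text{Rule (3): } H \to h \\
   &\Rightarrow h\;A\;O\;AP     && \text{Rule (1): } AP \to A\;O \\
   &\Rightarrow h\;a\;O\;AP     && \text{Rule (3): } A \to a \\
   &\Rightarrow h\;a\;o\;AP     && \text{Rule (3): } O \to o \\
   &\Rightarrow h\;a\;o\;A\;O   && \text{Rule (1): } AP \to A\;O \\
   &\Rightarrow h\;a\;o\;a\;O   && \text{Rule (3): } A \to a \\
   &\Rightarrow h\;a\;o\;a\;o   && \text{Rule (3): } O \to o
\end{align*}
```

The first step is where the recursion enters: the inner Hand Phrase (h a o) is itself combined with a further Action Phrase (a o), which is how a hand that already holds something can go on to act on another object.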
Figure 4: (a) Torque for images, (b) a sample input frame, and
(c) Torque operator response.
Hand Tracking, Grasp Type Classification and Trajectory-based Action Recognition:
Figure 5: (a) One example of fully articulated hand model tracking, (b) a 3-D illustration of the tracked model, and (c-d) examples of grasp type recognition for both hands.
Object Monitoring and Recognition [3]:
An Example
Figure 3: The framework for the MACFG and its associated
parsing algorithms. In this figure a typical manipulation action
example, “Cut an eggplant”, is observed and the system builds
a sequence of six trees to represent this action.
Figure 1: Overview of the manipulation action understanding
system, including feedback loops within and between some of
the modules. The feedback is denoted by the dotted arrow.
Figure 2: The (a) construction and (b) destruction operations.
Fine dashed lines are newly added connections, crosses mark node
deletion, and fine dotted lines are connections to be deleted.
References
[1] Y. Yang, et al. Advances in Cognitive Systems, 2013.
[2] M. Nishigaki, et al. CVPR, 2012.
[3] Y. Yang, et al. CVPR, 2013.
Acknowledgements
For further information, please contact
[email protected]