Interpreting Manipulation Actions: a Cognitive Approach
Yezhou Yang, Cornelia Fermüller, Yiannis Aloimonos
Computer Vision Lab, University of Maryland, College Park

What are Manipulation Actions?

Cognitive systems that interact with humans must be able to interpret actions. Here we are concerned with manipulation actions, that is, actions performed by agents (humans or robots) on objects, which result in some physical change of the object.

The Cognitive Approach

This paper [1] describes the architecture of a cognitive system that interprets human manipulation actions from perceptual information (image and depth data) and that includes interacting modules for perception and reasoning. At the high level, actions are represented with the Manipulation Action Grammar, a context-free grammar that organizes actions as a sequence of sub-events. Each sub-event is described by the hand, movements, objects, and tools involved, and the relevant information about these quantities is obtained from perception modules. These modules track the hands and objects, and they recognize the hand grasp, objects, and actions using attention, segmentation, and feature description.

Core Reasoning Module: the Manipulation Action Context-free Grammar (MACFG)

The Manipulation Grammar (Table 1) serves as the core reasoning module for parsing manipulation actions. It comes with a set of generative rules and a set of parsing algorithms. The parsing algorithms have two main operations: "construction" (Fig. 2(a)) and "destruction" (Fig. 2(b)).

    AP → A O | A HP       (1)
    HP → H AP | HP AP     (2)
    H → h,  A → a,  O → o (3)

The nonterminals H, A, and O represent the hand, the action, and the object (the tools and objects under manipulation), respectively, and the terminals h, a, and o are the observations. AP is the Action Phrase and HP is the Hand Phrase.
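Because rules (1) and (2) are binary and rule (3) is unary, the grammar can be parsed with a standard CKY-style chart parser. The sketch below is only an illustration of the grammar as stated above, not the authors' parsing algorithm; the bracketed-tree output format is our own choice.

```python
# Minimal sketch of the MACFG rules: AP -> A O | A HP, HP -> H AP | HP AP,
# and the unary rules H -> h, A -> a, O -> o. A simple CKY chart parse is
# used for illustration only (one tree kept per span and nonterminal).

UNARY = {"h": "H", "a": "A", "o": "O"}
BINARY = {
    ("A", "O"): "AP",    # Rule (1): action applied directly to an object
    ("A", "HP"): "AP",   # Rule (1): action applied to a hand phrase
    ("H", "AP"): "HP",   # Rule (2): hand combined with an action phrase
    ("HP", "AP"): "HP",  # Rule (2): recursive hand-phrase build-up
}

def parse(tokens):
    """Return {nonterminal: bracketed tree} for parses spanning all of `tokens`."""
    n = len(tokens)
    # chart[i][j] maps a nonterminal to a tree string covering tokens[i:j]
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, tok in enumerate(tokens):
        nt = UNARY[tok]
        chart[i][i + 1][nt] = f"({nt} {tok})"
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (left, right), parent in BINARY.items():
                    if left in chart[i][k] and right in chart[k][j]:
                        chart[i][j][parent] = (
                            f"({parent} {chart[i][k][left]} {chart[k][j][right]})"
                        )
    return chart[0][n]

# A hand performing an action on an object parses to a Hand Phrase:
print(parse(["h", "a", "o"]))  # {'HP': '(HP (H h) (AP (A a) (O o)))'}
```

The recursion in Rule (2) also handles nested observations such as `h a h a o` (a hand acting on another hand phrase), which parses to an HP as well.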
Insights behind the MACFG

The design is motivated by the following observations. First, the hands are the main and only driving force in manipulation actions, so a specialized nonterminal symbol H is used to represent them. Second, an Action (A) can be applied either directly to an Object (O) or to a Hand Phrase (HP), which in turn contains an Object (O). This is encoded in Rule (1), which builds up an Action Phrase (AP). Finally, an Action Phrase (AP) can be combined either with the Hand (H) or with a Hand Phrase; this is encoded in Rule (2), which recursively builds up the Hand Phrase.

Cognitive MACFG Parsing Algorithms

Figure 2: The (a) construction and (b) destruction operations. Fine dashed lines are newly added connections, crosses denote node deletion, and fine dotted lines are connections to be deleted.

An Example

Figure 3: The framework for the MACFG and its associated parsing algorithms. In this figure a typical manipulation action example, "Cut an eggplant", is observed and the system builds a sequence of six trees to represent this action.

Visual Modules

Figure 1: Overview of the manipulation action understanding system, including feedback loops within and between some of the modules. The feedback is denoted by the dotted arrow.

Attention with Torque Operator [2]:

Figure 4: (a) Torque for images, (b) a sample input frame, and (c) Torque operator response.

Hand Tracking, Grasp Type Classification and Trajectory-based Action Recognition:

Figure 5: (a) One example of fully articulated hand model tracking, (b) a 3-D illustration of the tracked model, and (c-d) examples of grasp type recognition for both hands.

Object Monitoring and Recognition [3]:

References

[1] Y. Yang, et al. Advances in Cognitive Systems, 2013.
[2] M. Nishigaki, et al. CVPR, 2012.
[3] Y. Yang, et al. CVPR, 2013.

Acknowledgements

For further information, please contact [email protected]
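The construction and destruction operations of Fig. 2 add and delete nodes and connections in the action tree as observations arrive. The poster does not give their details, so the following is only a hedged illustration of the idea on plain nested-tuple trees; `construct` and `destruct` are hypothetical helpers, not the authors' algorithm.

```python
# Illustrative sketch (assumed, not from the paper): trees are nested tuples
# ("Label", child, ...). Construction attaches a new Action Phrase via the
# recursive rule HP -> HP AP; destruction deletes matching subtrees, as in
# the node-deletion crosses of Fig. 2(b).

def construct(hp_tree, ap_tree):
    """Attach a new Action Phrase to an existing Hand Phrase (HP -> HP AP)."""
    return ("HP", hp_tree, ap_tree)

def destruct(tree, target):
    """Delete every subtree equal to `target`. Degenerate nodes left behind
    after a deletion are kept, for simplicity of the sketch."""
    if tree == target:
        return None
    if not isinstance(tree, tuple):
        return tree
    kept = tuple(c for c in (destruct(child, target) for child in tree[1:])
                 if c is not None)
    return (tree[0],) + kept

hand = ("H", "h")
grasp = ("AP", ("A", "grasp"), ("O", "knife"))
cut = ("AP", ("A", "cut"), ("O", "eggplant"))

tree = ("HP", hand, grasp)    # observation 1: the hand grasps a knife
tree = construct(tree, cut)   # construction: the hand (with knife) cuts
tree = destruct(tree, grasp)  # destruction: the grasp sub-action is removed
print(tree)
```

In the "Cut an eggplant" example of Figure 3, a sequence of such edits would produce the six successive trees the system builds as the action unfolds.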