
Exponential Maps for Computer Vision

Nick Birnie
School of Informatics
University of Edinburgh

1 Introduction

In computer vision, the exponential map is the natural generalisation of the ordinary exponential function to matrices. The technique is based on generating a manifold embedding of the geometric features of the scene, on which trajectories, primarily of motion or invariance, are estimated. An advantage of using the exponential map is the existence of a closed-form time-update equation for the state.

2 Definition

The most natural definition of the exponential map arises in the study of differential geometry, as the generalisation of the exponential function. Consider the solution of a linear ordinary differential equation of the form

$$\dot{f}(t) = L f(t) \implies f(t) = e^{Lt} f(0)$$

in which $L$ is a linear operator. Where $f$ is a scalar-valued function, the exponentiation follows from the regular series expansion. For vector-valued $f$, however, $e^{Lt}$ takes on the characteristics of the exponential map. This concept is readily applicable in Lie theory, the study of groups that form differentiable manifolds, where it generalises the exponential function to the infinitesimal elements of Lie groups. This section formalises both definitions.

2.1 Differential Geometric Definition

Let $M$ be a differentiable manifold and $p$ a point on $M$. Let $T_p M$ denote the tangent space to $M$ at $p$. Then for a vector $v \in T_p M$, there exists a unique geodesic $\gamma : [0, 1] \to M$ such that $\gamma(0) = p$ and $\gamma'(0) = v$. The exponential map at $p$ is then defined as $\exp_p(v) = \gamma(1)$, i.e. the exponential map gives the point reached by following $\gamma$ for unit time.

2.2 Lie Theory Definition

Let $G$ be a Lie group¹ and $\mathfrak{g}$ be its Lie algebra² (see [3] for a formal definition of these terms). Several definitions of the exponential map are possible. In computer vision, the most natural is the special case where $G$ is a matrix Lie group. The exponential map is then simply defined to coincide with the matrix exponential series expansion.
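As a rough numerical illustration of the matrix exponential series, the sketch below sums a truncated series with NumPy and applies it to the linear ordinary differential equation from the start of this section. The function name `expm_series`, the truncation at 30 terms, and the choice of $L$ as a generator of planar rotations are assumptions made purely for illustration; none of them is taken from [3].

```python
import numpy as np

def expm_series(X, terms=30):
    """Approximate exp(X) by the first `terms` terms of sum_k X^k / k!.

    Illustrative helper (hypothetical name); adequate only for small ||X||.
    """
    X = np.asarray(X, dtype=float)
    result = np.eye(X.shape[0])
    term = np.eye(X.shape[0])        # holds X^k / k!, starting at k = 0
    for k in range(1, terms):
        term = term @ X / k          # X^k / k! from X^(k-1) / (k-1)!
        result = result + term
    return result

# Assumed example operator: L generates rotations of the plane.
L = np.array([[0.0, -1.0],
              [1.0,  0.0]])
f0 = np.array([1.0, 0.0])            # initial state f(0)
t = 0.5

f_t = expm_series(L * t) @ f0        # f(t) = exp(L t) f(0)
expected = np.array([np.cos(t), np.sin(t)])  # closed form for this L
```

For this particular $L$ the exponential is a planar rotation, so `f_t` can be compared against `expected`. A fixed truncation is only reasonable when the entries of the matrix are small; a library routine would normally be preferred in practice.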
Written out, the series is

$$\exp(X) = \sum_{k=0}^{\infty} \frac{X^k}{k!} = I + X + \frac{1}{2} X^2 + \frac{1}{6} X^3 + \cdots$$

Another definition is possible which is directly equivalent to the differential geometric definition above. Redefining $\gamma$ as a one-parameter subgroup, with elements determined by a vector $v \in \mathfrak{g}$, directly converts between geodesics and groups.

3 Rigid Body Tracking

In 3D computer vision, the problem of tracking an object in video is typically addressed by maintaining a transformation for each of the object's degrees of freedom. The result is an estimate of the 3D pose with reference to several coordinate frames. The problem simplifies when only rigid body transformations are considered, because such transformations preserve the distance between any two points. This allows an object to be tracked by a single transformation of its coordinate frame.

To track a solid object through space, reflections must be excluded from the transformation. A constraint is therefore imposed to exclude those transformations which preserve distance but reflect in a particular plane, so that only orientation-preserving transformations are considered. A more formal definition is possible, whereby the Special Euclidean transformations are required to preserve the norm and cross-product of two vectors.

A pair of Cartesian coordinate systems is then required to specify the position of the object relative to the camera. The object coordinate frame is relative to a fixed reference point, known as the world coordinate frame. One may then define a point on the camera, attach an axis, and track the displacement from the world frame. Equivalently, one may allow the object frame to vary while keeping the camera frame fixed. The key idea is maintaining a transformation relating the object and world frames. An illustration of the geometric structure is presented in Figure 1.

Relating the object and world frames is a form of restricted affine transformation.
Specifically, it is composed of only two components: a rotation and a translation. These are derived in turn below, with two purposes:

(a) Representation in homogeneous coordinates.
(b) Representation as a parametric model.

¹ Informally, a Lie group is a group that is also a differentiable manifold.
² Again informally, the elements of $\mathfrak{g}$ compose the tangent space to $G$ at the identity element.

Figure 1: Transformation, $g$, of a camera frame, $C$ $(x, y, z)$, relative to the world frame, $W$ $(X, Y, Z)$. (Source: [3])

3.1 Exponential Representation

The rotational component can be developed independently of the translational part. A property common to all rotation matrices is that the transpose is equal to the inverse:

$$R_{wc}^T R_{wc} = I$$

The family of matrices satisfying this property is known as the orthogonal matrices. They form a group, $O(3)$, under matrix multiplication. An additional constraint is imposed to limit the group to orientation-preserving matrices only: the determinant must equal $+1$. The resulting group of special orthogonal matrices is denoted $SO(3)$.

$$SO(3) \triangleq \{ R \in \mathbb{R}^{3 \times 3} \mid R R^T = I,\ \det(R) = +1 \}$$

Although the number of parameters is potentially $3 \times 3 = 9$, the constraint $R R^T = I$ imposes six independent equations, so only three parameters are free, which equals the dimensionality of the space of rotation matrices. A parametric representation for rotation matrices has therefore been developed. A continuous map $t \mapsto R(t) \in SO(3)$ is defined, representing a rotational trajectory of an object relative to the world frame. Differentiating $R(t) R^T(t) = I$ with respect to $t$ gives $\dot{R} R^T + R \dot{R}^T = 0$, so the rotational velocity $\dot{R}(t) R^T(t)$ can be represented as a skew-symmetric matrix $M \in \mathbb{R}^{3 \times 3}$. A 3-vector $\omega$ is defined containing the three free parameters of $M$, and the notation $\hat{\omega} = M$ is introduced. The tangent space to $SO(3)$ at the identity element is the space of skew-symmetric matrices, also known as its Lie algebra.
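The relationship between skew-symmetric matrices and rotations can be checked numerically. The following is a minimal sketch, assuming the common convention in which $\hat{\omega} u = \omega \times u$ for any 3-vector $u$; the helper names `hat` and `expm_series`, the 30-term truncation, and the sample values of `omega` and `u` are illustrative assumptions rather than anything specified in the text. It exponentiates $\hat{\omega}$ with the series of §2.2 and tests the two defining properties of $SO(3)$.

```python
import numpy as np

def hat(omega):
    """Map a 3-vector to a skew-symmetric matrix (assumed convention:
    hat(omega) @ u equals the cross product omega x u)."""
    wx, wy, wz = omega
    return np.array([[0.0, -wz,  wy],
                     [ wz, 0.0, -wx],
                     [-wy,  wx, 0.0]])

def expm_series(X, terms=30):
    """Truncated series sum_k X^k / k! (illustrative, small ||X|| only)."""
    result = np.eye(X.shape[0])
    term = np.eye(X.shape[0])
    for k in range(1, terms):
        term = term @ X / k
        result = result + term
    return result

omega = np.array([0.1, -0.2, 0.3])   # arbitrary illustrative parameters
u = np.array([1.0, 2.0, 3.0])        # arbitrary test vector

omega_hat = hat(omega)
R = expm_series(omega_hat)           # candidate rotation matrix

skew_ok = np.allclose(omega_hat + omega_hat.T, 0.0)        # M^T = -M
cross_ok = np.allclose(omega_hat @ u, np.cross(omega, u))  # hat convention
orthogonal_ok = np.allclose(R @ R.T, np.eye(3))            # R R^T = I
det_ok = np.isclose(np.linalg.det(R), 1.0)                 # det(R) = +1
```

All four flags should evaluate to true for small `omega`, which is consistent with the exponential of a skew-symmetric matrix being a rotation. The three entries of `omega` are the only free parameters, matching the three degrees of freedom counted above.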
The Lie algebra of $SO(3)$ is written

$$\mathfrak{so}(3) \triangleq \{ \hat{\omega} \in \mathbb{R}^{3 \times 3} \mid \omega \in \mathbb{R}^3 \}$$

Now attention shifts to translations in three dimensions, and how these can be viewed in a parametric form. By extracting the difference terms from the fourth column of the homogeneous transformation matrix, a parametric model of three parameters is formed. A complete rigid body motion is specified by a translation and a rotation matrix. These can be written together in block form as follows:

$$g = \begin{bmatrix} R_{wc} & T_{wc} \\ 0 & 1 \end{bmatrix} \tag{1}$$

This allows complete representation of a rigid body motion. It follows immediately that the number of parameters is six: three for the rotation and three for the translation. Together, the collection of all these matrices is precisely the group of orientation-preserving Euclidean transformations, $SE(3)$, the Special Euclidean group.

$$SE(3) \triangleq \left\{ g = \begin{bmatrix} R_{wc} & T_{wc} \\ 0 & 1 \end{bmatrix} \;\middle|\; R_{wc} \in SO(3),\ T_{wc} \in \mathbb{R}^3 \right\}$$

Based on the homogeneous transformation, which encapsulates the complete rigid body motion, it is possible to represent the position of the object at the next time step in matrix exponential form. The tangent space is given by the following Lie algebra

$$\mathfrak{se}(3) \triangleq \left\{ \hat{\xi} = \begin{bmatrix} \hat{\omega} & v \\ 0 & 0 \end{bmatrix} \;\middle|\; \hat{\omega} \in \mathfrak{so}(3),\ v \in \mathbb{R}^3 \right\}$$

where $v$ is defined as $\dot{T}(t) - \hat{\omega}(t) T(t)$. Intuitively, this accounts for the effect of the rotation on the translational velocity. The matrix $\hat{\xi}$ is known as a twist. Together, the top block row of $\hat{\xi}$ defines the unique geodesic in the direction of the tangent vector at $g(t)$, as introduced in §2. The tangent vector to the geodesic is given by the following ordinary differential equation.

$$\dot{g}(t) = \hat{\xi} g(t)$$

The solution of which is given by

$$g(t) = e^{\hat{\xi} t} g(0)$$

where $e^{\hat{\xi} t}$ is the matrix exponential, obtained from the exponential series expansion

$$e^{\hat{\xi} t} = \sum_{n=0}^{\infty} \frac{(\hat{\xi} t)^n}{n!}$$

Hence, assuming $g(0) = I$, the exponential map can now be defined

$$\exp : \mathfrak{se}(3) \to SE(3); \quad \hat{\xi} t \mapsto e^{\hat{\xi} t}$$

4 Bibliographic Notes

The content of §3 was composed following meticulous study of [3], a sample chapter of which is available from the book's website.
In particular, Figure 1 originates from those authors, together with the notational quirks and the derivation of the exponential map. Much more detail on the exponential representation is available in the book. A point of interest is the logarithmic map, which is given by the right inverse of the exponential map.

Applications in motion tracking first appear in [1]. The authors represent the kinematic chain of humans as a product of exponential maps and produce a differential model of motion, showing comparable performance to previous methods. Another application is statistical shape estimation, where geometric invariants are computed from image data. Fletcher et al. [2] generalised Principal Components Analysis to operate on the non-Euclidean geometry of Lie groups, and named the method Principal Geodesic Analysis. In contrast to motion tracking, the authors use medial representations as Lie group elements. An algorithm is then given for computing the basis vectors (which are geodesics on the Lie group).

References

[1] C. Bregler and J. Malik. Tracking people with twists and exponential maps. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR '98, pages 8–, Washington, DC, USA, 1998. IEEE Computer Society.
[2] P. Thomas Fletcher, Conglin Lu, and Sarang Joshi. Statistics of shape via principal geodesic analysis on Lie groups. Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, 1:95, 2003.
[3] Yi Ma, Stefano Soatto, Jana Kosecka, and S. Shankar Sastry. An Invitation to 3-D Vision: From Images to Geometric Models. Springer-Verlag, 2003.
