Exponential Maps for Computer Vision
Nick Birnie
School of Informatics
University of Edinburgh
In computer vision, the exponential map is the natural generalisation of the ordinary exponential function to matrices. The technique embeds the geometric features of a scene in a manifold, on which trajectories, primarily of motion or invariance, can then be estimated. An advantage of using the exponential map is the existence of a closed-form time-update equation for the state.
The most natural definition of the exponential map arises in the study of differential geometry, as the generalisation of the exponential function. Consider the solution of a linear ordinary differential equation of the form
\[ \dot{f}(t) = L f(t) \implies f(t) = e^{Lt} f(0) \]
in which $L$ is a linear operator. Where $f$ is a scalar-valued function, the exponentiation follows from the regular series expansion. For vector-valued $f$, however, $e^{Lt}$ takes on the characteristics of the exponential map.
This concept is readily applicable in Lie Theory - the study of groups that form differentiable manifolds - where it generalises the exponential function to the infinitesimal elements
of Lie Groups. This section formalises both definitions.
Differential Geometric Definition
Let $M$ be a differentiable manifold and $p$ a point on $M$. Let $T_p M$ denote the tangent space to $M$ at $p$. Then for a vector $v \in T_p M$ there exists a unique geodesic $\gamma : [0, 1] \to M$ such that $\gamma(0) = p$ and $\gamma'(0) = v$. The exponential map at $p$ is then defined as $\exp_p(v) = \gamma(1)$, i.e. the exponential map gives the point reached by following $\gamma$ for unit time.
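As a concrete illustration of this definition, the sketch below evaluates the exponential map on the unit sphere, where geodesics are great circles. The choice of manifold, the function name `sphere_exp` and the sample vectors are assumptions made for illustration and do not come from the text.

```python
import math

def sphere_exp(p, v):
    """Exponential map on the unit sphere at p, applied to the tangent vector v.

    Assumes |p| = 1 and that v is orthogonal to p, i.e. v lies in T_p M.
    """
    speed = math.sqrt(sum(c * c for c in v))
    if speed < 1e-12:
        return tuple(p)  # zero tangent vector: the geodesic never leaves p
    # Great circle gamma(t) = cos(t|v|) p + sin(t|v|) v/|v|, evaluated at t = 1.
    return tuple(math.cos(speed) * pc + math.sin(speed) * vc / speed
                 for pc, vc in zip(p, v))

north = (0.0, 0.0, 1.0)
quarter_turn = (math.pi / 2, 0.0, 0.0)  # tangent at the north pole, length pi/2
q = sphere_exp(north, quarter_turn)     # a quarter of a great circle away
```

A tangent vector of length pi/2 at the north pole carries the point a quarter of the way around a great circle, so `q` lies on the equator.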
Lie Theory Definition
Let $G$ be a Lie group¹ and $\mathfrak{g}$ be its Lie algebra² (see [3] for a formal definition of these terms). Several definitions of the exponential map are possible. In computer vision, the most natural is the special case where $G$ is a matrix Lie group. The exponential map is then simply defined to coincide with the matrix exponential series expansion.
\[ \exp(X) = \sum_{k=0}^{\infty} \frac{X^k}{k!} = I + X + \frac{X^2}{2!} + \frac{X^3}{3!} + \cdots \]
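The series can be evaluated numerically by truncating it after a fixed number of terms. The following is a minimal sketch under that assumption; the helper names `mat_mul` and `mat_exp` and the cut-off of 30 terms are illustrative choices, and a production system would more likely rely on a dedicated library routine.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(x, terms=30):
    """Approximate exp(X) by the truncated series sum of X^k / k! for k < terms."""
    n = len(x)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds X^0 / 0! = I
    for k in range(1, terms):
        term = [[entry / k for entry in row] for row in mat_mul(term, x)]  # X^k / k!
        for i in range(n):
            for j in range(n):
                result[i][j] += term[i][j]
    return result

theta = 0.5
generator = [[0.0, -theta], [theta, 0.0]]  # skew-symmetric 2x2 matrix
rotation = mat_exp(generator)
expected = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
```

Exponentiating a 2x2 skew-symmetric matrix yields a planar rotation, so `rotation` agrees with `expected` up to rounding error.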
Another definition is possible which is directly equivalent to the differential geometric definition above. Redefining $\gamma$ as a one-parameter subgroup, with elements determined by a vector $v \in \mathfrak{g}$, directly converts between geodesics and groups.
Rigid Body Tracking
In 3D computer vision, the problem of tracking an object in video is typically addressed
by maintaining a transformation for each of the object’s degrees of freedom. The result is
an estimation of the 3D pose with reference to several coordinate frames. Simplification
of the problem is possible when rigid body transformations are considered. The reason for
this is that such transformations preserve the distance between any two points. This allows
tracking an object by a single transformation of its coordinate frame.
To track a solid object through space, reflections must be excluded from the transformation. A constraint is therefore imposed to exclude those transformations which preserve distance but reflect in a particular plane, so that only orientation-preserving transformations are considered. A more formal definition is possible, whereby the Special Euclidean transformations are required to preserve the norm and the cross product of any two vectors.
A pair of Cartesian coordinate systems is then required to specify the position of the
object relative to the camera. The object coordinate frame is relative to a fixed reference
point, known as the world coordinate frame. One may then define a point on the camera,
attach an axis, and track the displacement from the world frame. Equivalently, one may also
allow the object frame to vary keeping the camera frame fixed. The key idea is maintaining
a transformation relating the object and world frames. An illustration of the geometric
structure is presented in Figure 1.
Relating the object and world frames is a form of restricted affine transformation, composed of only two components: a rotation and a translation. These are derived in turn below, with two purposes: (a) representation in homogeneous coordinates, and (b) representation as a parametric model.
¹ Informally, a Lie group consists of infinitesimal elements with the property that it is also a differentiable manifold.
² Again informally, the elements of $\mathfrak{g}$ compose the tangent space to $G$ at the identity element.
Figure 1: Transformation, g, of a camera frame, C (x,y,z), relative to the world frame, W
(X,Y,Z). (Source [3])
Exponential Representation
The rotational component can be developed independently of the translational part. A property common to all rotation matrices is that the transpose is equal to the inverse.
\[ R_{wc}^{T} R_{wc} = R_{wc} R_{wc}^{T} = I \]
The family of matrices satisfying this property is known as the orthogonal matrices. They form a group, $O(3)$, under the group operation of matrix multiplication. An additional constraint is imposed to limit the group to orientation-preserving matrices only: the determinant must equal $+1$. The resulting group of special orthogonal matrices is denoted $SO(3)$.
\[ SO(3) \doteq \{ R \in \mathbb{R}^{3 \times 3} \mid R R^{T} = I,\ \det(R) = +1 \} \]
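Although the number of parameters is potentially $3 \times 3 = 9$, the constraint imposed by $R R^{T} = I$ implies that only three of these are free: because $R R^{T}$ is symmetric, the constraint amounts to six independent scalar equations, leaving $9 - 6 = 3$ degrees of freedom. This equals the dimensionality of the space of rotation matrices. A parametric representation for rotation matrices has therefore been developed.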
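The two defining conditions of SO(3) can be checked directly for a candidate matrix. The sketch below is one possible check; the function name `is_rotation` and the tolerance are assumptions, and the reflection is included to show why the determinant condition is needed in addition to orthogonality.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def is_rotation(r, tol=1e-9):
    """True if R R^T = I and det(R) = +1, both up to the tolerance tol."""
    rrt = mat_mul(r, transpose(r))
    orthogonal = all(abs(rrt[i][j] - (1.0 if i == j else 0.0)) <= tol
                     for i in range(3) for j in range(3))
    return orthogonal and abs(det3(r) - 1.0) <= tol

c, s = math.cos(0.3), math.sin(0.3)
rot_z = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]   # rotation about z
reflection = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]  # mirror in the xy-plane
```

Here `is_rotation(rot_z)` holds, while `is_rotation(reflection)` fails: the reflection is orthogonal but has determinant -1.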
A continuous map $t \mapsto R(t) \in SO(3)$ is defined, representing a rotational trajectory of an object relative to the world frame. Differentiating $R(t) R^{T}(t) = I$ gives $\dot{R}(t) R^{T}(t) + R(t) \dot{R}^{T}(t) = 0$, so the rotational velocity $\dot{R}(t) R^{T}(t)$ can be represented as a skew-symmetric matrix $M \in \mathbb{R}^{3 \times 3}$. A 3-vector $\omega$ is defined containing the three free parameters of $M$, and the notation $\hat{\omega} = M$ is introduced. The tangent space to $SO(3)$ at the identity element is the space of skew-symmetric matrices, also known as its Lie algebra.
\[ so(3) \doteq \{ \hat{\omega} \in \mathbb{R}^{3 \times 3} \mid \omega \in \mathbb{R}^{3} \} \]
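The relationship between a rotational trajectory and its skew-symmetric velocity can be checked on a simple case. The sketch below assumes a rotation about the z-axis at a constant rate; the names `hat`, `vee`, `rot_z` and `rot_z_dot` are illustrative and not taken from the text.

```python
import math

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix."""
    wx, wy, wz = w
    return [[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]]

def vee(m):
    """Inverse of hat: read the three free entries of a skew-symmetric matrix."""
    return (m[2][1], m[0][2], m[1][0])

def transpose(m):
    return [list(col) for col in zip(*m)]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_z_dot(angle, rate):
    """Time derivative of rot_z(rate * t), evaluated where rate * t = angle."""
    c, s = math.cos(angle), math.sin(angle)
    return [[-rate * s, -rate * c, 0.0], [rate * c, -rate * s, 0.0], [0.0, 0.0, 0.0]]

rate, t = 0.7, 1.3
R = rot_z(rate * t)
R_dot = rot_z_dot(rate * t, rate)
velocity = mat_mul(R_dot, transpose(R))  # the rotational velocity R_dot R^T
omega = vee(velocity)                    # its three free parameters
```

For this trajectory `velocity` is skew-symmetric and `omega` is approximately `(0, 0, 0.7)`, the rotation rate about the z-axis.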
Now attention shifts to translation matrices in three dimensions, and how these can be viewed in a parametric form. By extracting the difference terms from the fourth column, a parametric model of three parameters is formed. A complete rigid body motion is specified by a translation and a rotation matrix. These can be written together in block form, as
\[ \begin{bmatrix} R_{wc} & T_{wc} \\ 0 & 1 \end{bmatrix} \]
This allows complete representation of a rigid body motion. It follows immediately that the number of parameters is six. Together, the collection of all these matrices is precisely the group of orientation-preserving Euclidean transformations, $SE(3)$, for the Special Euclidean group.
\[ SE(3) \doteq \left\{ g = \begin{bmatrix} R_{wc} & T_{wc} \\ 0 & 1 \end{bmatrix} \;\middle|\; R_{wc} \in SO(3),\ T_{wc} \in \mathbb{R}^{3} \right\} \]
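The block structure makes rigid body motions easy to build, apply and invert. The following sketch assumes 4x4 nested lists and homogeneous points with a final coordinate of 1; the helper names `make_se3`, `apply` and `inverse` are hypothetical.

```python
import math

def make_se3(r, t):
    """Assemble the 4x4 homogeneous matrix [[R, T], [0, 1]]."""
    return [list(r[i]) + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(g, point):
    """Apply g to a 3D point by way of homogeneous coordinates (x, y, z, 1)."""
    hom = list(point) + [1.0]
    return tuple(sum(g[i][k] * hom[k] for k in range(4)) for i in range(3))

def inverse(g):
    """Inverse of a rigid body motion: [[R^T, -R^T T], [0, 1]]."""
    r_t = [[g[j][i] for j in range(3)] for i in range(3)]
    t = [g[i][3] for i in range(3)]
    new_t = [-sum(r_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return make_se3(r_t, new_t)

c, s = math.cos(math.pi / 2), math.sin(math.pi / 2)
quarter_turn_z = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
g = make_se3(quarter_turn_z, (1.0, 0.0, 0.0))
moved = apply(g, (1.0, 0.0, 0.0))      # rotate a quarter turn about z, then shift along x
restored = apply(inverse(g), moved)    # undo the motion
round_trip = mat_mul(inverse(g), g)    # should be the 4x4 identity
```

The point `(1, 0, 0)` is rotated to `(0, 1, 0)` and then translated to approximately `(1, 1, 0)`; applying the inverse recovers the original point. The inverse is itself of the same block form, which reflects the fact that these matrices form a group.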
Based on the homogeneous transformation, which encapsulates the complete rigid body motion, it is possible to represent the position of the object at the next time step in a matrix exponential form. The tangent space is given by the following Lie algebra
\[ se(3) \doteq \left\{ \hat{\xi} = \begin{bmatrix} \hat{\omega} & v \\ 0 & 0 \end{bmatrix} \;\middle|\; \hat{\omega} \in so(3),\ v \in \mathbb{R}^{3} \right\} \]
where $v$ is defined as $\dot{T}(t) - \hat{\omega}(t) T(t)$. Intuitively, the second term accounts for the effect that applying a rotation has on the translational velocity. An element $\hat{\xi}$ of $se(3)$ is known as a twist. Together, the entries of the first row of the matrix $\hat{\xi}$ define the unique geodesic in the direction of the tangent vector at $g(t)$, as introduced in §2. The tangent vector to the geodesic is given by the following ordinary differential equation.
\[ \dot{g}(t) = \hat{\xi} g(t) \]
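For a constant twist $\hat{\xi}$, the solution is given by
\[ g(t) = e^{\hat{\xi} t} g(0) \]
where $e^{\hat{\xi} t}$ is the matrix exponential, obtained from the exponential series expansion
\[ e^{\hat{\xi} t} = \sum_{n=0}^{\infty} \frac{(\hat{\xi} t)^{n}}{n!} \]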
Hence, assuming $g(0) = I$, the exponential map can now be defined
\[ \exp : se(3) \to SE(3); \quad \hat{\xi} t \mapsto e^{\hat{\xi} t} \]
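The map from twists to rigid body motions, and the resulting time-update of the pose, can be sketched by evaluating the series numerically. The code below assumes a constant twist and a truncation after 30 terms; the names `twist_matrix`, `exp_twist` and `time_update` are illustrative, and [3] gives a closed form that avoids the series altogether.

```python
import math

def hat(w):
    wx, wy, wz = w
    return [[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]]

def twist_matrix(omega, v):
    """Assemble the 4x4 twist [[hat(omega), v], [0, 0]]."""
    w_hat = hat(omega)
    return [w_hat[i] + [v[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 0.0]]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(x, terms=30):
    """Truncated exponential series: sum of X^n / n! for n < terms."""
    n = len(x)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[entry / k for entry in row] for row in mat_mul(term, x)]
        for i in range(n):
            for j in range(n):
                result[i][j] += term[i][j]
    return result

def exp_twist(omega, v, t):
    """Rigid body motion reached by following the constant twist for time t."""
    xi_t = [[entry * t for entry in row] for row in twist_matrix(omega, v)]
    return mat_exp(xi_t)

def time_update(g, omega, v, dt):
    """Time-update of the pose: g(t + dt) = exp(twist * dt) g(t)."""
    return mat_mul(exp_twist(omega, v, dt), g)

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
slide = exp_twist((0.0, 0.0, 0.0), (1.0, 2.0, 3.0), 2.0)        # pure translation
screw = exp_twist((0.0, 0.0, 1.0), (0.0, 0.0, 0.5), math.pi / 2)  # turn about z, advance along z
two_steps = time_update(time_update(identity, (0.0, 0.0, 1.0), (1.0, 0.0, 0.0), 0.1),
                        (0.0, 0.0, 1.0), (1.0, 0.0, 0.0), 0.1)
one_step = exp_twist((0.0, 0.0, 1.0), (1.0, 0.0, 0.0), 0.2)
```

With zero angular velocity the result is a pure translation by `v * t`; with `v` parallel to `omega` the motion is a screw about that axis. Two updates of 0.1 agree with a single update of 0.2, which is the property that makes the time-update convenient for tracking.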
Bibliographic Notes
The content of §3 was composed following meticulous study of [3], a sample chapter of which
is available from the book’s website. In particular, Figure 1 should be noted to originate
from those authors, together with the notational quirks and the derivation of the exponential
map. Much more detail on the exponential representation is available in the book. A point
of interest is the logarithmic map, which is given by the right inverse of the exponential map.
Applications in motion tracking first appear in [1]. The authors represent the kinematic
chain of humans as a product of exponential maps and produce a differential model of motion,
showing comparable performance to previous methods.
Another application is statistical shape estimation, where the computation of geometric
invariants from image data is performed. Fletcher et al. [2] generalised Principal Components
Analysis to operate on the non-Euclidean geometry of Lie groups, and named the method
Principal Geodesic Analysis. In contrast to motion tracking, the authors use the medial
representations as Lie group elements. An algorithm is then given for computing the basis
vectors (which are geodesics on the Lie group).
References
[1] C. Bregler and J. Malik. Tracking people with twists and exponential maps. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR ’98, pages 8–, Washington, DC, USA, 1998. IEEE Computer Society.
[2] P. Thomas Fletcher, Conglin Lu, and Sarang Joshi. Statistics of shape via principal geodesic analysis on Lie groups. Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, 1:95, 2003.
[3] Yi Ma, Stefano Soatto, Jana Kosecka, and S. Shankar Sastry. An Invitation to 3-D Vision: From Images to Geometric Models. Springer-Verlag, 2003.