A Repeatability Test for Two Orientation Based
Interest Point Detectors
Björn Johansson and Robert Söderberg
April 26, 2004
Technical report LiTH-ISY-R-2606
ISSN 1400-3902
Computer Vision Laboratory
Department of Electrical Engineering
Linköping University, SE-581 83 Linköping, Sweden
[email protected], [email protected]
Abstract
This report evaluates the stability of two image interest point detectors, star-pattern points and points based on the fourth order tensor. The Harris operator is also included for comparison. Different image transformations are applied, and the repeatability of points between a reference image and each of the transformed images is computed. The transformations are plane rotation, change in scale, change in view, and change in lighting conditions.
We conclude that the result largely depends on the image content. The star-pattern points and the fourth order tensor model the image as locally straight lines, while the Harris operator is based on simple/non-simple signals. The two methods evaluated here perform equally well as or better than the Harris operator if the model is valid, and perform worse otherwise.
Contents

1 Introduction
2 Interest point detectors
   2.1 Harris, nms
   2.2 Harris, subpixel
   2.3 Fourth order tensors
   2.4 Star patterns
3 Experimental setup
   3.1 Repeatability criterion
   3.2 Transformation of scale, rotation and view
   3.3 Transformation of light
4 Experimental results
   4.1 Comparison of the two Harris versions
   4.2 Rotation, Scale, and View
   4.3 Variation of illumination
5 Conclusions and discussion
1 Introduction
In the last few years, a number of experiments have been performed to evaluate the stability of interest point detectors and local descriptors, see e.g. [11, 8]. Stable interest points are useful for example in object recognition applications, see e.g. [6, 7, 5], where a local image content descriptor is computed at every interest point. These descriptors can be used in a feature matching algorithm to find the objects and object poses. The need for stable points may not be crucial but depends on what you do with them, e.g. the choice of descriptor. In general, however, stable points should make the system more robust.
Among the detectors evaluated by Schmid et al. [11], the Harris operator was found to be the most stable one. This report evaluates two other methods for detection of interest points and compares them with the Harris operator. The experiments and philosophy are very much the same as in [11], and we refer to this reference for more details. Basically, different image transformations are applied and the repeatability of points between a reference image and each of the transformed images is computed. The transformations in [11] are generated by actually moving the camera, and the homographies are estimated by registering the images using a pattern that is projected onto the image by a projector. We use a simpler approach in this paper: the transformations are simulated in the computer. This approach has the drawbacks that it introduces interpolation noise and that the simple camera model may be unrealistic. On the other hand, we do not have to estimate the homography.
We first describe the methods that are evaluated in this report, then explain some details of the experimental setup, and finally present the results.
2 Interest point detectors
This section gives a short description of the methods included in the evaluation. Two versions of the Harris operator are considered. The parameters and thresholds of the methods are chosen such that they are all based on roughly the same region size and give approximately the same number of points.
2.1 Harris, nms
The Harris function is computed as

    H = det(T) − α trace²(T),                          (1)

where α = 0.04 and T is the structure tensor

    T = ∫ g(x) ∇I ∇Iᵀ dx.                              (2)
g is a Gaussian window function with σ = 2. The image gradient is computed using differentiated Gaussian filters with σ = 1. Local maxima points are found by non-maximum
suppression. All filters can be made separable.
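As an illustration, the following is a minimal NumPy/SciPy sketch of the operator as described above (Gaussian derivatives with σ = 1, tensor averaging with σ = 2, α = 0.04). It is not the implementation used for the experiments; the 3×3 suppression neighbourhood and the response threshold are assumed parameters.

import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def harris_nms(img, sigma_d=1.0, sigma_w=2.0, alpha=0.04, thresh=1e-6):
    """Harris response from the structure tensor; local maxima by NMS."""
    img = img.astype(np.float64)
    # Image gradient via differentiated Gaussian filters (sigma = 1).
    Ix = gaussian_filter(img, sigma_d, order=(0, 1))
    Iy = gaussian_filter(img, sigma_d, order=(1, 0))
    # Structure tensor T: Gaussian-windowed products of the gradient (sigma = 2).
    Txx = gaussian_filter(Ix * Ix, sigma_w)
    Tyy = gaussian_filter(Iy * Iy, sigma_w)
    Txy = gaussian_filter(Ix * Iy, sigma_w)
    # Harris function H = det(T) - alpha * trace(T)^2, equation (1).
    H = (Txx * Tyy - Txy ** 2) - alpha * (Txx + Tyy) ** 2
    # Non-maximum suppression: keep pixels that equal the local maximum.
    is_max = (H == maximum_filter(H, size=3)) & (H > thresh)
    return np.argwhere(is_max), H   # (row, col) maxima and the response map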
Figure 1: Local features detected by the tensor, from left to right: crossing, T-crossing, corner, non-parallel lines, and parallel lines.
2.2 Harris, subpixel
Same as the previous method, except that the local maxima are found with subpixel accuracy. A second order polynomial is fitted to the Harris image around each of the maximum pixels found by the previous method, and the local maximum of the polynomial gives the subpixel position.
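A minimal sketch of one way to do such a refinement: fit a parabola independently along each axis through the three samples around the maximum. The full second order polynomial fit used in the report may differ; this separable variant is an assumption.

import numpy as np

def refine_subpixel(H, peaks):
    """Refine integer maxima of the Harris image H by a 1D parabola fit per axis.

    peaks: (N, 2) integer (row, col) maxima.  Returns (N, 2) float positions.
    """
    out = []
    for r, c in peaks:
        dr = dc = 0.0
        if 0 < r < H.shape[0] - 1 and 0 < c < H.shape[1] - 1:
            # Vertex offset of the parabola through f(-1), f(0), f(+1).
            den_r = H[r - 1, c] - 2 * H[r, c] + H[r + 1, c]
            den_c = H[r, c - 1] - 2 * H[r, c] + H[r, c + 1]
            if den_r != 0:
                dr = 0.5 * (H[r - 1, c] - H[r + 1, c]) / den_r
            if den_c != 0:
                dc = 0.5 * (H[r, c - 1] - H[r, c + 1]) / den_c
        out.append((r + dr, c + dc))
    return np.array(out)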
2.3 Fourth order tensors
This method uses the tensor representation explained in [9]. The tensor can represent one, two, or three line segments. If the tensor is reshaped to a matrix in a proper way, the number of line segments will correspond to the rank of the tensor. For example, a tensor representing a corner will have rank two. By using this property the tensor representation can be used as an interest point detector, where the interest points are the local features illustrated in figure 1. The detection process is explained in detail in [10], but the basics are:
1. Compute the image gradient, where a differentiated Gaussian filter with σ = 1 is used (same as for the Harris methods).
2. The image gradient is improved to suit the tensor representation by using a method described in [4], where the responses from edges and lines are amplified and made more concentrated.
3. An orientation tensor T is estimated from the improved image gradient.
4. The fourth order tensor is estimated by applying a number of separable filters to the elements of the orientation tensor. These filters represent a subset of monomials up to the fourth order.
5. The fourth order tensor is reshaped to a matrix and a measure, c2, of rank two is calculated as

       c2 = (−9d + qt) / (3d − 3qt + t³),   where   t = σ1 + σ2 + σ3,   d = σ1 σ2 σ3,   q = σ1 σ2 + σ2 σ3 + σ3 σ1,

   and where σi denote the three largest singular values of the matrix (a small numerical sketch is given after this list).
6. A selection of interesting tensors is performed by picking each tensor corresponding to a local maximum in the rank two image weighted with the tensor norm. The interest point position is then calculated as the crossing between the two line segments.
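The measure c2 in step 5 can be evaluated directly from the singular values of the reshaped tensor; a minimal numerical sketch (the reshaping itself follows [9, 10] and is not reproduced here):

import numpy as np

def rank_two_measure(M):
    """Rank-two measure c2 from the three largest singular values of the matrix M."""
    s = np.linalg.svd(M, compute_uv=False)[:3]
    t = s.sum()                              # sigma1 + sigma2 + sigma3
    d = s.prod()                             # sigma1 * sigma2 * sigma3
    q = s[0]*s[1] + s[1]*s[2] + s[2]*s[0]    # pairwise products
    den = 3*d - 3*q*t + t**3
    return (-9*d + q*t) / den if den != 0 else 0.0

With this formula c2 = 1 for an ideal rank-two matrix with two equal non-zero singular values, c2 = 0 for rank one, and c2 = 0 when all three singular values are equal.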
2.4 Star patterns
The method we use to find star patterns is a combination of the ideas in [2, 3, 1]. This method is explained in detail in [5, 4]. The basics are:
1. Compute the image gradient ∇I. A differentiated Gaussian filter with σ = 1 is used (same as for the Harris methods).
2. Star patterns are found as local maxima of the function

       Sstar = ∫ g(x) ⟨∇I, x⊥⟩² dx.                    (3)

   g is a Gaussian window function with σ = 2. Sstar is made more selective by inhibition with a measure for simple signals. Local maxima are then found by non-maximum suppression.
3. The point positions are improved by minimizing the circle pattern function

       Scircle(p) = ∫ g(x) ⟨∇I, x − p⟩² dx,            (4)

   which is computed around each of the local maxima.

The algorithm needs to compute a subset of monomials (or derivatives) up to the second order on the three images Ix², Iy², and Ix Iy. All filters can be made separable.
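As a sketch of how Sstar in equation (3) can be evaluated with such filters: expanding ⟨∇I, x⊥⟩² = y² Ix² − 2xy Ix Iy + x² Iy² turns (3) into three correlations of Ix², Ix Iy, and Iy² with Gaussian-weighted monomial kernels. The window truncation radius below is an assumed parameter, and the inhibition with a simple-signal measure is omitted.

import numpy as np
from scipy.ndimage import gaussian_filter, correlate

def star_response(img, sigma_d=1.0, sigma_w=2.0, radius=6):
    """S_star(p) = integral of g(x) <grad I(p + x), x_perp>^2 dx, equation (3)."""
    img = img.astype(np.float64)
    Ix = gaussian_filter(img, sigma_d, order=(0, 1))
    Iy = gaussian_filter(img, sigma_d, order=(1, 0))
    # Local coordinates and Gaussian window g (sigma = 2), truncated at the radius.
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1].astype(np.float64)
    g = np.exp(-(x**2 + y**2) / (2 * sigma_w**2))
    # <grad I, x_perp>^2 = y^2 Ix^2 - 2xy Ix Iy + x^2 Iy^2
    return (correlate(Ix * Ix, g * y**2)
            - 2 * correlate(Ix * Iy, g * x * y)
            + correlate(Iy * Iy, g * x**2))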
3 Experimental setup

3.1 Repeatability criterion
The repeatability criterion is the same as in [11]; we give a short summary here. Let Ir denote the reference image and let Ii denote an image that has been transformed. Let {xr} denote the interest points in the reference image Ir, and let {xi} denote the interest points in the transformed image Ii. For two corresponding points xr and xi in images Ir and Ii we have

    xi = Hri xr,                                       (5)

where Hri denotes the homography between the two images (the points are here represented in homogeneous coordinates). As in [11] we remove the points that do not lie in the common scene part of images Ir and Ii. Let Ri(ε) denote the set of corresponding point pairs within ε-distance, i.e.

    Ri(ε) = {(xr, xi) | dist(Hri xr, xi) < ε}.         (6)

The repeatability rate for image Ii is defined as

    ri(ε) = |Ri(ε)| / min(nr, ni),                     (7)

where nr = |{xr}| and ni = |{xi}| are the numbers of points detected in the common part of the two images. Note that 0 ≤ ri ≤ 1.
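A minimal sketch of this computation, given the detected points and the known homography. The removal of points outside the common scene part is assumed to have been done already, and each reference point is simply matched to its nearest detected point.

import numpy as np

def repeatability(pts_r, pts_i, H_ri, eps):
    """r_i(eps) = |R_i(eps)| / min(n_r, n_i), equations (5)-(7).

    pts_r, pts_i: (N, 2) point arrays in the reference / transformed image.
    H_ri:         3x3 homography mapping reference points into image i.
    """
    # Transfer the reference points with the homography (homogeneous coordinates).
    hom = np.hstack([pts_r, np.ones((len(pts_r), 1))])
    proj = hom @ H_ri.T
    proj = proj[:, :2] / proj[:, 2:3]
    # A pair belongs to R_i(eps) if the transferred point has a detection within eps.
    dists = np.linalg.norm(proj[:, None, :] - pts_i[None, :, :], axis=2)
    matched = np.count_nonzero(dists.min(axis=1) < eps)
    return matched / min(len(pts_r), len(pts_i))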
3.2 Transformation of scale, rotation and view
The homographies for the rotation, scale, and view transformations can be found in many textbooks, but we still include a short derivation for the sake of completeness. The relation between a point X = (X, Y, Z)ᵀ in the 3D world and the corresponding point x = (x, y)ᵀ in the image is

    λ (x; 1) = P (X; 1),                               (8)

where (x; 1) denotes the homogeneous column vector obtained by appending a 1, and P = K [R|t] is the camera (projection) matrix. The matrix K contains the camera parameters. We assume the simple model

    K = [ f·I  x0 ]
        [  0    1 ],                                   (9)

where f is the focal length and x0 is the origin of the image coordinate system. The matrix R and the vector t define the transformation of the 3D coordinate system. For the reference image we assume that the optical axis of the camera is orthogonal to the image in the 3D world and that the distance between the camera and the image is d. This gives R = I and t = 0, and from (8) we then get

    X = d K⁻¹ (xr; 1).                                 (10)

We now use (8) and (10) to compute a relation between the point xr in the reference image and a corresponding point x in another image taken with the camera in a different position, i.e. for a general choice of R and t. The relation becomes

    λ (x; 1) = K [R|t] (X; 1)
             = K R X + K t
             = d K R K⁻¹ (xr; 1) + K t
             = (d K R K⁻¹ + [0|Kt]) (xr; 1),           (11)

and we identify the general homography between the reference image and a transformed image as

    H = d K R K⁻¹ + [0|Kt].                            (12)
We get the following homographies for the special cases of rotation, scale, and view:

• Plane rotation:

      H = K R K⁻¹,   where   R = [ cos ϕ  −sin ϕ  0 ]
                                 [ sin ϕ   cos ϕ  0 ]
                                 [   0       0    1 ].          (13)

• Scale change:

      H = d·I + [0|Kt],   where   t = (0, 0, d0)ᵀ.              (14)

• Viewpoint change: Equation (12) where

      R = [ 1     0       0    ]
          [ 0   cos ϕ  −sin ϕ  ]
          [ 0   sin ϕ   cos ϕ  ],      t = [I − R] (0, 0, 1)ᵀ.  (15)
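These homographies are straightforward to assemble; the sketch below builds them for the three cases under the camera model (9), following equations (12)-(15) as written. The focal length f, image origin x0, and distance d are placeholders to be chosen.

import numpy as np

def camera_matrix(f, x0):
    """K = [f*I, x0; 0, 1], equation (9)."""
    K = np.eye(3)
    K[0, 0] = K[1, 1] = f
    K[:2, 2] = x0
    return K

def general_homography(K, R, t, d):
    """H = d*K*R*inv(K) + [0 | K*t], equation (12)."""
    H = d * K @ R @ np.linalg.inv(K)
    H[:, 2] += K @ t
    return H

def H_rotation(K, phi):
    """Plane rotation, equation (13): H = K*R*inv(K)."""
    c, s = np.cos(phi), np.sin(phi)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return K @ R @ np.linalg.inv(K)

def H_scale(K, d, d0):
    """Scale change, equation (14): H = d*I + [0 | K*t] with t = (0, 0, d0)."""
    return general_homography(K, np.eye(3), np.array([0.0, 0.0, d0]), d)

def H_view(K, phi, d):
    """Viewpoint change, equation (15): rotation about the horizontal image axis."""
    c, s = np.cos(phi), np.sin(phi)
    R = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
    t = (np.eye(3) - R) @ np.array([0.0, 0.0, 1.0])
    return general_homography(K, R, t, d)

Points are then transferred as in (5): xi is proportional to H xr with xr in homogeneous coordinates.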
The list below contains data for the experiments:

• Number of test images: 6, see figure 2.

• Plane rotation: 18 images evenly spread between 0° and 180° (ϕ = πk/(N−1), k = 0, 1, ..., N−1, N = 18). The first image is used as reference.

• Scale change: 9 images with a scale change (non-evenly spread) up to three times the original size (d0 = 1/c − 1, where c = 2(k/(N−1))² + 1, k = 0, ..., N−1, N = 9). The first image is used as reference.

• View change: 21 images with a change in view between −45° and 45° (ϕ = πk/(4N), k = −N, ..., N, N = 10). The middle image is used as reference.

• Two choices of ε are used, ε = 0.5 and ε = 1.5.
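The parameter sequences in the list above can be generated directly; a small sketch using the same symbols:

import numpy as np

# Plane rotation: 18 angles evenly spread over 0..180 degrees.
N = 18
rotation_angles = np.pi * np.arange(N) / (N - 1)

# Scale change: 9 non-evenly spread scale factors c from 1 to 3, and d0 = 1/c - 1.
N = 9
c = 2 * (np.arange(N) / (N - 1)) ** 2 + 1
d0 = 1.0 / c - 1.0

# View change: 21 angles between -45 and +45 degrees.
N = 10
view_angles = np.pi * np.arange(-N, N + 1) / (4 * N)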
Figure 2 shows the test images that are used for the rotation, scale, and view transformations. They range from real images to synthetic images. Figure 3 shows an example of each of the transformations for one of the test images. The images have been expanded by zero padding before the transformation. This helps to avoid loss of points in the transformations (note, however, that the padding is not enough for the scale transformation).
3.3 Transformation of light
The test images for the light transformations are, however, not simulated. These images are taken of 3D scenes by a stationary camera, and either the camera shutter or the light source is changed. Three sequences were taken and are shown in figure 4. In the first sequence we change the camera shutter, and the middle image is used as reference. The last two sequences are taken by changing the light source position, and the first image is used as reference in both cases. The scene is not planar and we therefore do not really have a ground truth in the last two cases. But we believe that the evaluation is still relevant, since similar situations appear for example in object recognition applications, where the training data for an object is taken under different lighting conditions than the query data, see e.g. [5].
(Figure 2 panels: Toy car, Aerial image, Corner test image, Toy monastery, Picasso painting, Semi-synthetic room.)
Figure 2: Test images for the rotation, scale, and view transformations.
(Figure 3 shows selected frames from the rotation, scale, and view sequences.)
Figure 3: An example of the transformations rotation, scale, and view.
(Figure 4 panels: change in camera shutter; change of light source position, two sequences.)
Figure 4: Test sequences for the light transformations. The first sequence contains 13 frames and the last two sequences contain 7 frames each.
(Figure 5 plots the repeatability rate for the rotation, scale, and view transformations, for ε = 0.5 and ε = 1.5; curves: harris nms, harris subpix.)
Figure 5: Average results of the two Harris versions on the test images in figure 2 for the rotation, scale, and view transformations.
4 Experimental results
We show the average results over all the test images, but also the individual results for each test image, since the results differ depending on the type of test image.
4.1 Comparison of the two Harris versions
From the results it was found that Harris with subpixel accuracy performed overall much better than Harris using only non-maximum suppression; figure 5 shows one example. The difference is most obvious for ε smaller than 1 pixel, as would be expected. Because of this result we only include the subpixel Harris from now on, to keep the presentation less cluttered.
4.2 Rotation, Scale, and View
Figure 6 contains the average results for all methods except Harris nms. The results are somewhat inconclusive, but if we examine each test image separately we see that subpixel Harris performs best for natural images. The star patterns and the fourth order tensor perform equally well as or better than Harris on images that better resemble their models, i.e. straight lines and sharper corner points, especially for the scale transformation (figures 9 and 12).
(Figure 6 plots the repeatability rate for the rotation, scale, and view transformations, for ε = 0.5 and ε = 1.5; curves: harris subpix, tensor, star patterns.)
Figure 6: Average results on the test images in figure 2 for the rotation, scale, and view transformations.
4.3 Variation of illumination
The results on the light transformation sequences are shown in figure 13. We conclude that the differences between the methods are not significant.
5 Conclusions and discussion
From the experiments we may conclude that the choice of operator depends on the image content. The star-pattern method and the fourth order tensor assume a somewhat more advanced model than the Harris operator. These methods have the best performance if their corresponding models can describe the image content, as would be expected. If the model is less valid, it seems better to use a cruder model such as the one in the Harris operator.
The operators described here are intended to be used in object recognition tasks. Other interest point operators have been used in this application; one of the most successful methods in recent years is to find local maxima in DoG (Difference of Gaussians) scale space, see e.g. [6, 7]. These points were also evaluated (using the implementation by Lowe) on the test images in this report, but the results were poor. However, it is not fair to evaluate the stability of an operator that is computed at several scales. The larger scales may be less stable, but that may not matter if the descriptor is computed in a region that is proportional to the scale of the interest point.
Figure 7: Result on test image 1 for the rotation, scale, and view transformations.
Figure 8: Result on test image 2 for the rotation, scale, and view transformations.
Figure 9: Result on test image 3 for the rotation, scale, and view transformations.
Figure 10: Result on test image 4 for the rotation, scale, and view transformations.
Figure 11: Result on test image 5 for the rotation, scale, and view transformations.
Figure 12: Result on test image 6 for the rotation, scale, and view transformations.
(Figure 13 plots the repeatability rate over the frames of the three light sequences, for ε = 0.5 and ε = 1.5; curves: harris subpix, tensor, star patterns.)
Figure 13: Results on the light change sequences in figure 4.
Acknowledgments
We gratefully acknowledge the support from the Swedish Research Council through a grant for the project A New Structure for Signal Processing and Learning, and from the European Commission through the VISATEC project IST-2001-34220 [12].
References
[1] J. Bigün. Pattern recognition in images by symmetries and coordinate transformations. Computer Vision and Image Understanding, 68(3):290–307, 1997.
[2] W. Förstner. A framework for low level feature extraction. In Proceedings of the third
European Conference on Computer Vision, volume II, pages 383–394, Stockholm,
Sweden, May 1994.
[3] Björn Johansson. Multiscale curvature detection in computer vision. Lic. Thesis
LiU-Tek-Lic-2001:14, Dept. EE, Linköping University, SE-581 83 Linköping, Sweden,
March 2001. Thesis No. 877, ISBN 91-7219-999-7.
[4] Björn Johansson and Anders Moe. Patch-duplets for object recognition and pose
estimation. Technical Report LiTH-ISY-R-2553, Dept. EE, Linköping University,
SE-581 83 Linköping, Sweden, November 2003.
[5] Björn Johansson and Anders Moe. Patch-duplets for object recognition and pose
estimation. In Ewert Bengtsson and Mats Eriksson, editors, Proceedings SSBA’04
Symposium on Image Analysis, pages 78–81, Uppsala, March 2004. SSBA.
[6] David G. Lowe. Object recognition from local scale-invariant features. In Proc.
ICCV’99, 1999.
[7] David G. Lowe. Local feature view clustering for 3D object recognition. In Proc.
CVPR’01, 2001.
[8] K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. In
IEEE Conference on Computer Vision and Pattern Recognition, pages 257–263, June
2003.
[9] K. Nordberg. A fourth order tensor for representation of orientation and position of oriented segments. Technical Report LiTH-ISY-R-2587, Dept. EE, Linköping University, SE-581 83 Linköping, Sweden, February 2004.
[10] Klas Nordberg and Robert Söderberg. Detection and estimation of features for estimation of position. In Ewert Bengtsson and Mats Eriksson, editors, Proceedings
SSBA’04 Symposium on Image Analysis, pages 74–77, Uppsala, March 2004. SSBA.
[11] C. Schmid, R. Mohr, and C. Bauckhage. Evaluation of interest point detectors. Int.
Journal of Computer Vision, 37(2):151–172, 2000.
[12] URL: http://www.visatec.info.