MODEL-BASED HUMAN EAR IDENTIFICATION
Ernő Jeges, Budapest University of Technology and Economics, Hungary, [email protected]
László Máté, Search-Lab Ltd, Hungary, [email protected]
ABSTRACT
Nowadays the viability of ear biometrics and the uniqueness of ears are beyond question, but
reliable technical solutions have not yet been presented. As opposed to face recognition, in which
a model-based approach is widely used, surprisingly little effort has been put into using ear
models in automatic recognition, even though ear shape is more robust than facial characteristics,
being unaffected by emotional expressions. In this paper we introduce our model-based scheme
for ear feature extraction, the implementation of which has proved that the method is
strong enough to be applicable in an identity tracking system.
KEYWORDS: ear biometrics, image processing, model-based approach, active contours
1. INTRODUCTION
Identification – the basis of every access control system – can be accomplished by knowledge-
(password), possession- (key) and biometric-based methods. Unfortunately, passwords and keys
carry an unavoidable weakness: they are only tenuously linked to their owners. In contrast,
biometric identification methods directly check the person to be identified, which is
especially useful if we want to do so passively, from a distance. The ear provides promising
identification features for this purpose, but although ear prints have already been used as
evidence in criminal cases, little effort has so far been invested in automating ear-based
human identification.
Ear-based identification is a relatively new method among biometric techniques. As opposed
to faces, which people use in everyday life to recognize their acquaintances, ears can be used
easily and reliably only in automated identification. Although it is generally accepted that
any given person’s ear shape is unique, humans cannot distinguish other people on the basis of
their ears; computer algorithms, however, can, as they are able to recognize, extract and
distinguish between the different distinctive features of ears.
Alfred Iannarelli was the pioneer in using ear features to identify people, developing his
forensic method in 1949. He manually measured the distances between different parts of the ear,
and collected an ‘ear database’ containing more than ten thousand ear images [1].
Iannarelli’s method only allowed for identification within a population of not more than 16.7
million (4¹² = 16,777,216). Moreover, his measurements needed a precisely determined base point,
which made his method even harder to apply in automatic recognition. After Burge and Burger’s
publication [2] on automating ear biometrics in the late 90s – in which they suggested the use of
Voronoi diagrams – a multitude of studies appeared, based on various approaches (e.g. force-field
transformation [3]).
As the model-based approach is widely used in face recognition, it was surprising to discover
that this approach had not yet been applied to ear-based identification, even though the ear
contains more distinctive and robust features. To exploit this robustness, we have chosen to use a
priori knowledge about the ear’s geometry in the form of an ear model, in order to establish a
new approach in automatic ear-based human identification.
2. EAR-BASED HUMAN IDENTIFICATION
In our longer term targeted human identification and identity tracking application [4], ear
identification is integrated into a video surveillance system. Thus the identification process starts
with capturing picture frames, which presumably contain, within the camera’s field of view, the
face and an ear of the person to be identified. For this, our framework consists of several
modules; the process starts with capturing the frames (CAMC), which are then processed by
several image processing (IMPR) algorithms in order to remove noise and apply necessary
transformations. Background information is continuously synthesized (BACK) by calculating the
differences between frames, in order to be able to detect and segment the moving shapes on the
camera pictures (SHSG).
After segmenting the moving shapes, we must determine whether an ear is visible on them.
For detection and localization (DELO) of the ear on pictures we can use several methods, such as
neural networks, a model-based approach, or even some combination of these (see [5] for a
description of our localization method). Upon successful localization, at the end of the process we
can extract a detail from the captured image containing only the ear sample, and extract features
from it (FEEX). The architecture of our framework is shown in Figure 1.
Figure 1. The architecture of our ear identification framework
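To make the module chain concrete, a minimal Python skeleton of the pipeline is given below; the stage names follow the abbreviations above, while the function bodies are merely illustrative placeholders, not our actual implementation.

from dataclasses import dataclass
import numpy as np

# Hypothetical skeleton of the frame-processing chain; the stage names
# follow the paper's module abbreviations, but every implementation
# detail below is a placeholder.

@dataclass
class BackgroundModel:                     # BACK: running average of frames
    mean: np.ndarray = None
    def update(self, frame: np.ndarray) -> None:
        self.mean = frame if self.mean is None else 0.95 * self.mean + 0.05 * frame

def impr_preprocess(frame: np.ndarray) -> np.ndarray:
    return frame.astype(float)             # IMPR: denoising stub

def shsg_segment(frame: np.ndarray, bg: BackgroundModel) -> list:
    diff = np.abs(frame - bg.mean)         # frame difference against background
    return [diff > diff.mean() + 2 * diff.std()]   # SHSG: crude foreground masks

def process_frame(frame: np.ndarray, bg: BackgroundModel) -> list:
    f = impr_preprocess(frame)             # IMPR
    bg.update(f)                           # BACK
    return shsg_segment(f, bg)             # SHSG; DELO [5] and FEEX follow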
Below we introduce our feature extraction algorithm, which relies on a model of the
ear’s geometry and uses the well-known active contour [6] (or “snake”) technique to determine
the best-fitting model and obtain the model parameters as features of the ear.
The active contour is a method that uses a deformable model of curves to track a shape in
motion. It was introduced by Kass, Witkin and Terzopoulos [6]; the main idea is to
define a model whose pre-defined internal forces act as constraints that preserve its shape
by preventing arbitrary deformations, while the external forces are derived from the pixels
of consecutive images and attract the model’s curves. The shape tends toward its
equilibrium position, in which the internal and external forces balance, and thus the shape in
motion is continuously tracked by the model. This is especially important, as we use live camera
pictures, on which it is essential to track the ear with the model in real time after it has been
initially localized (see [7] for details of the active contour method).
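For reference, in the standard formulation of [6] the contour $v(s)$ minimizes an energy of the form

$$E_{snake} = \int_0^1 \left[ \tfrac{1}{2}\left( \alpha\,|v'(s)|^2 + \beta\,|v''(s)|^2 \right) + E_{ext}(v(s)) \right] ds,$$

where the $\alpha$ and $\beta$ terms give rise to the internal (shape-preserving) forces and $E_{ext}$ is derived from the image.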
3. FEATURE EXTRACTION FROM EARS
3.1. Edge detection
In order to determine the external forces that would attract the curves of the active contour
model, first we had to detect the edges of the ear image. Our first thoughts were to follow
previous studies and to proceed with force-field transformations; however, our experiments
showed that a simpler and less computationally demanding filter satisfied the requirements of an
active contour-based method.
To clarify the detected edges (particularly the level sets of the edge filter), we applied several
morphological transformations, using cellular automaton models with different rules. The result
was a few independent pixel threads (as shown in Figure 2, left), which appeared to be robust enough;
some problems which occurred, such as shadow-generated false threads, had a negligible
influence on further processing.
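As an illustration of this stage, the sketch below uses a simple gradient filter followed by morphological thinning to obtain one-pixel-wide threads; the actual filter and cellular automaton rules are not reproduced here, so both choices are merely assumed stand-ins.

import numpy as np
from skimage import filters, morphology

def extract_pixel_threads(gray: np.ndarray) -> np.ndarray:
    """Sketch: edge filter + thinning, standing in for the unspecified
    filter and cellular-automaton clean-up rules of the paper."""
    edges = filters.sobel(gray)                      # gradient magnitude
    binary = edges > filters.threshold_otsu(edges)   # keep strong edges
    return morphology.thin(binary)                   # one-pixel-wide threads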
3.2. Normalizing to the outer ear contour
After the edges had been detected and clarified, we iterated the lines of a standard ear model to
them; however, due to the inaccuracy of the localization it proved more effective and precise to
first roughly iterate the outer contour of the ear model to the bordering edge of the ear, and to
execute precise matching of the inner contour lines afterwards. We can think of the first step as
simply a more precise localization of the ear, on the basis of which the rest of the ear image is
normalized.
To determine the forces used for the iteration of the outer ear contour, we simply used the
pixel threads determined by the edge detection method described above, while also taking into
account the tangential properties of these threads. This meant that pixels in a thread representing
an edge exerted a greater attraction force on the contour line segment if that thread was parallel to
the segment; in fact the force was proportional to the absolute value of the scalar product of the
two tangentials – that of the model’s closest section and that of the pixel thread – as can be seen
from the equation below:
$$\vec{F}_{ext} = \left( \vec{t}_{ac} \cdot \vec{t}_{pt} \right)^2,$$

where $\vec{t}_{ac}$ is the normalized tangent vector of the active contour at an arbitrary point, and $\vec{t}_{pt}$
is the tangent vector of a pixel thread in the vicinity of that point.
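In code, this force term can be transcribed directly, assuming unit tangent vectors are available for both the contour and the nearby thread (a sketch; the function name is ours):

import numpy as np

def external_force_magnitude(t_ac: np.ndarray, t_pt: np.ndarray) -> float:
    """F_ext = (t_ac . t_pt)^2 for normalized tangent vectors, so threads
    parallel to the contour segment attract it most strongly."""
    return float(np.dot(t_ac, t_pt) ** 2)

# e.g. a thread at 45 degrees to the contour contributes cos^2(45 deg) = 0.5
assert abs(external_force_magnitude(np.array([1.0, 0.0]),
                                    np.array([np.sqrt(0.5), np.sqrt(0.5)])) - 0.5) < 1e-9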
Figure 2. The pixel threads with the iteration of the outer edge of the ear (left)
and the normalized ear edges projected onto each other (right)
When the outer contour of the ear model reached its equilibrium position (see the darker
smooth line in Figure 2, left), this border defined a space in which we carried out the fine-tuned
matching of the inner model. This was actually normalization to the initial model, as instead of
the original coordinates on the ear image, we hereafter used transformed points, for which the
transformation is defined by the position and shape of the determined outer edge.
To examine the behavior of the thus normalized ear edges, we projected them onto each other,
forming an average normalized ear (Figure 2, right). It was plainly observable that inside the clear
and bright ear border we had certain remarkably clear areas which – though characteristic edges
on every ear – seemed to vary enough to be used as features.
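The projection of the normalized edges onto each other can be pictured as a simple averaging of same-sized binary edge images, as in the sketch below (an assumption about the representation, not our exact procedure):

import numpy as np

def average_normalized_ear(edge_maps: list) -> np.ndarray:
    """Project normalized binary edge images onto each other: bright
    pixels mark edges that recur across samples."""
    return np.mean(np.stack(edge_maps).astype(float), axis=0)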
3.3. Iterating the active contour model
As the bright areas on the average normalized ear image shown above suggested, the border
contour and the three loose inner edges represented the curves of our compound ear model, as
shown in Figure 3 (left). The interrelation of these edges was expressed in the form of the internal
forces of the model, but – as we shall see – these forces were chosen to be weaker for the relative
positions of these curves and stronger for the shape of any individual curve.
Figure 3. Our common ear model (left) and the iteration of the
active contours to the actual ear image (right)
To eliminate the external forces originating from pixel threads belonging to other curves, we
classified the detected edge pixels (pixel threads) according to the nearest active contour model
curve. For this we defined zones on the normalized ear image, and the pixels attracting the active
contour were classified according to the zone that the majority of their thread fell within; thus
every pixel in a pixel thread had its own corresponding curve.
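A possible transcription of this majority-vote classification, assuming each pixel thread is an array of coordinates and the zones are given as a label image (names and representation are ours):

import numpy as np

def classify_threads(threads: list, zone_map: np.ndarray) -> list:
    """Assign each pixel thread (an (N, 2) array of row/col coordinates)
    to the model curve whose zone contains the majority of its pixels.
    `zone_map` labels each pixel of the normalized image with a curve id."""
    labels = []
    for pts in threads:
        zones = zone_map[pts[:, 0], pts[:, 1]]
        labels.append(int(np.bincount(zones).argmax()))  # majority vote
    return labels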
As is usual with the active contour method, at each iteration of the model we dealt with both
internal and external forces. In the first step – dealing with external forces – we selected
random pixel points uniformly distributed on an active contour curve, and calculated the external
forces in the vicinity of those positions. To calculate the forces attracting a curve we calculated
only the attraction of corresponding pixel threads; moreover, similarly to the handling of the outer
contour, we took into account the scalar product of the tangential of the pixel threads and the
tangential of the active contour section.
As a second step, the internal forces that drove the active contour toward its original shape
were determined to be inversely proportional to the difference between the initial and the actual
positions of the segmentation points. We used different rigidity factors for the four separate
contour types, and a much smaller coefficient for the model as a whole; thus the original shapes
of the separate curves were likely to be preserved, while their relative positions could vary more.
Figure 3 (right) shows an active contour model in its final state, in which the internal and
external forces are in the equilibrium position; the model fits the underlying image, while the
original shapes of the curves are roughly preserved.
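One iteration step might look like the following sketch, which reads the internal force as a restoring force directed against the displacement from the initial shape, with a strong per-curve rigidity and a much weaker whole-model coefficient; all names and the exact force law are our assumptions:

import numpy as np

def iterate_curve(points, rest_points, ext_forces,
                  curve_rigidity, model_rigidity, step=0.1):
    """Sketch of one iteration for a single curve: external forces pull
    the segmentation points toward pixel threads, while restoring forces
    pull them back toward the initial shape. The whole-model term uses
    a much smaller coefficient, so relative positions may vary more."""
    shift = (points - rest_points).mean(axis=0)       # whole-curve displacement
    shape_term = -curve_rigidity * (points - rest_points - shift)
    position_term = -model_rigidity * shift
    return points + step * (ext_forces + shape_term + position_term)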
3.4. Feature extraction
At this point we had an active contour that fitted the underlying ear image. The last step was to
analyze this model and collect feature values from it, in order to form a feature vector which
could be used for ear-based human identification. Basically we defined two sets of features.
The first set of features was derived from the distortion of the model related to the original ear
model, expressed as the difference between the original and the final (reposed) state of the active
contour on an ear image. On the one hand, this difference was derived by determining the distance
of the reposed curve from the original curve: we measured the distance between each segmentation
point of the original model curve and the point where the perpendicular at that segmentation point
cut the reposed curve (denoted by the thicker line in Figure 4, A). On the other hand, the
distance of the reposed curve’s segmentation points from the underlying pixel thread was
measured similarly. As we had a total of thirty-one segmentation points on four active contour
curves, this feature set produced 31 x 2 = 62 feature values.
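A sketch of this first feature set, approximating the perpendicular intersection by projecting each point's displacement onto the original curve's normal (the representation and all names are our assumptions):

import numpy as np

def distortion_features(orig_pts, orig_normals, reposed_pts, thread_dists):
    """For each of the 31 segmentation points: (1) displacement of the
    reposed point measured along the original curve's normal, and (2) its
    distance to the underlying pixel thread (precomputed here), giving
    31 x 2 = 62 feature values."""
    along_normal = np.einsum('ij,ij->i', reposed_pts - orig_pts, orig_normals)
    return np.concatenate([along_normal, thread_dists])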
The rest of the features were derived using three suitably selected axes of measurement. For
each axis we had previously chosen some featured points on the four curves (8, 6 and 7 points,
respectively). For a reposed active contour we measured the projection of these points onto the
appropriate axis
(P1), and the feature was formed from the distances measured between these projection points and
a defined point, i.e. the intersection of the three axes (marked f1 on Figure 4, B). In this way we
obtained 8 + 6 + 7 = 21 feature values, which – due to appropriate selection of the axes of
measurement – appeared to be independent of the angle from which the ear image was taken.
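This second feature set amounts to scalar projections onto the three measurement axes, measured from their common intersection point, as in this sketch (axis directions and point selection are inputs here; names are ours):

import numpy as np

def projection_features(points, axis_dir, origin):
    """Signed distances, along one measurement axis, between the
    projections of the selected model points and the axes' common
    intersection point; three axes with 8, 6 and 7 points give 21 values."""
    d = np.asarray(axis_dir, float)
    d /= np.linalg.norm(d)
    return (np.asarray(points, float) - origin) @ d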
Figure 4. The distortion (A) and direction (B) derived features
and the three axes of measurement, with the points to be projected (right)
4. CONCLUSIONS
To carry out full testing of the commonly used false acceptance rate (FAR) and false rejection
rate (FRR) values, we needed to define an acceptance criterion: the process of making a binary
decision on whether two feature vectors are from the same person or not. For this, we defined a
distance function between the coordinates of the feature vectors, and a distance threshold.
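As a sketch of this acceptance criterion, with Euclidean distance assumed as the (unspecified) distance function:

import numpy as np

def accept(features_a, features_b, threshold):
    """Binary decision: the two feature vectors are declared to belong to
    the same person iff their distance is at most the threshold.
    Euclidean distance is an assumption; the paper leaves it unspecified."""
    dist = np.linalg.norm(np.asarray(features_a) - np.asarray(features_b))
    return dist <= threshold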
The proposed model-based human ear identification method was tested on approximately
24,000 images (motion picture frames), taken from live camera pictures of twenty-eight different
people. For each person we had several video sequences with their head visible, so the pictures
were assigned to subjects, making the comparison testing of acceptance and rejection possible.
The picture frames were processed one by one, after the ear had been detected on each, as
described in [5]; altogether we localized 3,531 ears on these images, and thus the evaluation of
the feature extraction algorithm was carried out with this number of images.
The FAR and FRR results of the comparison tests are shown in Figure 5 below, in which the
vertical axis shows the error rate as a percentage and the horizontal axis shows the threshold for
the feature vector distances used in determining acceptance.
The separate testing of the different features introduced proved that the feature set based on
the three axes of measurement produced, on its own, a more stable feature vector – especially
because it is unaffected by the angle of view – but the overall quality of the system was
improved by expanding the feature vector with the features derived purely from the distortion
of the model. The
equal error rate (EER) value (error rate where FAR equals FRR) was below 10% (7.6%), which
clearly shows that the method is applicable in our planned identity tracking system [4], although
it is not yet usable as a single authentication measure if a high level of security is required.
Figure 5. The error graph showing the FAR and FRR values
of our model-based human ear identification method
For future improvement of our ear-based human identification method we plan to expand the
feature vector with further feature values derived from the model, primarily by determining and
choosing features unaffected by angle of view. Some attractive alternatives are the localization of
some sharply curved breaks on ear edges, the determination of the width of pixel threads, and the
detection of hair or some special features such as ear-rings, etc.
5. ACKNOWLEDGEMENTS
The project is being realized with the financial support of the Information and Communication
Technologies and Applications thematic program (IKTA) of the Hungarian National Technical
Development Council.
6. REFERENCES
[1] A. Iannarelli, Ear Identification, Forensic Identification Series, Paramont Publishing
Company, Fremont, California, 1989.
[2] M. Burge, W. Burger, Ear Biometrics, 1998;
http://www.computing.armstrong.edu/FacNStaff/burge/pdf/burge-burger-us.pdf
[3] D. J. Hurley, M. S. Nixon, J. N. Carter, Force field feature extraction for ear biometrics,
Computer Vision and Image Understanding, 2005, pp. 491–512;
http://eprints.ecs.soton.ac.uk/10242/01/hurley_cviu.pdf
[4] Integrált Biometrikus Azonosító Rendszerek (Integrated Biometric Identification Systems),
NKFP 2/030/04, project pages available in Hungarian;
http://www.mit.bme.hu/projects/ibar04.html
[5] L. Máté, Localizing Feature Points on Ear Images, HACIPPR, Veszprém, 2005,
ISBN 3-85403-192-0, pp. 57–63.
[6] M. Kass, A. Witkin, D. Terzopoulos, Snakes: Active contour models, International Journal of
Computer Vision, Vol. 1, No. 4, 1988, pp. 321–331;
http://mrl.nyu.edu/~dt/papers/ijcv88/ijcv88.pdf
[7] A. Blake, M. Isard, Active Contours, Springer, 1998;
http://www.robots.ox.ac.uk/~contours/