MODEL-BASED HUMAN EAR IDENTIFICATION
Ernő Jeges, Budapest University of Technology and Economics, Hungary, [email protected]
László Máté, Search-Lab Ltd, Hungary, [email protected]
ABSTRACT
Nowadays the viability of ear biometrics and the uniqueness of ears are beyond question, but
reliable technical solutions have not yet been presented. As opposed to face recognition, in which
a model-based approach is widely used, surprisingly little effort has been put into using ear
models in automatic recognition, even though ear shape is more robust than facial characteristics,
being unaffected by emotional expressions. In this paper we introduce our model-based scheme
for ear feature extraction, the implementation of which has proved that the method is
strong enough to be applicable in an identity tracking system.
KEYWORDS: ear biometrics, image processing, model-based approach, active contours
1. INTRODUCTION
Identification – the basis of every access control system – can be accomplished by knowledge-
(password), possession- (key) and biometric-based methods. Unfortunately, passwords and keys
carry an unavoidable weakness: they are only tenuously linked to their owners. In contrast,
biometric identification methods directly check the person to be identified, which is
especially useful if we want to do so passively, from a distance. The ear provides promising
identification features for this purpose, but although ear prints have already been used as
evidence in criminal cases, little effort has so far been invested in automating ear-based
human identification.
Ear-based identification is a relatively new method among biometric techniques. As opposed
to faces, which people use in everyday life to recognize their acquaintances, ears can be used
easily and reliably only in automated identification. Although it is generally accepted that
any given person’s ear shape is unique, humans cannot distinguish other people on the basis of
their ears; computer algorithms, however, can, as they are able to recognize, extract and
distinguish between the different distinctive features of ears.
Alfred Iannarelli was the pioneer in using ear features to identify people, developing his
forensic method in 1949. He manually measured the distances between different parts of the ear,
and collected an ‘ear database’ containing more than ten thousand ear images [1].
Iannarelli’s method only allowed for identification within a population of not more than 16.7
million (4¹² = 16,777,216). Moreover, his measurements needed a precisely determined base point,
which made his method even harder to apply in automatic recognition. After Burge and Burger’s
publication [2] on automating ear biometrics in the late 90s – in which they suggested the use of
Voronoi diagrams – a multitude of studies appeared, based on various approaches (e.g. force-field
transformation [3]).
As the model-based approach is widely used in face recognition, it was surprising to discover
that this approach had not yet been applied to ear-based identification, even though the ear
contains more distinctive and robust features. To exploit this robustness, we have chosen to use a
priori knowledge about the ear’s geometry in the form of an ear model, in order to establish a
new approach in automatic ear-based human identification.
2. EAR-BASED HUMAN IDENTIFICATION
In our longer term targeted human identification and identity tracking application [4], ear
identification is integrated into a video surveillance system. Thus the identification process starts
with capturing picture frames, which presumably contain, within the camera’s field of view, the
face and an ear of the person to be identified. For this, our framework consists of several
modules; the process starts with capturing the frames (CAMC), which are then processed by
several image processing (IMPR) algorithms in order to remove noise and apply necessary
transformations. Background information is continuously synthesized (BACK) by calculating the
differences between frames, in order to be able to detect and segment the moving shapes on the
camera pictures (SHSG).
After segmenting the moving shapes, we must determine whether an ear is visible on them.
For detection and localization (DELO) of the ear on pictures we can use several methods, such as
neural networks, a model-based approach, or even some combination of these (see [5] for a
description of our localization method). Upon successful localization, at the end of the process we
can extract a detail from the captured image containing only the ear sample, and extract features
from it (FEEX). The architecture of our framework is shown in Figure 1.
Figure 1. The architecture of our ear identification framework
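To make the module chain concrete, a minimal Python skeleton of the pipeline is given below; the stage names follow the abbreviations above, while the function bodies are merely illustrative placeholders, not our actual implementation.

from dataclasses import dataclass
import numpy as np

# Hypothetical skeleton of the frame-processing chain; the stage names
# follow the paper's module abbreviations, but every implementation
# detail below is a placeholder.

@dataclass
class BackgroundModel:                     # BACK: running average of frames
    mean: np.ndarray = None
    def update(self, frame: np.ndarray) -> None:
        self.mean = frame if self.mean is None else 0.95 * self.mean + 0.05 * frame

def impr_preprocess(frame: np.ndarray) -> np.ndarray:
    return frame.astype(float)             # IMPR: denoising stub

def shsg_segment(frame: np.ndarray, bg: BackgroundModel) -> list:
    diff = np.abs(frame - bg.mean)         # frame difference against background
    return [diff > diff.mean() + 2 * diff.std()]   # SHSG: crude foreground masks

def process_frame(frame: np.ndarray, bg: BackgroundModel) -> list:
    f = impr_preprocess(frame)             # IMPR
    bg.update(f)                           # BACK
    return shsg_segment(f, bg)             # SHSG; DELO [5] and FEEX follow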
Below we introduce our feature extraction algorithm, which relies on a model of the
ear’s geometry and uses the well-known active contour [6] (or “snake”) technique to determine
the best-fitting model and obtain the model parameters as features of the ear.
The active contour is a method that uses a deformable model of curves to track a shape in
motion. It was introduced by Kass, Witkin and Terzopoulos [6]; the main idea is to
define a model whose pre-defined internal forces act as constraints that preserve its shape
by preventing arbitrary deformations, while the external forces are derived from the pixels
of consecutive images and attract the model’s curves. The shape tends toward its
equilibrium position, in which the internal and external forces balance, and thus the shape in
motion is continuously tracked by the model. This is especially important, as we use live camera
pictures, on which it is essential to track the ear with the model in real time after it has been
initially localized (see [7] for details of the active contour method).
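For reference, in the standard formulation of [6] the contour $v(s)$ minimizes an energy of the form

$$E_{snake} = \int_0^1 \left[ \tfrac{1}{2}\left( \alpha\,|v'(s)|^2 + \beta\,|v''(s)|^2 \right) + E_{ext}(v(s)) \right] ds,$$

where the $\alpha$ and $\beta$ terms give rise to the internal (shape-preserving) forces and $E_{ext}$ is derived from the image.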
3. FEATURE EXTRACTION FROM EARS
3.1. Edge detection
In order to determine the external forces that would attract the curves of the active contour
model, first we had to detect the edges of the ear image. Our first thoughts were to follow
previous studies and to proceed with force-field transformations; however, our experiments
showed that a simpler and less computationally demanding filter satisfied the requirements of an
active contour-based method.
To clarify the detected edges (particularly the level sets of the edge filter), we applied several
morphological transformations, using cellular automaton models with different rules. The result
was a few independent pixel threads (as shown in Figure 2, left), which appeared to be robust enough;
some problems which occurred, such as shadow-generated false threads, had a negligible
influence on further processing.
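As an illustration of this stage, the sketch below uses a simple gradient filter followed by morphological thinning to obtain one-pixel-wide threads; the actual filter and cellular automaton rules are not reproduced here, so both choices are merely assumed stand-ins.

import numpy as np
from skimage import filters, morphology

def extract_pixel_threads(gray: np.ndarray) -> np.ndarray:
    """Sketch: edge filter + thinning, standing in for the unspecified
    filter and cellular-automaton clean-up rules of the paper."""
    edges = filters.sobel(gray)                      # gradient magnitude
    binary = edges > filters.threshold_otsu(edges)   # keep strong edges
    return morphology.thin(binary)                   # one-pixel-wide threads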
3.2. Normalizing to the outer ear contour
After the edges had been detected and clarified, we iterated the lines of a standard ear model to
them; however, due to the inaccuracy of the localization it proved more effective and precise to
first roughly iterate the outer contour of the ear model to the bordering edge of the ear, and to
execute precise matching of the inner contour lines afterwards. We can think of the first step as
simply a more precise localization of the ear, on the basis of which the rest of the ear image is
normalized.
To determine the forces used for the iteration of the outer ear contour, we simply used the
pixel threads determined by the edge detection method described above, while also taking into
account the tangential properties of these threads. This meant that pixels in a thread representing
an edge exerted a greater attraction force on the contour line segment if that thread was parallel to
the segment; in fact the force was proportional to the absolute value of the scalar product of the
two tangentials – that of the model’s closest section and that of the pixel thread – as can be seen
from the equation below:
$$\vec{F}_{ext} = \left( \vec{t}_{ac} \cdot \vec{t}_{pt} \right)^2,$$

where $\vec{t}_{ac}$ is the normalized tangent vector of the active contour at an arbitrary point, and $\vec{t}_{pt}$
is the tangent vector of a pixel thread in the vicinity of that point.
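In code, this force term can be transcribed directly, assuming unit tangent vectors are available for both the contour and the nearby thread (a sketch; the function name is ours):

import numpy as np

def external_force_magnitude(t_ac: np.ndarray, t_pt: np.ndarray) -> float:
    """F_ext = (t_ac . t_pt)^2 for normalized tangent vectors, so threads
    parallel to the contour segment attract it most strongly."""
    return float(np.dot(t_ac, t_pt) ** 2)

# e.g. a thread at 45 degrees to the contour contributes cos^2(45 deg) = 0.5
assert abs(external_force_magnitude(np.array([1.0, 0.0]),
                                    np.array([np.sqrt(0.5), np.sqrt(0.5)])) - 0.5) < 1e-9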
Figure 2. The pixel threads with the iteration of the outer edge of the ear (left)
and the normalized ear edges projected onto each other (right)
When the outer contour of the ear model reached its equilibrium position (see the darker
smooth line in Figure 2, left), this border defined a space in which we carried out the fine-tuned
matching of the inner model. This was actually normalization to the initial model, as instead of
the original coordinates on the ear image, we hereafter used transformed points, for which the
transformation is defined by the position and shape of the determined outer edge.
To examine the behavior of the thus normalized ear edges, we projected them onto each other,
forming an average normalized ear (Figure 2, right). It was plainly observable that inside the clear
and bright ear border we had certain remarkably clear areas which – though characteristic edges
on every ear – seemed to vary enough to be used as features.
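The projection of the normalized edges onto each other can be pictured as a simple averaging of same-sized binary edge images, as in the sketch below (an assumption about the representation, not our exact procedure):

import numpy as np

def average_normalized_ear(edge_maps: list) -> np.ndarray:
    """Project normalized binary edge images onto each other: bright
    pixels mark edges that recur across samples."""
    return np.mean(np.stack(edge_maps).astype(float), axis=0)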
3.3. Iterating the active contour model
As the bright areas on the average normalized ear image shown above suggested, the border
contour and the three loose inner edges represented the curves of our compound ear model, as
shown in Figure 3 (left). The interrelation of these edges was expressed in the form of the internal
forces of the model, but – as we shall see – these forces were chosen to be weaker for the relative
positions of these curves and stronger for the shape of any individual curve.
Figure 3. Our common ear model (left) and the iteration of the
active contours to the actual ear image (right)
To eliminate the external forces originating from pixel threads belonging to other curves, we
classified the detected edge pixels (pixel threads) according to the nearest active contour model
curve. For this we defined zones on the normalized ear image, and the pixels attracting the active
contour were classified according to the zone that the majority of their thread fell within; thus
every pixel in a pixel thread had its own corresponding curve.
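A possible transcription of this majority-vote classification, assuming each pixel thread is an array of coordinates and the zones are given as a label image (names and representation are ours):

import numpy as np

def classify_threads(threads: list, zone_map: np.ndarray) -> list:
    """Assign each pixel thread (an (N, 2) array of row/col coordinates)
    to the model curve whose zone contains the majority of its pixels.
    `zone_map` labels each pixel of the normalized image with a curve id."""
    labels = []
    for pts in threads:
        zones = zone_map[pts[:, 0], pts[:, 1]]
        labels.append(int(np.bincount(zones).argmax()))  # majority vote
    return labels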
As is usual with the active contour method, at each iteration of the model we dealt with both
internal and external forces. In the first step – dealing with external forces – we selected
random pixel points uniformly distributed on an active contour curve, and calculated the external
forces in the vicinity of those positions. To calculate the forces attracting a curve we calculated
only the attraction of corresponding pixel threads; moreover, similarly to the handling of the outer
contour, we took into account the scalar product of the tangential of the pixel threads and the
tangential of the active contour section.
As a second step, the internal forces that drove the active contour toward its original shape
were determined to be inversely proportional to the difference between the initial and the actual
positions of the segmentation points. We used different rigidity factors for the four separate
contour types, and a much smaller coefficient for the model as a whole; thus the original shapes
of the separate curves were likely to be preserved, while their relative positions could vary more.
Figure 3 (right) shows an active contour model in its final state, in which the internal and
external forces are in the equilibrium position; the model fits the underlying image, while the
original shapes of the curves are roughly preserved.
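One iteration step might look like the following sketch, which reads the internal force as a restoring force directed against the displacement from the initial shape, with a strong per-curve rigidity and a much weaker whole-model coefficient; all names and the exact force law are our assumptions:

import numpy as np

def iterate_curve(points, rest_points, ext_forces,
                  curve_rigidity, model_rigidity, step=0.1):
    """Sketch of one iteration for a single curve: external forces pull
    the segmentation points toward pixel threads, while restoring forces
    pull them back toward the initial shape. The whole-model term uses
    a much smaller coefficient, so relative positions may vary more."""
    shift = (points - rest_points).mean(axis=0)       # whole-curve displacement
    shape_term = -curve_rigidity * (points - rest_points - shift)
    position_term = -model_rigidity * shift
    return points + step * (ext_forces + shape_term + position_term)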
3.4. Feature extraction
At this point we had an active contour that fitted the underlying ear image. The last step was to
analyze this model and collect feature values from it, in order to form a feature vector which
could be used for ear-based human identification. Basically we defined two sets of features.
The first set of features was derived from the distortion of the model related to the original ear
model, expressed as the difference between the original and the final (reposed) state of the active
contour on an ear image. On the one hand, this difference was derived by determining the distance
of the reposed curve from the original curve: we measured the distance between each segmentation
point of the original model curve and the point where the perpendicular at that segmentation point
cut the reposed curve (denoted by the thicker line in Figure 4, A). On the other hand, the
distance of the reposed curve’s segmentation points from the underlying pixel thread was
measured similarly. As we had a total of thirty-one segmentation points on four active contour
curves, this feature set produced 31 x 2 = 62 feature values.
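A sketch of this first feature set, approximating the perpendicular intersection by projecting each point's displacement onto the original curve's normal (the representation and all names are our assumptions):

import numpy as np

def distortion_features(orig_pts, orig_normals, reposed_pts, thread_dists):
    """For each of the 31 segmentation points: (1) displacement of the
    reposed point measured along the original curve's normal, and (2) its
    distance to the underlying pixel thread (precomputed here), giving
    31 x 2 = 62 feature values."""
    along_normal = np.einsum('ij,ij->i', reposed_pts - orig_pts, orig_normals)
    return np.concatenate([along_normal, thread_dists])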
The rest of the features were derived using three suitably selected axes of measurement. For
each axis we had previously chosen some featured points on the four curves (8, 6 and 7 points,
respectively). For a reposed active contour we measured the projection of these points onto the
appropriate axis
(P1), and the feature was formed from the distances measured between these projection points and
a defined point, i.e. the intersection of the three axes (marked f1 on Figure 4, B). In this way we
obtained 8 + 6 + 7 = 21 feature values, which – due to appropriate selection of the axes of
measurement – appeared to be independent of the angle from which the ear image was taken.
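This second feature set amounts to scalar projections onto the three measurement axes, measured from their common intersection point, as in this sketch (axis directions and point selection are inputs here; names are ours):

import numpy as np

def projection_features(points, axis_dir, origin):
    """Signed distances, along one measurement axis, between the
    projections of the selected model points and the axes' common
    intersection point; three axes with 8, 6 and 7 points give 21 values."""
    d = np.asarray(axis_dir, float)
    d /= np.linalg.norm(d)
    return (np.asarray(points, float) - origin) @ d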
Figure 4. The distortion (A) and direction (B) derived features
and the three axes of measurement, with the points to be projected (right)
4. CONCLUSIONS
To carry out full testing of the commonly used false acceptance rate (FAR) and false rejection
rate (FRR) values, we needed to define an acceptance criterion: the process of making a binary
decision on whether two feature vectors are from the same person or not. For this, we defined a
distance function between the coordinates of the feature vectors, and a distance threshold.
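As a sketch of this acceptance criterion, with Euclidean distance assumed as the (unspecified) distance function:

import numpy as np

def accept(features_a, features_b, threshold):
    """Binary decision: the two feature vectors are declared to belong to
    the same person iff their distance is at most the threshold.
    Euclidean distance is an assumption; the paper leaves it unspecified."""
    dist = np.linalg.norm(np.asarray(features_a) - np.asarray(features_b))
    return dist <= threshold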
The proposed model-based human ear identification method was tested on approximately
24,000 images (motion picture frames), taken from live camera pictures of twenty-eight different
people. For each person we had several video sequences with their head visible, so the pictures
were assigned to subjects, making the comparison testing of acceptance and rejection possible.
The picture frames were processed one by one, after the ear had been detected on each, as
described in [5]; altogether we localized 3,531 ears on these images, and thus the evaluation of
the feature extraction algorithm was carried out with this number of images.
The FAR and FRR results of the comparison tests are shown in Figure 5 below, in which the
vertical axis shows the error rate as a percentage and the horizontal axis shows the threshold for
the feature vector distances used in determining acceptance.
The separate testing of the different features introduced proved that the feature set based on
the three axes of measurement produced, on its own, a more stable feature vector – especially
because it is unaffected by the angle of view – but the overall quality of the system was
improved by expanding the feature vector with the features derived purely from the distortion
of the model. The
equal error rate (EER) value (error rate where FAR equals FRR) was below 10% (7.6%), which
clearly shows that the method is applicable in our planned identity tracking system [4], although
it is not yet usable as a single authentication measure if a high level of security is required.
Figure 5. The error graph showing the FAR and FRR values
of our model-based human ear identification method
For future improvement of our ear-based human identification method we plan to expand the
feature vector with further feature values derived from the model, primarily by determining and
choosing features unaffected by angle of view. Some attractive alternatives are the localization of
some sharply curved breaks on ear edges, the determination of the width of pixel threads, and the
detection of hair or some special features such as ear-rings, etc.
5. ACKNOWLEDGEMENTS
The project is being realized with the financial support of the Information and Communication
Technologies and Applications thematic program (IKTA) of the Hungarian National Technical
Development Council.
6. REFERENCES
[1] A. Iannarelli, Ear Identification, Forensic Identification Series, Paramont Publishing
Company, Fremont, California, 1989.
[2] M. Burge, W. Burger, Ear Biometrics, 1998;
http://www.computing.armstrong.edu/FacNStaff/burge/pdf/burge-burger-us.pdf
[3] D. J. Hurley, M. S. Nixon, J. N. Carter, Force field feature extraction for ear biometrics,
Computer Vision and Image Understanding, 2005, pp. 491–512;
http://eprints.ecs.soton.ac.uk/10242/01/hurley_cviu.pdf
[4] Integrált Biometrikus Azonosító Rendszerek (Integrated Biometric Identification Systems),
NKFP 2/030/04, project pages available in Hungarian;
http://www.mit.bme.hu/projects/ibar04.html
[5] L. Máté, Localizing Feature Points on Ear Images, HACIPPR, Veszprém, 2005,
ISBN 3-85403-192-0, pp. 57–63.
[6] M. Kass, A. Witkin, D. Terzopoulos, Snakes: Active contour models, International Journal of
Computer Vision, Vol. 1, No. 4, 1988, pp. 321–331;
http://mrl.nyu.edu/~dt/papers/ijcv88/ijcv88.pdf
[7] A. Blake, M. Isard, Active Contours, Springer, 1998;
http://www.robots.ox.ac.uk/~contours/