Representing and Matching Multi-Object Images with Holes Using Concavity Trees

by Bilal H. Fadlallah

Submitted to the Department of Electrical and Computer Engineering in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical and Computer Engineering at the American University of Beirut, June 2008.

© Bilal H. Fadlallah 2008. All rights reserved.

Author: Department of Electrical and Computer Engineering, June 5, 2008
Certified by: Prof. Mohamad Adnan Al-Alaoui, Thesis Supervisor
Accepted by: Prof. Karim Kabalan, Department Chair, Thesis Committee Member
Accepted by: Prof. Ali Chehab, Thesis Committee Member

Abstract

Concavity trees are data structures used to represent 2-D images in terms of their convex hulls and meta-concavities. They help compress bi-level images and support efficient image matching, a task required in many areas of pattern recognition and analysis. The purpose of this thesis is to expand the scope of concavity trees to cover the compression, retrieval and matching of bi-level multi-shape images containing holes, with attention to both the performance and the complexity of such an approach.
Two alternatives for applying the concept of edit distance to these structures, yielding a scaled level of similarity between any two given images, are implemented in Matlab. The concavity tree structure is upgraded to handle the representation of several embedded objects, each of which may contain holes. This is done by appending to the root node the information contained in the roots of the sub-shapes, and attaching representations of the existing meta-objects to the original tree. The matching process is extended to handle the new data structure while accounting for the different normalization factors of each sub-shape. Two approaches using Dijkstra's algorithm are implemented for this purpose. The running time of the extraction-matching algorithms is O(n log n), where n is the number of contour points; the matching algorithm itself is polynomial in the number of tree nodes. Running time is further reduced by transforming lengthy for-loops into MEX files that interface with C++. A graphical user interface was implemented to visualize the algorithms, and a testing platform was designed to check the matching results over subsets of the MPEG-7 dataset, on which the success rate was 86.8%.

Thesis Supervisor: Prof. Mohamad Adnan Al-Alaoui
Title: Professor, Department of Electrical and Computer Engineering

Acknowledgments

I would like to thank my family for their encouragement and support during the preparation of this thesis. I would also like to thank Prof. Adnan Al-Alaoui for his continuous help, supervision and precious advice throughout this year, without which it would have been difficult to complete this work.

Contents

1 Introduction
2 Literature Review
  2.1 Concavity Trees
  2.2 Tree Edit Distance Matching
  2.3 Complexity Analysis
  2.4 Further Elaboration
  2.5 Limitations
3 Research Goals
  3.1 Representing and Matching Multi-Object Images with Holes
  3.2 Enhancing the TED Algorithm Performance and Efficiency
  3.3 Implementing a GUI and a Testing Framework
  3.4 Assessing Results and Comparing with Other Methods
4 Analysis and Design
  4.1 Incorporating Multi-Object Images with Holes in Concavity Trees
  4.2 Relative Tree Edit Distance Matching Algorithm
  4.3 Random Lines Intersection Matching Algorithm
  4.4 MEX Files
    4.4.1 What are MEX Files?
    4.4.2 Why MEX Files?
  4.5 Datasets of Images
  4.6 Interface Design
5 Implementation
  5.1 Implementation of Sub-Images/Hole Representation
  5.2 Implementation of the Relative Tree Edit Distance Matching Algorithm
  5.3 Implementation of the Random Lines Intersection Algorithm
  5.4 Converting for-Loops to C++ via MEX Files
  5.5 GUI Implementation
  5.6 Applying Concavity Trees in the Illiteracy Project
6 Testing and Assessment
  6.1 Testing the Algorithms for Image Matching
  6.2 Testing the Running Time
    6.2.1 Complexity of the CTHI algorithm
    6.2.2 Complexity of the TEDHI algorithm
    6.2.3 Complexity of the TEDHIR algorithm
    6.2.4 Complexity of the General Case
  6.3 Results and Analysis
  6.4 Comparison with Other Methods
7 Further Research

List of Figures

2-1 An object with its convex hull, concavities and concavity tree
2-2 The same object as in Fig. 2-1 with its corresponding concavities
2-3 Two trees T1 and T2 numbered according to a pre-order traversal
2-4 Tree edit graph transforming T1 into T2
4-1 Identical shapes with different hole location
4-2 The dissecting line procedure
4-3 The Matlab/C++ interface
4-4 Troubleshooting methodologies for MEX files
4-5 Sample of the MPEG-7 dataset
4-6 Used subset of the MPEG-7 dataset
4-7 Interface top-level design
4-8 Top-level image matcher program
5-1 Bitmap file representing the shapes of three buildings in a city
5-2 Tree representation of the bmp image in Fig. 5-1
5-3 Recovered image from the tree in Fig. 5-2
5-4 Reconstructed sub-shape with corresponding extracted sub-shape
5-5 Bitmap image representing a shape containing holes
5-6 Tree representation of the bmp image in Fig. 5-5
5-7 Recovered image from the tree in Fig. 5-6
5-8 Bitmap image representing two shapes with holes
5-9 Tree representation of the bmp image in Fig. 5-8
5-10 Reconstructed sub-shape with corresponding extracted sub-shape
5-11 Two images to be matched
5-12 Corresponding tree representations for the images in Fig. 5-11
5-13 Two input images, one representing a bottle, the other a city
5-14 Corresponding tree representations for the images in Fig. 5-13
5-15 Initial prompt of the ImageConcavityTreeMatcher
5-16 Prompts in the GUI for DB rebuild
5-17 Prompts in the GUI for program choice
5-18 Prompt for file input
5-19 Image matching
5-20 Best match return
5-21 Recursion start
5-22 Back to the user choice input
5-23 Welcome screen of the Proximity software
5-24 Initial screen of the Proximity software
5-25 Images loaded into the Proximity software
5-26 Corresponding concavity tree representations of the images in Fig. 5-25
5-27 Corresponding convex hulls retrieved for the images in Fig. 5-25
5-28 Matching results for the images in Fig. 5-25
5-29 Different forms of "aleph" in the Arabic alphabet
5-30 The 28 letters of the Arabic alphabet
5-31 Sample input dataset
5-32 Loading input versus reference images
5-33 Convex hull representation of the input images in Fig. 5-32
5-34 Concavity tree of the original image in Fig. 5-32
5-35 Correspondence between objects and tree nodes
5-36 Sample letter with a hole
5-37 Convex hull representation of the input image in Fig. 5-36
5-38 Concavity tree representation of the input image in Fig. 5-36
5-39 Matching result of input letter "ye" and the reference one
5-40 Array of distances of input "ye" to each of the alphabet's letters
6-1 Percent matching for different sets of input images
6-2 Example for a given class
6-3 Classes "Bottle" and "Cellular" are linearly separable
6-4 Classes "Car1" and "Car2" and linear separability

List of Tables

4.1 Main concavity tree (CT) structure parameters
4.2 The MEX API functions. Source: Mathworks (2006), "MEX-files Guide"
4.3 Running time comparison: original code versus MEX code
6.1 Success rate for each class of the synthetic dataset
6.2 Success rates per image class, with comments interpreting the performance of the algorithms

Chapter 1

Introduction

The purpose of this thesis is to design and implement an efficient method to compress, retrieve and match bi-level multi-object images that may contain holes. To do so, we first propose to adapt the concavity tree, a known concept in shape analysis, to span images containing multiple objects with or without holes. We then suggest methods for matching these structures using dynamic programming. Our effort pertains mainly to the area of pattern recognition, a well-developed field that has undergone intensive research in recent years and is still the subject of much scrutiny today.
The development of a variety of multipurpose algorithms in this field has led to many technological advances in dependent domains such as computer vision, virtual reality systems, space exploration, criminology, cellular biology, and security systems (fingerprint recognition, iris scans, gait recognition, and so on). The field of pattern recognition and analysis holds much promise in the area of digital image processing, specifically shape identification, and its development has proved crucial to meeting the growing demands of an ever-advancing world of technology. Today, a simple search over the internet reveals a plethora of techniques and algorithms for specialized image and pattern recognition. Each technique is directed towards a specific application, and as a result its performance is application-specific: it provides maximum performance only for a finite set of applications, and less-than-acceptable performance when used in others. When we speak of the performance of a specific method or algorithm, we usually have the following criteria in mind:

1. Speed
2. Accuracy
3. Robustness
4. Ease of implementation
5. Cost of implementation and maintenance
6. Potential to be improved

Ideally we would like to come up with a methodology that meets all of the above criteria. Unfortunately this is not possible, and as a result there will always be a certain trade-off. We can classify pictorial data input for computer processing into five categories:

1. Simple RGB colored pictures
2. Full gray-scale images
3. Bi-level pictures (i.e., black-and-white pictures)
4. Continuous curves and lines
5. Sets of discrete points spaced far apart

However, the images we deal with in this project are two-dimensional bi-level (i.e., black-and-white bit-mapped) images.
Dealing with bi-level images is similar to dealing with shape-representing images, because shape can be considered as solely determined by one of the two levels. The case of colored images offers more of a challenge, since shape can then only be determined from color contrast. The primary feature on which the recognition process is based is shape. To compare two bi-level images efficiently and rapidly, some image compression scheme is necessary, because a method that matches images by working directly at the level of the pixel matrix is both time-consuming and less able to detect the patterns inside these images. The compression scheme we propose here is concavity tree compression. We start by stating the two predominant advantages of using concavity trees to represent bi-level images:

1. Compressing the image and drastically reducing the memory required for its storage, while retaining exactly enough information for a very close reconstruction of the original image (or at least a reconstruction whose accuracy depends on a user-defined parameter).

2. Relatively fast and accurate comparison of two bi-level images, by comparing their concavity trees.

More details about extracting concavity trees from an image are given in Chapter 2, which covers the literature review.

Chapter 2

Literature Review

2.1 Concavity Trees

A concavity tree is a data structure used to describe non-convex two-dimensional shapes. It was first introduced by Sklansky (1972) and has been used by several researchers in the past three decades (Batchelor, 1980a,b; Borgefors and di Baja, 1992, 1996; Xu, 1997). We can define a concavity tree as a rooted tree in which the root represents the whole object whose shape is to be analyzed or represented. The next level of the tree contains nodes that represent concavities along the boundary of that object.
Each of the nodes on the following levels represents one of the concavities of its parent, i.e., its meta-concavities. If an object or a concavity is itself convex, then the node representing it has no children. Figure 2-1 (refer to Badawy and Kamel (2005)) shows an example of a shape (a), its convex hull, concavities, and meta-concavities (b), and its corresponding concavity tree (c). The shape has five concavities, as reflected in level one of the tree. The four leaf nodes in level one correspond to the highlighted triangular concavities shown in (d), whereas the non-leaf node corresponds to the (non-convex) concavity shown in (e). Similarly, the nodes in levels two and three correspond to the meta-concavities highlighted in (f) and (g), respectively. Typically, each node in a concavity tree stores information pertinent to the part of the object the node describes (a feature vector, for example), in addition to tree meta-data (such as the level of the node and the height, number of nodes, and number of leaves of the sub-tree rooted at the node).

Figure 2-1: An object (a), its convex hull and concavities (b), the corresponding concavity tree (c), and contour sections corresponding to concavities (d-g). Courtesy of Badawy and Kamel (2005).

The process of extracting the concavity tree from a 2-D image includes the following steps:

1. Compute the convex hull of the image.
2. Start from the leftmost bottom end of the figure.
3. Follow the contour points starting from this point.
4. Detect concavities:
   - For each concavity, apply the algorithm recursively until the region under consideration is convex.
   - If a concavity has meta-concavities, they are represented as nodes in the second or nth level of the tree, according to their order (meta-concavity of order 2 or n).

Figure 2-2: The same object as in Fig. 2-1 with its corresponding concavities, along with their positions in the tree.
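As a concrete illustration of the extraction steps above, the following Python sketch builds a small concavity tree from an ordered list of contour points: the convex hull is computed with Andrew's monotone chain, contour runs that leave the hull are treated as concavities, and the procedure recurses until a region is convex. This is a simplification for illustration only (the thesis implementation is in Matlab and operates on bitmap contours); all names here are assumptions, and wrap-around of the cyclic contour is ignored.

```python
from dataclasses import dataclass, field

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices, collinear points dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

@dataclass
class CTNode:
    contour: list
    children: list = field(default_factory=list)

def build_concavity_tree(contour, min_size=3):
    """Runs of contour points that fall off the convex hull are concavities;
    recurse on each run until the region is convex (no children added)."""
    node = CTNode(contour)
    hull = set(convex_hull(contour))
    run = []
    for p in contour:
        if p in hull:
            if len(run) >= min_size:
                node.children.append(build_concavity_tree(run, min_size))
            run = []
        else:
            run.append(p)
    if len(run) >= min_size:
        node.children.append(build_concavity_tree(run, min_size))
    return node

# A square with a rectangular notch in its top edge: one concavity,
# and the notch itself is convex, so recursion stops at depth one.
contour = [(0, 0), (4, 0), (4, 4), (3, 4), (3, 3), (1, 3), (1, 4), (0, 4)]
tree = build_concavity_tree(contour)
```

The `min_size` parameter plays the role of the concavity-size threshold mentioned later in the thesis: runs shorter than it are treated as noise rather than concavities.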
Having introduced the way concavity trees are extracted from images, we now explain how images are matched using their concavity tree representations.

2.2 Tree Edit Distance Matching

By matching we mean finding the best match or matches to an image from a set of given images. This can be done directly by defining a distance between any two images: the image at the least distance from the given one, among the set of images, is then the best match we are looking for. Two methods have been suggested for matching concavity trees. The first was implemented by Badawy and Kamel (2004) and uses mappings from nodes in the first tree to nodes in the other. The second uses the concept of tree edit distance in the matching process (Fadlallah et al., 2005). We focus on the latter method because it is more efficient and accurate, especially for sets of images that exhibit translations, rotations and flipping (Fadlallah et al., 2005).

The tree edit distance algorithm matches two concavity trees based on the shapes of the trees and a set of elementary operations used to transform one tree into another; it can be used to compare trees of any type. A tree transformation is a sequence of elementary operations with different weights, and the tree edit distance is the cost of the least-cost sequence of operations transforming one tree into the other. The simple edit distance algorithm is motivated below; refer to Valiente (2002) for a detailed analysis.

Let T = (V, E1) and S = (W, E2) be two ordered trees. An elementary edit operation on T and S is one of the following:

- Deletion: (v, λ) or v → λ, where v ∈ V.
- Substitution: (v, w) or v → w, where v ∈ V and w ∈ W.
- Insertion: (λ, w) or λ → w, where w ∈ W.

These elementary operations on T and S have different costs, constrained by a cost function γ such that (Valiente, 2002):

- γ(v, w) ≥ 0.
- γ(v, w) = 0 if and only if v = w.
- γ(v, w) = γ(w, v).
- γ(v, w) ≤ γ(v, z) + γ(z, w).

As a result, the cost of a transformation E of T into S is given by:

γ(E) = Σ_{(v,w)∈E} γ(v, w)   (2.1)

and the edit distance between the two ordered trees becomes:

Dist(T, S) = min{γ(E) : E is a valid transformation of T into S}   (2.2)

First, the two concavity trees are stored in pre-order traversal. We then define the tree edit graph of T and S as the graph whose vertices take the form v_i w_j for each pair of nodes v ∈ {v_0} ∪ V and w ∈ {w_0} ∪ W, where v_0 ∉ V and w_0 ∉ W are two dummy nodes. The edges of the tree edit graph are built according to the following conditions:

- if depth[v_{i+1}] ≥ depth[w_{j+1}], then the edge (v_i w_j, v_{i+1} w_j) exists (deletion);
- if depth[v_{i+1}] = depth[w_{j+1}], then the edge (v_i w_j, v_{i+1} w_{j+1}) exists (substitution);
- if depth[v_{i+1}] ≤ depth[w_{j+1}], then the edge (v_i w_j, v_i w_{j+1}) exists (insertion).

Here 0 ≤ i ≤ n_1 and 0 ≤ j ≤ n_2, where n_1 and n_2 are the numbers of nodes in T_1 and T_2 respectively, with the nodes numbered according to a pre-order traversal. To illustrate, let T_1 = (V_1, E_1) and T_2 = (V_2, E_2) be two ordered trees, as shown in Fig. 2-3. The resulting tree edit graph is shown in Fig. 2-4, and the algorithm can be implemented through the pseudocode shown in Alg. 1.

Figure 2-3: Two trees T_1 (a) and T_2 (b) numbered according to a pre-order traversal.

Figure 2-4: The tree edit graph transforming T_1 into T_2. Given a substitution cost of 0, a deletion cost of 1 and an insertion cost of 1, the shortest path (highlighted in red) has a cost of 2, whereas one of the other possible paths (highlighted in green) has a cost of 6.

2.3 Complexity Analysis

The most time-consuming step in Alg. 1 is Dijkstra's algorithm. Again, let T_1 = (V_1, E_1) and T_2 = (V_2, E_2), define n to be the number of nodes in T_1 and m the number of nodes in T_2, and assume without loss of generality that m ≥ n.
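The edit-graph construction and shortest-path search described above can be sketched compactly in Python (the thesis implementation is in Matlab). Each tree is represented by the depth sequence of its pre-order traversal; states (i, j) of the grid are explored lazily with Dijkstra's algorithm, and the fixed unit costs are an illustrative assumption, standing in for the weighted costs discussed later.

```python
import heapq

def tree_edit_distance(d1, d2, del_cost=1, ins_cost=1, sub_cost=0):
    """d1, d2: depth sequences of the two trees in pre-order (root depth 0).
    State (i, j) means i nodes of T1 and j nodes of T2 consumed; edges
    follow the three depth conditions of the tree edit graph."""
    n1, n2 = len(d1), len(d2)
    dist = {(0, 0): 0}
    pq = [(0, (0, 0))]
    while pq:
        d, (i, j) = heapq.heappop(pq)
        if (i, j) == (n1, n2):
            return d            # cheapest transformation found
        if d > dist.get((i, j), float("inf")):
            continue            # stale queue entry
        moves = []
        # delete v_{i+1}: allowed when depth[v_{i+1}] >= depth[w_{j+1}]
        if i < n1 and (j == n2 or d1[i] >= d2[j]):
            moves.append(((i + 1, j), del_cost))
        # substitute v_{i+1} by w_{j+1}: equal depths only
        if i < n1 and j < n2 and d1[i] == d2[j]:
            moves.append(((i + 1, j + 1), sub_cost))
        # insert w_{j+1}: allowed when depth[v_{i+1}] <= depth[w_{j+1}]
        if j < n2 and (i == n1 or d1[i] <= d2[j]):
            moves.append(((i, j + 1), ins_cost))
        for nxt, w in moves:
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(pq, (nd, nxt))
    return dist.get((n1, n2))
```

For example, transforming a root with two children into a root with three children costs one insertion under these unit weights: `tree_edit_distance([0, 1, 1], [0, 1, 1, 1])` returns 1.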
We know that Dijkstra's algorithm operates on the n×m matrix in which the tree edit graph information is stored, so the number of vertices in the directed graph is the product of the two dimensions, i.e., mn. The complexity of Dijkstra's algorithm using binary heaps is T_Dijkstra(V, E) = O((E + V) log V), where E is the number of edges in the graph and V the number of vertices. Since each vertex is connected to at most three other vertices, the worst-case number of edges is 3mn, and the running time of the algorithm is thus:

T_TED(n, m) = O((3mn + mn) log mn) = O(4mn log mn)   (2.3)

Since m ≥ n and given the limit on the number of nodes, we can write:

T_TED(n, m) = O(4m² log m²) = O(8m² log m)   (2.4)

For graphs with far fewer edges than V², Dijkstra's algorithm with a Fibonacci heap improves the running time to O(E + V log V), so the computational complexity becomes:

T_TED(n, m) = O(E + V log V) = O(3mn + mn log mn) = O(3m² + m² log m²) = O(m²(3 + 2 log m))

Usually, the number of nodes m is very small compared to the number of contour points, which favors our algorithm in many applications of image matching. Furthermore, our method has several parameters that vary the way matching is conducted. Varying the area parameter yields different behavior when comparing a set of images. If you are comparing images of the same object, decreasing the area parameter helps expose the differences between the images, because it increases the weight of the operations in the low levels relative to the root substitution. If, instead, you are matching different objects, it is better to increase the area parameter so as to give higher weight to the general shapes represented by the roots.

2.4 Further Elaboration

After building the tree edit graph, the different operations must be assigned variable costs according to the attributes of the nodes involved.
Starting with the substitution cost: it is a function of the area of the concavity relative to the area of the image and of the levels of the nodes in the concavity tree. It also depends on the components of the SCX metric (Solidity, eCcentricity, eXtent) and on the number of children of the two nodes (Badawy and Kamel, 2005). This cost was tested on different shapes, on which it showed great strength in measuring the distances between different kinds of concavities. The insertion and deletion costs must be equal in order to preserve the symmetry of the tree edit distance. This cost is directly proportional to the area of the concavity relative to the area of the root (object) and to the number of nodes in the tree rooted at the node, and inversely proportional to the level of the node in the concavity tree. Furthermore, we multiply the different costs by an area parameter raised to the power of the ratio of the area of the concavity to the area of the convex hull of the object, which gives greater emphasis to the area factor in the cost analysis. Let IC and SC denote respectively the insertion and substitution costs:

IC(N) = AreaParameter^(Area(N)/Area(Object)) × f(Number of nodes, Level)   (2.5)

SC(N_1, N_2) = AreaParameter^(Area(N_1)/Area(Object_1) + Area(N_2)/Area(Object_2)) × f(Level, SCX, Children)   (2.6)

2.5 Limitations

Although it shows noticeable improvements in many cases, the algorithm developed still has several limitations. First, the choice of the weights is empirical; this limits the accuracy of the method, since empirical choices cannot suit all cases even if they cover a wide spectrum of them. Another source of inaccuracy in the weights is their dependency on features already computed in the concavity tree construction phase. Chapter 3 outlines some possible solutions to these limitations and introduces alternatives that build upon the concepts we have just seen.
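A minimal sketch of cost functions in the spirit of Eqs. 2.5 and 2.6 is given below in Python. The weighting functions f used here are illustrative placeholders only, not the thesis's actual empirically tuned weights; the node fields are an assumed dictionary layout. The sketch does preserve the structural properties argued for above: substitution cost is zero for identical nodes and symmetric in its arguments.

```python
def insertion_cost(node, area_param=2.0):
    """Insertion/deletion cost of a concavity node in the shape of Eq. 2.5.
    `node` holds 'area', 'object_area', 'nodes' (subtree size), 'level'.
    f grows with subtree size and shrinks with depth, as the text requires;
    its exact form here is a placeholder assumption."""
    rel_area = node["area"] / node["object_area"]
    f = node["nodes"] / (1 + node["level"])
    return (area_param ** rel_area) * f

def substitution_cost(n1, n2, area_param=2.0):
    """Substitution cost between two concavity nodes in the shape of Eq. 2.6.
    f here is a simple symmetric placeholder mixing level and children
    differences (the thesis also uses the SCX metric components)."""
    rel = n1["area"] / n1["object_area"] + n2["area"] / n2["object_area"]
    f = abs(n1["level"] - n2["level"]) + abs(n1["children"] - n2["children"])
    return (area_param ** rel) * f
```

Raising `area_param` above 1 makes large concavities dominate the total cost, matching the discussion of the area parameter at the end of Section 2.3; values below 1 would instead emphasize small, deep concavities.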
Algorithm 1: Matching Concavity Trees Using Tree Edit Distance (TED)

Input: Two trees T_1 and T_2
Output: Estimated distance d between T_1 and T_2

Notation:
- T_1 and T_2 are two ordered concavity trees, T_1 = (V_1, E_1) and T_2 = (V_2, E_2).
- T_1 is the tree with the minimum number of root children.
- V1t and V2t are two arrays storing the traversed nodes of T_1 and T_2 (in pre-order fashion).

- Fix T_1 and rearrange the second level of T_2 for greater similarity.
- V1t = Pre-order Traversal(T_1)
- V2t = Pre-order Traversal(T_2)
- for i ∈ {1...size(V1t)} do
    for j ∈ {1...size(V2t)} do
      if depth[v_{i+1}] ≥ depth[w_{j+1}] then the edge (v_i w_j, v_{i+1} w_j) exists
      else if depth[v_{i+1}] = depth[w_{j+1}] then the edge (v_i w_j, v_{i+1} w_{j+1}) exists
      else the edge (v_i w_j, v_i w_{j+1}) exists
- Transform the edit graph into a directed connected graph.
- Weight each edge according to its operation (del, ins, sub).
- Use Dijkstra's algorithm to find the shortest path between V1t(1)V2t(1) and V1t(size(V1t))V2t(size(V2t)).
- Set d to the weight of the shortest path.

Chapter 3

Research Goals

The goals set for this thesis can be summarized as follows:

(a) Expanding the concavity tree concept for the representation and retrieval of binary images to incorporate multiple objects with holes.
(b) Expanding the scope of the tree edit distance algorithm to match the upgraded concavity trees, and increasing its performance.
(c) Designing and implementing algorithms for the representation, retrieval and matching of multi-object images with holes using the new data structure.
(d) Improving the running time of the written algorithms using MEX files that interface C++ with Matlab.
(e) Designing and implementing a friendly GUI incorporating the above concepts.
(f) Creating new datasets for testing purposes.
(g) Designing and implementing a testing framework to validate the results over the image datasets.
3.1 Representing and Matching Multi-Object Images with Holes

The concavity tree representation will be modified to allow multi-object images with (or without) holes to be represented in compressed form. The TED algorithm will then be adjusted to take the holes and the multi-object representation into account during matching. The modifications affect the structure of the tree: trees of each sub-shape are appended to the original tree structure, and the parameters of the root node are varied accordingly. Note that information is attached to each sub-tree specifying whether it is a sub-shape, a hole, or a hole in a sub-shape, for retrieval purposes. The main steps are:

(a) Construct the tree of the main shape, chosen to be the shape with the largest dimension.
(b) Detect the remaining sub-shapes and append their tree representations to the original.
(c) For each sub-shape, extract trees for the contained holes (if any) and append them to the corresponding sub-shape representation.
(d) Normalize the parameters of the root node according to the weights of each sub-shape/node.

The detailed design of this process is described in Chapter 4.

3.2 Enhancing the TED Algorithm Performance and Efficiency

Several alternatives exist for improving the tree edit distance algorithm:

(a) Checking whether one of the two trees is, or is close to, a sub-tree of the other. This will improve the matching rate since, at present, the cost of matching a tree with a bigger tree containing it is higher than the targeted cost.
(b) Since the running time of the algorithm is a function of the number of nodes, performance can be improved by bounding the number of nodes while still preserving the maximum possible useful information about the image.
This limit can be achieved in one of two ways: either by taking only a limited number of levels into consideration, or by adjusting the concavity-size threshold within the concavity tree creation algorithm.

(c) Optimizing the weights further to improve the accuracy of the matching results.

(d) Substituting an in-order traversal for the pre-order traversal of the concavity trees. This may improve performance on some 2-D shapes; experimental results may suggest a combination of traversals, used selectively according to the shape in the image.

(e) Because for-loops require very long execution times in Matlab, and because the Matlab implementation of the algorithm depends heavily on a large number of relatively long (and often nested) for-loops, the algorithm suffers a large decrease in performance and an increase in running time. The algorithm's dependency on the programming platform can be largely reduced by rewriting the time-consuming for-loops in C/C++ and calling them from Matlab, when necessary, using MEX files.

3.3 Implementing a GUI and a Testing Framework

The database we use for testing is a subset of the MPEG-7 dataset, a standard dataset used universally for matching purposes; we take a subset because the full database is very large. As already mentioned, a GUI will be designed to incorporate the concepts illustrated. The GUI has the following characteristics:

(a) Two user-input images.
(b) Extraction and plotting of the concavity trees of the two images.
(c) Extraction and plotting of the images recovered from the compressed trees.
(d) The ability to change the compression-decompression parameters.
(e) The matching result of the two images using the new tree edit distance algorithm.

Another GUI is also designed to compute the best match to a given input image among the dataset.
This works by extracting the tree representation of an input image and then computing its distance to all others in a given dataset; the best match is the image whose distance to the input image is smallest.

3.4 Assessing Results and Comparing with Other Methods

Using the results obtained through the testing framework, a detailed analysis will be performed describing the strengths as well as the weaknesses of this approach, and a comparison will be conducted with other methods. The main criterion will be the success rate over the whole dataset.

Chapter 4

Analysis and Design

4.1 Incorporating Multi-Object Images with Holes in Concavity Trees

The design of a tree representation for an image with multiple objects containing holes must take into consideration the fact that two levels of containment are to be handled, since an image can have several sub-images, each of which can have several holes. The designed algorithm proceeds as follows:

(a) Read the whole image as a binary matrix.
(b) Locate the main sub-image, corresponding to the largest surface in the image.
(c) Locate the other sub-images.
(d) Locate the holes in each sub-image.
(e) Extract the concavity tree representation of the main sub-image with its holes.
(f) Extract the concavity tree representation of each remaining sub-image with its holes.
(g) Append the tree representations of the sub-images to the main tree.

An overview of the procedure can be found in Alg. 2. Retrieving the initial image proceeds in the same way as for images with no sub-shapes or holes: the contour points of all nodes in the tree are plotted, and the area inside the envelope is filled with pixel values of 1. At this stage it should be noted that the main parameters embedded in each node of the tree are listed in Table 4.1.

Algorithm 2: Extract Concavity Tree for Multi-Images with Holes (CTHI)

Input: Image file "Image.bmp", resolution parameter κ
Output: Data structure T containing concavity trees

Initialize:
- I = Read Image("Image.bmp")
• NumberSubImages = number of sub-images in the image.
• NumberHoles[1..NumberSubImages] = array holding the number of holes of each sub-image.
M = LocateSubImages(I)
H = LocateHoles(M)
T = ExtractConcavityTree(I, κ)
RM = RepresentSubImages(M)
RH = RepresentHoles(H, RM)
F = null
for i ∈ {1...NumberSubImages} do
    for k ∈ {1...NumberHoles[i]} do
        E[i, k] = ExtractInfo(H[i, k])
        RH[i, k] = RH[i, k] + E[i, k]
        T[i] = T[i] + RH[i, k]
        F[i] = Update(F[i])
T[Root] = T[Root] + F
T = (T, isDotted(T))

Table 4.1: Main concavity tree (CT) structure parameters
Parameter | Type | Description
Contour | Vector of doubles | Points along the envelope of the concavity
Hull | Vector of doubles | Coordinates corresponding to convex hulls
Leaves | Integer | Number of nodes at the last level
Height | Integer | Max number of nodes from root to leaves
Children | Integer | Number of attached nodes
Level | Integer | Level number in the tree
Depth | Integer | Height level
Hole | Boolean | Type of node
Colour | RGB | Node color in the tree
Attribute | Vector of doubles | SCX vector of three parameters
RelArea | Relative area | Relative area of the given concavity

4.2 Relative Tree Edit Distance Matching Algorithm This algorithm compares the data structures obtained in the previous section in order to match any two multi-object images. The trees generated by the new algorithm are the input to the matching process. As an example, let A and B be respectively the concavity trees with sub-images and holes for images A and B. If the two trees have the same number of sub-images, we can apply a novel version of the Tree Edit Distance algorithm to handle the comparison of the sub-images one by one while giving a weight to each of them. The weights must comply with important criteria, such as the area of each sub-image. Another alternative would be to compare the sub-images in the same manner but not ad-hoc, i.e.
comparing the best sub-image match in B with that of A instead of the first ones of A and B. This is likely to consume more time, but the results will be more accurate. A trickier case arises when we match an image with n sub-images against another with m sub-images, where n > m. In this case we propose to compare the leftover sub-images of A with dummy sub-images, where we define a dummy sub-image as a special concavity tree made of a unique node. In other words, we compute the distance between the sub-image and a standard invariant shape to which we refer as a dummy sub-image. Therefore, a large or complex-shaped sub-image will exhibit a bigger distance from the dummy sub-image than a small or simple one, and this will influence the final distance for the image with more sub-images. Comparing two sub-images with holes proceeds in the same way as comparing two images with sub-images. Let A be a sub-image with n holes and B another with m holes such that n > m. We propose first to match the two sub-image shapes, then the m holes one by one, and finally the remaining n − m holes against dummy holes. An overview of the designed algorithm can be found in Alg. 3. 4.3 Random Lines Intersection Matching Algorithm The Relative Tree Edit Distance Algorithm does not take into consideration the location of a hole or sub-shape in the image. This problem does not appear when comparing two images with no holes or sub-shapes, as the only concern in that case is the similarity between the images and not the location of the shape in the bitmap image. With images that contain holes or sub-shapes, the location of the hole or sub-shape is very meaningful, and ignoring it in the comparison process can lead to inaccurate results. To see how the location of a hole can affect the comparison of two bi-level images with holes, consider the two images shown in Fig. 4-1. Figure 4-1: Identical shapes with different hole location.
Note that the CT extraction of the two images results in identical trees, and hence a zero matching distance. The two images in Fig. 4-1 consist of two objects, each with a hole. The objects are two ellipses with exactly the same size and shape, and the holes are two circles with the same area. Hence, the only difference between the two images is the location of the holes. This location implies a difference between the two figures, and therefore they are not exactly similar. This, however, is not reflected in the concavity tree representation of the images. The best solution to this problem is to use dissecting lines to cut the bi-level images, and then use the dissected parts of each image in the comparison to get a clear idea about the location of holes. The dissecting line is chosen in a random direction, and it cuts the image with holes, transforming it into two images. We start by selecting a line, then dissecting the two images to obtain four new images. Next, the right portion of the first image is compared with the right portion of the second image, and the same is done for the left portions. This comparison is sufficient to account for the placement of holes in the original image. To see the effectiveness of the dissecting lines, consider the previous figure. By applying the dissecting line algorithm to the two figures, we select a random direction and cut the figures along this direction. The result of the process is shown in Fig. 4-2. The line transforms the two images into four. By comparing the left portion of the first image with the left portion of the second, and doing the same for the right portions, the difference between the two images is revealed and a certain cost is added. Therefore the distance between the two images is no longer equal to 0, and the two images are not exactly similar. In general, any number of lines can be used to compare two figures, and it might be possible to figure out the similarity between two figures using only one dissecting line.
However, since the algorithm deals with different numbers of holes or sub-shapes, an arbitrary number of random lines is chosen, which means that the process of dissecting the images and comparing them is repeated an arbitrary number of times. Figure 4-2: The Dissecting Line Procedure. By choosing the dissecting lines randomly, the lines will be taken in all possible directions, and by selecting a sufficient number of lines the location of all holes will be identified, further improving the matching of bi-level images. The pseudocode of the algorithm is shown in Alg. 4. 4.4 MEX Files 4.4.1 What are MEX Files? MEX stands for MATLAB Executable. MEX-files are a way to call custom C/C++ routines directly from MATLAB as if they were MATLAB built-in functions. They are dynamically linked subroutines produced from C/C++ source code that, after being compiled, can be run from within MATLAB in the same way as MATLAB M-files or built-in functions. The external interface functions provide functionality to transfer data between MEX-files and MATLAB, and the ability to call MATLAB functions from C/C++ code. In MATLAB, all variables are stored in a single type of structure called the mxArray. The mxArray declaration corresponds to the internal data structure that MATLAB uses to represent arrays; it is the C representation of all MATLAB arrays. If the variable contains complex numbers as elements, the mxArray structure contains two 1-dimensional arrays of double-precision numbers, called pr (containing the real data) and pi (containing the imaginary data) Mathworks (2006a,b). In C++, mxArrays are declared as follows: mxArray *x; The values inside a newly created mxArray are undefined when it is declared, and it should be initialized with an mx* routine before it is used. Data inside the array is stored in column-major order (i.e. the values are read down and then across the array).
To access the data inside an mxArray, the API functions shown in Table 4.2 are used. The MEX API provides several functions that allow us to determine the various states of an mxArray. These functions are used to check the inputs to the MEX-file, to make sure that they are of the correct type and number. The interfacing between MATLAB and C++ in MEX-files is done using the external interface functions, which provide functionality for transferring data between MEX-files and MATLAB, as well as the ability to call MATLAB functions from C++ Mathworks (2006b). Fig. 4-3 illustrates the interfacing between MATLAB and C++ using the MEX methodology. Every C/C++ MEX-file must include the header file mex.h, which is necessary in order to use the mx* and mex* routines. The code from which a MEX-file is composed consists of: Figure 4-3: The Matlab/C++ interface. The Computational Routine This routine contains the code for performing the computations that we want to implement as a MEX-file Mathworks (2006b). The Gateway Routine This routine interfaces the computational routine with MATLAB, and calls it as a subroutine. The gateway routine of every MEX-file is called mexFunction. This function is the entry point MATLAB uses to access the DLL. The mexFunction definition is as follows: void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) where nlhs is the number of expected output mxArrays, plhs is an array of pointers to mxArrays (the expected outputs), nrhs is the number of inputs, and prhs is an array of pointers to mxArrays (the input data, which is read-only and not altered by the mexFunction) Mathworks (2006b). In the gateway routine, one can access the data in the mxArray structure and then manipulate it in the C/C++ computational subroutine. After calling the C/C++ computational routine from the gateway, one can set a pointer of type mxArray to the data it returns.
This enables MATLAB to recognize the output of the computational routine as the output of the MEX-file. Fig. 4-4 shows a graphical representation of the MEX cycle Mathworks (2006b), which proceeds along the following steps: (a) The 'func.c' gateway routine uses the mxCreate* functions to create the MATLAB arrays for the output arguments. (b) It sets plhs[0], plhs[1], ... to the pointers to the newly created MATLAB arrays. (c) It uses the mxGet* functions to extract the input data from prhs[0], prhs[1], ... (d) It calls the C/C++ subroutine, and passes the input and output data pointers as function parameters. Troubleshooting methodologies for MEX files can also be explained through the diagram in Fig. 4-4. 4.4.2 Why MEX Files? The main advantage we hope to gain from the use of MEX files is speed. We can rewrite the bottleneck computations, such as for-loops, as MEX files for increased efficiency Mathworks (2006a,b). The most time-consuming function in the algorithm is the Dijkstra function, which is composed of numerous lengthy for-loops. Rewriting it as a C++ MEX file would largely reduce its running time, and as a result increase the overall performance of the algorithm. In order to demonstrate the speed advantage of writing time-consuming for-loops in C++ via MEX files, instead of Matlab, we wrote a function that contains several nested for-loops both as an M-file and as a MEX file, and compared their relative performance.

function k = just_for_test(sentinel)
total = 0; i = 0; j = 0; k = 0;
for i = 0:sentinel,
    for j = 0:sentinel,
        for k = 0:sentinel,
            total = total + 1/6;
            total = total + 1/6;
            total = total + 1/6;
            total = total + 1/6;
            total = total + 1/6;
            total = total + 1/6;
            if total > 4000
                total = total - 98;
            end
        end
    end
end
k = total;

The corresponding MEX file would look like:

#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs,
const mxArray *prhs[])
{
    double *total;
    int i = 0, j = 0, k = 0, sentinel;
    plhs[0] = mxCreateDoubleMatrix(1, 1, mxREAL);
    total = (double *)mxGetPr(plhs[0]);
    sentinel = (int)mxGetScalar(prhs[0]);
    *total = 0.0;
    for (i = 0; i <= sentinel; i++)
        for (j = 0; j <= sentinel; j++)
            for (k = 0; k <= sentinel; k++) {
                *total = *total + 1.0/6.0;
                *total = *total + 1.0/6.0;
                *total = *total + 1.0/6.0;
                *total = *total + 1.0/6.0;
                *total = *total + 1.0/6.0;
                *total = *total + 1.0/6.0;
                if (*total > 4000)
                    *total = *total - 98.0;
            }
}

Upon running both functions for various input values we obtain the results shown in Table 4.3, which show an average improvement of 45.336%. This clearly shows the advantage of converting time-consuming for-loops to C++ instead of running them in their native Matlab code. 4.5 Datasets of Images A subset of the MPEG7 dataset is taken as the reference database to test the above algorithms. A sample of the MPEG7 dataset is shown in Fig. 4-5. Note at this stage that the designed algorithms can handle .gif as well as .bmp image files and are invariant to whether the background of the image consists of pixels of ones or zeros. The subset used in this text is shown in Fig. 4-6; it consists of 36 classes of images. The testing framework that will be designed in the next section takes each of the dataset's images and finds the closest match to it among the images of the dataset. The optimum would be to classify the images correctly, i.e. to be able to separate the images of each class. 4.6 Interface Design In order to facilitate the testing of the above algorithms, a GUI was designed. The presence of a GUI renders the algorithms more accessible and user-friendly. The diagram in Fig. 4-7 shows how the program works from a top-level perspective. The GUI offers the possibility to change the parameters of the CTHI algorithm, allowing the user to specify the image compression ratio or the image recovery ratio, as well as the pixel sensitivity for detecting concavities.
A Help section is also present in the GUI to illustrate the main concepts detailed in the previous sections. A pseudocode description of the GUI is shown in Fig. 4-8.

Algorithm 3: Matching Concavity Trees with Holes using Relative Comparison Algorithm (TEDHI)
Input: Two images I1 and I2 containing shapes with holes
Output: Distance d denoting the computed distance between the two trees.
Initialize:
• Set T1 = CTHI{I1} and T2 = CTHI{I2}.
• Let DummyHole refer to a dummy node, i.e. a structure representing a spherical shape with a fixed area.
• Let DummyImage refer to a dummy sub-image, i.e. a special concavity tree made of a unique node.
• Let Weight refer to a weight function that returns the relative area of a hole's concavity tree.
function d = TEDHI{T1, T2}
    d = 0
    SubImagesMin = min{NumberSubImages(T1), NumberSubImages(T2)}
    SubImagesMax = max{NumberSubImages(T1), NumberSubImages(T2)}
    for i ∈ {1...SubImagesMax} do
        if i ≤ SubImagesMin then
            d = d + Weight(SubImage[i]) × TEDH{T1[i], T2[i]}
        else
            d = d + Weight(SubImage[i]) × TEDH{T1[i], DummyImage}
    Normalize d
function d = TEDH{T1, T2}
    d = 0
    HolesMin = min{NumberHoles(T1), NumberHoles(T2)}
    HolesMax = max{NumberHoles(T1), NumberHoles(T2)}
    for i ∈ {1...HolesMax} do
        if i ≤ HolesMin then
            d = d + Weight(Hole[i]) × TED{T1[i], T2[i]}
        else
            d = d + Weight(Hole[i]) × TED{T1[i], DummyHole}
    Normalize d

Algorithm 4: Matching Concavity Trees with Holes using Random Intersecting Lines (TEDHIR)
Input: Two images I1 and I2 containing shapes with holes, and a user parameter κ defining the number of lines
Output: Distance d denoting the computed distance between the two trees.
function d = TEDHIR{I1, I2}
    d = 0
    for i ∈ {1...κ} do
        Generate the equation of a random line.
        Dissect I1 and I2 with the generated line.
        T1(i) = CTHI(I1)
        T2(i) = CTHI(I2)
        Norm = NormalizeDistance{TEDH{T1(i), T2(i)}}
        d = d + Norm

Table 4.2: The MEX API Functions.
Source: Mathworks (2006), "MEX-files Guide".
Operation | API Functions
Array creation | mxCreateNumericArray, mxCreateCellArray, mxCreateCharArray
Array access | mxGetPr, mxGetPi, mxGetData, mxGetCell
Array modification | mxSetPr, mxSetPi, mxSetData, mxSetField
Memory management | mxMalloc, mxCalloc, mxFree, mexMakeMemoryPersistent, mexAtExit, mxDestroyArray, memcpy

Table 4.3: Running time comparison: original code versus MEX code
Sentinel | Matlab (s) | Improvement (%)
900 | 11.30 | 44.38
1000 | 15.01 | 45.635
1200 | 25.73 | 45.637
1500 | 49.68 | 45.693

Figure 4-4: Troubleshooting Methodologies For MEX Files. Figure 4-5: Sample of the MPEG7 dataset. Figure 4-6: Used Subset of the MPEG7 dataset. Figure 4-7: Interface Top-Level Design. Figure 4-8: Top-Level Image Matcher Program. Chapter 5 Implementation 5.1 Implementation of Sub-Images/Hole Representation The concavity tree structure and Tree Edit Distance algorithm were updated to include the representation and matching of bi-level images containing sub-images with holes. The changes made to these algorithms, as well as their pseudocodes, were already depicted in the previous chapter. Next we present some examples showing how sub-images as well as holes are handled. Example 1: Representation for multi-shape Images Consider the image in Fig. 5-1. The image contains three sub-images or sub-objects, namely the three buildings. The original tree will be that of the sub-image occupying the most space, which is the double-tower building. As no sub-image contains holes, the tree can also be split into three sub-trees. The tree is shown in Fig. 5-2, where each of the sub-trees is colored differently. Note that the main tree is the one to the left, representing the major sub-image. Figure 5-1: Bitmap file representing the shapes of three buildings in a city. Figure 5-2: Tree representation of the bmp image in Fig. 5-1. The two other sub-images are represented with the sub-trees in red and green.
The recovered image from this tree, with the standard input parameter vector, is shown in Fig. 5-3. Mappings between convex hulls and sub-trees can be seen in Fig. 5-4. Example 2: Representation for multi-hole Images Consider the image shown in Fig. 5-5. As can be seen, this image contains two holes. The original tree represents the overall shape with no holes, whereas the remaining two sub-trees represent respectively each of the holes. In Fig. 5-6, the sub-tree in red represents the bottom hole, having two meta-concavities, and the sub-tree in green represents the upper hole, having four meta-concavities. The recovered image from this tree with the standard input parameter vector is seen in Fig. 5-7. Figure 5-3: Recovered image from the tree in Fig. 5-2. Figure 5-4: Reconstructed sub-shape with corresponding extracted sub-shape. Figure 5-5: Bitmap image representing a shape containing holes. Figure 5-6: Tree representation of the bmp image in Fig. 5-5. Figure 5-7: Recovered image from the tree in Fig. 5-6. Example 3: Representation for multi-shape Images with Holes Consider the image shown in Fig. 5-8. Mappings between convex hulls and sub-trees can be seen in Fig. 5-10. Figure 5-8: Bitmap image representing a shape containing two sub-shapes with their holes. Figure 5-9: Tree representation of the bmp image in Fig. 5-8. 5.2 Implementation of the Relative Tree Edit Distance Matching Algorithm Consider the images in Fig. 5-11. Note that the two images are nearly identical and only differ by some discrepancy in the lower part of the main object. As a result, we expect a low distance when matching the two images, say around 0.1 on a [0, 1] scale. Figure 5-10: Reconstructed sub-shape with corresponding extracted sub-shape. Figure 5-11: Two images to be matched. The matching algorithm extracts the tree representation for both images (Fig. 5-12). As can be seen, only one difference exists between the two tree structures.
This corresponds to the cost of one operation in the tree edit graph: an Insert operation to move from T1 to T2, or a Delete operation to move from T2 to T1. In this case, and since all other nodes match, the cost of transforming Image1 into Image2 would be the cost of inserting or deleting the concerned node (highlighted in Fig. 5-12). Figure 5-12: Corresponding tree representations for the images in Fig. 5-11. Applying the TEDHI algorithm gives a distance of 0.0847 (around 8%), which is anticipated according to the above explanation. To illustrate a case with substantial differences between the input images, consider the images shown in Fig. 5-13. Figure 5-13: Two input images. One representing a bottle, the other a city! Obviously the two images are very dissimilar in structure as well as shape. The first has a unique shape with no holes, whereas the second has three sub-shapes, one of which contains holes. As a result, we expect a high distance when matching the two images, say around 0.7 or 0.8 on a [0, 1] scale. The matching algorithm extracts the tree representation for both images. Figure 5-14: Corresponding tree representations for the images in Fig. 5-13. Clearly, many differences exist between the two tree structures. Performing the edit distance algorithm incurs the cost of several Insert, Delete, and Substitution operations. In this case, the costs of each operation add up, and the cost of transforming the first image into the second is the sum of all costs. Applying the TEDHI algorithm gives a distance of 0.813, which is also anticipated according to the above explanation. 5.3 Implementation of the Random Lines Intersection Algorithm In order to implement the dissecting line idea, a function is written to cut the image matrix by a number of horizontal, vertical, or positively sloped random lines. The number of random lines is specified by an input parameter.
For each slice we compare the right portion of the first image to that of the second image, and the left portion of the first image to that of the second image (in the case of a horizontal cut we compare the upper slices of each image together, and do the same for the lower slices). To administer the comparison we use the standard TEDHI algorithm on the corresponding portions of each image, after finding the concavity tree representation of each of the four resulting "pseudo-images". The costs of each trial of the random lines are normalized and added to obtain a cost of transforming the first image into the second image. 5.4 Converting for-Loops to C++ Via MEX Files All time-consuming for-loops in the Matlab code were transformed into C++ MEX files. Since the function that contains the largest number of time-consuming for-loops is the Dijkstra function, it was the first function to be converted. The loops containing complex data structures, as in the functions CTHI and TEDHI, were not converted, as this proved to be costly from a running time perspective. All other for-loops that do not involve these structures were transformed to MEX. The effort of rewriting code as MEX files was worthwhile, since performance increased drastically upon testing the running time of the new algorithms, especially for shapes involving many concavities. More details on the performance improvement due to the use of MEX files will be presented in the next chapter. 5.5 GUI Implementation As designed in Chapter 4, the Image Concavity Tree Matcher is started by typing ImageConcavityTreeMatcher in the Matlab command window. A prompt then asks whether or not to rebuild the database of the testing dataset. The selection is by default set to No (Fig. 5-15). Figure 5-15: Initial prompt of the ImageConcavityTreeMatcher. Upon selecting Yes, the database starts rebuilding.
Construction usually takes around 8 minutes for the selected dataset containing 114 images, an average of about 4 seconds for extracting, processing and storing the results of one image. Figure 5-16: Prompts in the GUI for DB Rebuild. After the DB rebuild, the GUI prompts the user for a program choice: Figure 5-17: Prompts in the GUI for program choice. ImageMatcher matches an input image with the structures stored in the database. The Proximity program extracts CT plots, recovers concavities and matches two user-input images. At this stage, note that the input can be obtained in several other ways, depending on whether we interface the program to mouse or pad drawing input. If we select input bell-3.gif for example, the GUI proceeds as in Fig. 5-19. The best match is then posted on the screen as in Fig. 5-20. Figure 5-18: Prompt to File Input. Figure 5-19: Image Matching. Figure 5-20: Best Match Return. The GUI then asks whether we would like to input another letter (Fig. 5-21): Figure 5-21: Recursion Start. Selecting Yes starts a recursion in the ImageMatcher. Upon clicking No, the initial prompt is back (Fig. 5-22): Figure 5-22: Back to the User Choice Input. Choosing Proximity opens the Proximity program. The splash and initial screens are shown in Figs. 5-23 and 5-24. The corresponding concavity trees are shown in Fig. 5-26. This GUI was used to test the algorithms over the subset of the MPEG7 dataset. Testing results will be presented in the next chapter. Figure 5-23: Welcome screen of the Proximity software. 5.6 Applying Concavity Trees in the Illiteracy Project The purpose of the Illiteracy project is to design and implement an efficient method for helping illiterate people in the Arab world overcome illiteracy. One way to do so would be to recognize and correct the handwriting of the students. This can be implemented by applying the concavity trees concept.
The Arabic alphabet consists of 28 basic letters, some of which admit customized forms, such as the aleph, which admits the forms shown in Fig. 5-29. In what follows, and for neatness purposes, the standard 28-letter representation of the Arabic alphabet will be used, knowing that the method described can also be applied to the remaining characters. As a result, the database used can be seen in Fig. 5-30. Figure 5-24: Initial screen of the Proximity software. The algorithms were tested against a variety of input datasets. A sample input dataset can be seen in Fig. 5-31. Example 1: Loading figure "input ye" from the above input dataset into the GUI gives the image preview shown in Fig. 5-32. The image is contrasted against the reference in Fig. 5-33. The output of the CTHI function is shown in Fig. 5-34. Figure 5-25: Loaded Images to the Proximity software. The blue nodes correspond to the concavities and meta-concavities of the contour of the body of the input, i.e. the ø shape, while the red node represents the unique concavity of the - shape under the ø shape. The remaining nodes correspond to the concavities and meta-concavities of the contours of the holes inside the image. This is illustrated in Fig. 5-35, where we show the correspondence between the holes and their representation in the tree. Example 2: An example of a letter with holes is the "sad", shown in Fig. 5-36. The convex hull reconstruction for the letter is shown in Fig. 5-37 and its tree representation in Fig. 5-38. Figure 5-26: Corresponding concavity trees representation of the input images shown in Fig. 5-25. As can be noted, the single hole is represented by the red node in the tree. Running the Relative Tree Edit Distance Matching Algorithm on the input letter "ye" gives a distance of 0.0617, as can be seen in the screenshot below. For the algorithm to function properly, this value should be the least among the 28 distance values to the alphabet letters. Fig.
5-40 displays the array of distances of the input "ye" to each of the alphabet's letters. It can be seen that min(Dist) = Dist[28], which corresponds to the value of the last letter, "ye". Figure 5-27: Corresponding convex hulls retrieved for the input images shown in Fig. 5-25. Figure 5-28: Matching results for images in Fig. 5-25. The difference between the two consists mainly of the upper sub-object, accounting for around a 30% difference between the figures, which is close to the obtained distance (0.28876). Figure 5-29: Different forms for "aleph" in the Arabic alphabet. Figure 5-30: Arabic Alphabet. It consists of 28 letters. It is the main dataset to be used in the Illiteracy project. Figure 5-31: Sample Input Dataset. Figure 5-32: Loading Input versus Reference Images. Figure 5-33: Convex hull representation of the input images in Fig. 5-32. Figure 5-34: Concavity tree of the original image in Fig. 5-32. In red, we can see the representation of the dash, considered as a sub-image within the image. Figure 5-35: Correspondence between objects and tree nodes. Figure 5-36: Sample letter with a hole. Figure 5-37: Convex hull representation of the input image in Fig. 5-36. Figure 5-38: Concavity tree representation of the input image in Fig. 5-36. Figure 5-39: Matching result of input letter "ye" and the reference one. Figure 5-40: Array of distances of input "ye" to each of the alphabet's letters. Chapter 6 Testing and Assessment 6.1 Testing the Algorithms for Image Matching In order to test the algorithms, we ran 36 classes of eight images each, and recorded how many of the images were matched correctly to the reference database. This was repeated twice more after making minor modifications to the images, to make sure the results are stable. The results ranged between 72% and 98%, with an average of 86.8% for the Relative Tree Edit Distance Algorithm and 86.05% for the Random Lines Intersection Algorithm.
Results can be seen in Table 6.1. A graph displaying the results of the TEDHI algorithm can be seen in Fig. 6-1. Hence, we can say that the matching algorithms achieved a success rate of around 86% over the dataset. The histogram below shows that, in general, classes of images have different distance ranges from a given class of images. For example, although the images of classes "Bottle" and "Cellular" are somewhat similar, the range of the eight closest images to class "Brick" from class "Bottle" is different from the range of the eight closest images to class "Brick" from class "Cellular". Figure 6-1: Percent matching for different sets of input images. Figure 6-2: Example for a given class. Defining A to be the range of the "Bottle" class and B that of the "Cellular" class, we have Range(A) = [0.064, 0.123] and Range(B) = [0.214, 0.312]. Since A ∩ B = ∅ (the empty set), there is no overlap in the ranges, and consequently a clean separation between the two image classes. The optimum case would be to have non-overlapping ranges for all combinations of image classes, which would result in a 100% matching rate for the given class. Obviously this is difficult because of the high number of classes. An illustration of this case can be seen in the histogram shown in Fig. 6-3. Figure 6-3: Classes "Bottle" and "Cellular" are linearly separable since their distance ranges do not intersect. On the other hand, some image classes can interfere with others. This is the case for classes "Car1" and "Car2". An illustration of this case is shown in Fig. 6-4, where the reference class toward which distances are computed is class "Face". Figure 6-4: Classes car1 and car2 are not linearly separable since their distance ranges do intersect. 6.2 Testing the Running Time Upon incorporating MEX files into the algorithms, the running time was reduced by 38%. This is considerably important since, as already mentioned, it
reduced the database rebuilding time from around 12 minutes to 8 minutes. 6.2.1 Complexity of the CTHI algorithm Let n = number of contour pixels of all sub-shapes, h = height of the concavity tree, γ = number of sub-images, d = number of holes in all sub-shapes, α = number of concavity tree nodes, and β = the user input parameter of the Random Lines Intersection Algorithm. Since the running time of the concavity tree algorithm is O(nh), we can write the running time of the CTHI function as:

T(n) = O(γ d n h) = δ² O(nh), where δ = O(1)    (6.1)

In the worst case, we have:

h = n/k, where k > 4 for the smallest concavity to be considered as a node    (6.2)

In this case, the running time becomes:

T(n) = O(γ d n h) = δ² O(n · n/k) = (δ²/k) O(n²)    (6.3)

For other cases, h varies according to k · 2^h ≈ n, i.e. h ≈ log₂(n/k), for which the running time becomes:

T₁(n) = O(γ d n h) = δ² O(n log₂(n/k)) = δ² O(n log₂ n)    (6.4)

For an image with one shape and holes (assuming h ≈ log₂(n/k)), the running time for CTHI becomes:

T₁(n) = O(d n h) = δ O(n log₂(n/k)) = δ O(n log₂ n)    (6.5)

This is the same expected running time as for an image with multiple shapes but no holes. 6.2.2 Complexity of the TEDHI algorithm Here the Tree Edit Distance process is executed γ·d times, which is why the running time for TEDHI is expressed as:

T₂(n) = γ · d · T_Dijkstra = γ · d · O(E + V log V) = γ · d · O(α² {3 + 2 log α})
      = O(γ log₂(n/k) · α² {3 + 2 log α})
      = O(3γα² log α · log₂ n)
      = O(3γα³ log₂ n)

6.2.3 Complexity of the TEDHIR algorithm The Tree Edit Distance process is executed β times when the TEDHIR algorithm executes, hence:

T₃(n) = O(βα² {3 + 2 log α}) = O(3βα² log α)    (6.6)

6.2.4 Complexity of the General Case As a result, we can express the running time of the procedure that represents and matches two images as follows:

T(n) = T₁(n) + max{T₂(n), T₃(n)} = δ O(n log₂ n) + O(3γα³ log₂ n) = O(n log n)

Therefore the overall complexity is polynomial in time.
However, the actual running time of the matching algorithms is much smaller than this bound, since they depend on the number of nodes in a tree, which is considerably smaller than the number of contour pixels.

6.3 Results and Analysis

Results can be seen in Table 6.1. Assessing the matching performance over the datasets, we can say:

• Concavity trees yield excellent results over most of the dataset's classes. They have several competitive advantages over other methods, particularly in that they are:
  – Scale-invariant: the distance of an image to any scaled version of itself is zero.
  – Rotation-invariant: the distance of an image to any rotated version of itself is zero.
  – Shape-oriented: the estimated distance between two images that are similar in shape but dissimilar in size is most likely small.

• Average results were obtained over some classes. This highlights the drawbacks of using CTs:
  – Since CTs are shape-oriented and invariant under scaling, two dissimilar entities represented by similar shapes with tiny differences are difficult to distinguish, since tiny shape differences lead to tiny matching distances.
  – Although holes and sub-shapes are represented differently in the CT data structure, both contribute to matching, which can affect the matching distance negatively. To illustrate this, an image with one sub-shape and two holes may be assigned a small distance to an image with two sub-shapes and no holes.

It is also worth mentioning that image retrieval was shown to be reliable and stable for any image, independently of its size, shape and structure.

6.4 Comparison with Other Methods

Comparing the upgraded concavity tree method for extraction and matching with other methods reveals its strengths and advantages. Compared with the conformal mapping method proposed by Badawy and Kamel (2004), the current algorithm's performance exceeds the one described in that paper by around 5%.
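The scale- and shape-oriented behavior discussed above stems from how a concavity tree is built: the root level pairs a shape's convex hull with the contour runs that deviate from it, and each such run seeds a concavity node. A simplified, illustrative sketch of that first level (not the thesis' CTHI implementation) using Andrew's monotone chain hull:

```python
def cross(o, a, b):
    """2-D cross product of vectors OA and OB (positive = left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def concavity_runs(contour):
    """Split the contour into maximal runs of points off the convex hull.

    Simplification: assumes the contour starts on a hull vertex, so no
    run wraps around the end of the list.
    """
    hull = set(convex_hull(contour))
    runs, current = [], []
    for p in contour:
        if p in hull:
            if current:
                runs.append(current)
                current = []
        else:
            current.append(p)
    if current:
        runs.append(current)
    return runs

# A square with a notch in its top edge: the notch points form one concavity.
contour = [(0, 0), (4, 0), (4, 4), (3, 4), (2, 3), (1, 4), (0, 4)]
print(concavity_runs(contour))  # → [[(3, 4), (2, 3), (1, 4)]]
```

In a full implementation each run would be processed recursively (hull of the concavity, then its own concavities) to produce the child nodes of the tree, with holes handled as separately flagged subtrees.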
Further elaborating on this point, it is worth noting that the best matching rates achieved over binary images with no holes or sub-shapes reach around 91-92% on the MPEG7 dataset, while the proposed algorithms achieve a rate of 87% on more complex types of images.

Table 6.1: Success rate for each class of the synthetic dataset

Class   TEDHI Success Rate (%)   TEDHIR Success Rate (%)
1       89                       82
2       87                       81
3       90                       94
4       91                       82
5       88                       81
6       87 ≈ µ                   86
7       86                       86 ≈ µ
8       90                       92
9       86                       88
10      85                       81
11      84                       87
12      87                       75
13      89                       72
14      92                       96
15      87                       85
16      83                       89
17      74                       90
18      91                       91
19      85                       80
20      98                       83
21      80                       85
22      84                       84
23      85                       89
24      79                       91
25      92                       88
26      90                       90
27      83                       82
28      98                       95
29      82                       80
30      84                       89
31      76                       78
32      89                       93
33      90                       91
34      87                       90
35      86                       82
(36)    (92)                     (90)

Table 6.2: Success rates per image class, with comments interpreting the performance of the algorithms

Class            Success Rate (%)
apple1           80
apple2           81
bell             90
bottle           91
brick            88
camel            87 ≈ µ
car1             86
car2             90
carriage         86
cellular phone   85
cellular         84
cup              87
children         89
chopper          92
classic          87
device3          83
device4          78
device5          91
device8          93
face             91
fork             80
fountain         84
hammer           85
jar              79
key              92
octopus          90
pencil           83
personal car     98
rat              82
sea snake        89
shoe             76
spoon            89
spring           90
stef             87
teddy            86
watch            95

Comments recorded in the table's description column: similarity with apple2; similarity with apple1; similarity with bell; resemblance with hammer, as the CTs have the same structure; similarity with face, as the CT is rotation invariant; similarity with hammer; confusion with key, as the CT is size invariant.

Chapter 7

Further Research

In this thesis, the concavity tree concept has been extended to cover images containing multi-shape objects with holes. Algorithms for extracting and plotting the new concavity trees in an efficient fashion have been implemented. Further, two alternative image matching methods have been introduced and shown to achieve high success rates over subsets of the MPEG7 dataset. It was also shown how to apply the concept to real applications such as the Illiteracy project.
A GUI was designed to illustrate the concepts introduced, and a testing framework was implemented to measure the success rates over the images included in the designed datasets.

The research area targeted in this thesis remains open to considerable further effort and investigation. The main prospective improvement is the extension of this concept to colored images. Colored images can be converted to binary images, but this obviously involves a loss of information that depends on the type of image. While such a conversion seems feasible for grayscale or RGB images, it raises many problems for other color image types.

Bibliography

Badawy, O. E. and Kamel, M. (2004). Matching concavity trees. Proceedings of the Joint IAPR International Workshops on Structural, Syntactic, and Statistical Pattern Recognition, SSPR 2004 and SPR 2004, Lisbon, Portugal, 3138:556-564.

Badawy, O. E. and Kamel, M. (2005). Compressing 2-D shapes using concavity trees. Pages 559-566.

Batchelor, B. (1980a). Hierarchical shape description based upon convex hulls of concavities. Cybernetics and Systems, 10(1-3):205-210.

Batchelor, B. (1980b). Shape descriptors for labeling concavity trees. Cybernetics and Systems, 10(1-3):233-237.

Borgefors, G. and di Baja, G. S. (1992). Methods for hierarchical analysis of concavities. Pattern Recognition, Proceedings of the 11th IAPR International Conference on Image, Speech and Signal Analysis, III(1):171-175.

Borgefors, G. and di Baja, G. S. (1996). Analyzing nonconvex 2D and 3D patterns. Computer Vision and Image Understanding, 63(1):145-157.

Fadlallah, B., Hayek, H., Badawy, O. E., and Kamel, M. (2005). Matching concavity trees using tree edit distance. Technical Report, Pattern Analysis and Machine Intelligence Laboratory, University of Waterloo.

Mathworks (2006a). The components of a C MEX-file.

Mathworks (2006b). MEX-files guide.

Sklansky, J. (1972).
Measuring concavity on a rectangular mosaic. IEEE Trans. Comput., 21(12):1355-1364.

Valiente, G. (2002). Algorithms on Trees and Graphs. Springer.

Xu, J. (1997). Hierarchical representation of 2-D shapes using convex polygons: A morphological approach. Pattern Recognition Letters, 18(10):1009-1017.