Representing and Matching multi-object Images
with Holes using Concavity Trees
by
Bilal H. Fadlallah
Submitted to the Department of Electrical and Computer Engineering
in partial fulfillment of the requirements for the degree of
Master of Engineering in Electrical and Computer Engineering
at the
AMERICAN UNIVERSITY OF BEIRUT
June 2008
© Bilal H. Fadlallah 2008. All rights reserved.
Author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Department of Electrical and Computer Engineering
June 5, 2008
Certified by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Prof. Mohamad Adnan Al-Alaoui
Thesis Supervisor
Accepted by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Prof. Karim Kabalan
Department Chair, Thesis Committee Member
Accepted by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Prof. Ali Chehab
Thesis Committee Member
Representing and Matching multi-object Images with Holes
using Concavity Trees
by
Bilal H. Fadlallah
Submitted to the Department of Electrical and Computer Engineering
on June 5, 2008, in partial fulfillment of the
requirements for the degree of
Master of Engineering in Electrical and Computer Engineering
Abstract
Concavity trees are data structures used to represent 2-D images based on their meta-concavities and convex hulls. They help compress bi-level images and make image matching efficient, a task required in many areas of pattern recognition and analysis.
The purpose of this thesis is to expand the scope of concavity trees to cover the compression, retrieval and matching of bi-level multi-shape images containing holes, stressing both the performance and the complexity of such an approach. Two alternatives for applying the concept of edit distance to these structures, so as to determine a scaled level of similarity between any two given images, are implemented in Matlab.
The concavity tree structure is upgraded to handle the representation of several embedded objects, each of which may contain holes. This is done by appending to the root node all information included in the roots of the sub-shapes, and by attaching to the original tree the representations of existing meta-objects. The matching process is extended to handle the new data structure while accounting for the different normalization factors of each sub-shape. Two approaches using Dijkstra's algorithm are implemented for this purpose. The running time of the extraction-matching algorithms is O(n log n), where n is the number of contour points. The matching algorithm itself is polynomial in the number of tree nodes; its running time is further reduced by transforming lengthy for-loops into MEX files that interface with C++.
A graphical user interface was implemented to visualize the workings of the algorithms, and a testing platform was designed to check the matching results over subsets of the MPEG7 dataset. The success rate was 86.8% on these datasets.
Thesis Supervisor: Prof. Mohamad Adnan Al-Alaoui
Title: Professor, Department of Electrical and Computer Engineering
Acknowledgments
I would like to thank my family for their encouragement and support during the
preparation of this thesis. I would also like to thank Prof. Adnan Al-Alaoui
for the continuous help, supervision and precious advice he provided this year,
without which it would have been difficult to complete this work.
Contents

1 Introduction  11

2 Literature Review  14
  2.1 Concavity Trees  14
  2.2 Tree Edit Distance Matching  17
  2.3 Complexity Analysis  19
  2.4 Further Elaboration  21
  2.5 Limitations  22

3 Research Goals  24
  3.1 Representing and Matching Multi-Object Images with Holes  25
  3.2 Enhancing the TED Algorithm Performance and Efficiency  25
  3.3 Implementing a GUI and a Testing Framework  26
  3.4 Assessing Results and Comparing with other Methods  27

4 Analysis and Design  28
  4.1 Incorporating Multi-Object Images with Holes in Concavity Trees  28
  4.2 Relative Tree Edit Distance Matching Algorithm  29
  4.3 Random Lines Intersection Matching Algorithm  31
  4.4 MEX Files  33
    4.4.1 What are MEX Files?  33
    4.4.2 Why MEX Files?  36
  4.5 Datasets of Images  38
  4.6 Interface Design  39

5 Implementation  46
  5.1 Implementation of Sub-Images/Hole Representation  46
  5.2 Implementation of Relative Tree Edit Distance Matching Algorithm  50
  5.3 Implementation of the Random Lines Intersection Algorithm  53
  5.4 Converting for-Loops to C++ Via MEX Files  54
  5.5 GUI Implementation  54
  5.6 Applying Concavity Trees in the Illiteracy Project  58

6 Testing and Assessment  69
  6.1 Testing the Algorithms for Image Matching  69
  6.2 Testing the Running Time  71
    6.2.1 Complexity of the CTHI algorithm  72
    6.2.2 Complexity of the TEDHI algorithm  73
    6.2.3 Complexity of the TEDHIR algorithm  74
    6.2.4 Complexity of the General Case  74
  6.3 Results and Analysis  75
  6.4 Comparison with other methods  76

7 Further Research  79
List of Figures

2-1 An object with its convex hull, concavities and concavity tree  15
2-2 The same object as in Fig. 2-1 with its corresponding concavities  16
2-3 Two trees T1 and T2 numbered according to a pre-order traversal  19
2-4 Tree edit graph transforming T1 into T2  19
4-1 Identical shapes with different hole location  31
4-2 The Dissecting Line Procedure  33
4-3 The Matlab/C++ interface  35
4-4 Troubleshooting Methodologies For MEX Files  42
4-5 Sample of the MPEG7 dataset  43
4-6 Used Subset of the MPEG7 dataset  43
4-7 Interface Top-Level Design  44
4-8 Top-Level Image Matcher Program  45
5-1 Bitmap file representing the shapes of three buildings in a city  47
5-2 Tree representation of the bmp image in Fig. 5-1  47
5-3 Recovered image from the tree in Fig. 5-2  48
5-4 Reconstructed sub-shape with corresponding extracted sub-shape  48
5-5 Bitmap Image representing a shape containing holes  49
5-6 Tree representation of the bmp image in Fig. 5-5  49
5-7 Recovered image from the tree in Fig. 5-6  49
5-8 Bitmap Image representing two shapes with holes  50
5-9 Tree representation of the bmp image in Fig. 5-8  50
5-10 Reconstructed sub-shape with corresponding extracted sub-shape  51
5-11 Two images to be matched  51
5-12 Corresponding tree representations for the images in Fig. 5-11  52
5-13 Two input images. One representing a bottle, the other a city  52
5-14 Corresponding tree representations for the images in Fig. 5-13  53
5-15 Initial prompt of the ImageConcavityTreeMatcher  54
5-16 Prompts in the GUI for DB Rebuild  55
5-17 Prompts in the GUI for program choice  55
5-18 Prompt to File Input  56
5-19 Image Matching  56
5-20 Best Match Return  56
5-21 Recursion Start  57
5-22 Back to the User Choice Input  57
5-23 Welcome screen of the Proximity software  58
5-24 Initial screen of the Proximity software  59
5-25 Loaded Images to the Proximity software  60
5-26 Corresponding concavity trees representation of images in Fig. 5-25  61
5-27 Corresponding convex hulls retrieved for images in Fig. 5-25  62
5-28 Matching results for images in Fig. 5-25  63
5-29 Different forms for “aleph” in the Arabic alphabet  63
5-30 The 28 letters of the Arabic Alphabet  64
5-31 Sample Input Dataset  64
5-32 Loading Input versus Reference Images  65
5-33 Convex hull representation of the input images in Fig. 5-32  65
5-34 Concavity tree of the original image in Fig. 5-32  65
5-35 Correspondence between objects and tree nodes  66
5-36 Sample letter with a hole  66
5-37 Convex hull representation of the input image in Fig. 5-36  66
5-38 Concavity tree representation of the input image in Fig. 5-36  67
5-39 Matching result of input letter “ye” and the reference one  67
5-40 Array of distances of input “ye” to each of the alphabet’s letters  68
6-1 Percent matching for different sets of Input images  70
6-2 Example for a given class  70
6-3 Classes “Bottle” and “Cellular” are linearly separable  71
6-4 Classes “Car1” and “Car2” and linear separability  72
List of Tables

4.1 Table showing the main concavity tree (CT) structure parameters  30
4.2 The MEX API Functions. Source: Mathworks (2006), “MEX-files Guide”  41
4.3 Running time comparison: original code versus MEX code  41
6.1 Success rate for each class of the synthetic dataset  77
6.2 Table showing success rates per image class with comments interpreting the performance of the algorithms  78
Chapter 1
Introduction
The purpose of this thesis work is to design and implement an efficient method to
compress, retrieve and match bi-level multi-images that may contain holes. To do
so, we first propose to adapt the concavity tree concept, a known concept in shape
analysis, to span images containing multiple objects with or without holes. We then
suggest methods for matching these structures using dynamic programming. Our
effort pertains mainly to the area of pattern recognition, a well-developed field that
has undergone intensive research in recent years and is still the subject of much
scrutiny today. The development of a variety of multipurpose algorithms in this
field has led to many technological advances in dependent domains such as computer
vision, virtual reality systems, space exploration, criminology, cellular biology, and
security systems (fingerprint recognition, iris scans, gait recognition, etc.).
The field of pattern recognition and analysis holds much promise in the area of digital
image processing, specifically that of shape identification. Its development has proved
crucial to meeting the growing demands of an ever-advancing world of technology.
Today, a simple search over the internet reveals a plethora of different techniques
and algorithms for specialized image and pattern recognition. Each algorithm and
technique is directed towards a specific application of pattern and image recognition;
as a result, the performance of each technique is application-specific, providing
maximum performance only for a finite set of applications while providing less than
acceptable performance in other applications. When we speak of the performance of
a specific method or algorithm, we usually have the following criteria in mind:
1. Speed
2. Accuracy
3. Robustness
4. Ease of implementation
5. Cost of implementation and maintenance
6. Potential to be improved
Ideally, we would like to come up with a methodology that meets all of the above
criteria. Unfortunately this is not possible, and as a result there will always be a
certain trade-off. We can classify pictorial data input for computer processing into
five different categories:
1. Simple RGB colored pictures
2. Full gray-scale images
3. Bi-level pictures (i.e. black and white pictures)
4. Continuous curves and lines
5. Sets of discrete points spaced far apart
However, the images we deal with in this project are two-dimensional bi-level (i.e.
black-and-white bit-mapped) images. Dealing with bi-level images amounts to dealing
with shape, because in such images the shape is determined solely by one of the two
levels. Colored images offer more of a challenge, since shape can then only be
determined from color contrast. Shape is thus the primary feature on which the
recognition process is based. In order to compare two bi-level images efficiently
and rapidly, an image compression scheme is necessary, because a method that matches
images directly at the level of the pixel matrix would be both time-consuming and
less able to detect the patterns inside those images. The compression scheme we
propose here is concavity tree compression. We start by showing the advantages of
using concavity trees to represent bi-level images; two advantages are predominant:
1. Compressing the image and drastically reducing the memory required for its storage,
while retaining enough information for a very close reconstruction of the original
image (with an accuracy depending on a user-defined parameter).
2. Allowing relatively fast and accurate comparison of two bi-level images, by
comparing their concavity trees.
More details about extracting concavity trees from an image are given in Chapter 2,
which covers the literature review.
Chapter 2
Literature Review
2.1
Concavity Trees
A concavity tree is a data structure used to describe non-convex two dimensional
shapes. It was first introduced by Sklansky (1972) and has been used by several
researchers in the past three decades (Batchelor, 1980a,b; Borgefors and di Baja,
1992, 1996; Xu, 1997). We can define a concavity tree as a rooted tree in which the
root represents the whole object whose shape is to be analyzed or represented. The
next level of the tree contains nodes that represent concavities along the boundary of
that object. Each of the nodes on the following levels represents one of the concavities
of its parent, i.e., its meta-concavities. If an object or a concavity is itself convex, then
the node representing it does not have any children. Figure 2-1 (Refer to Badawy
and Kamel (2005)) shows an example of a shape (a), its convex hull, concavities,
and meta-concavities (b), and its corresponding concavity tree (c). The shape has
five concavities as reflected in level one of the tree. The four leaf nodes in level one
correspond to the highlighted triangular concavities shown in (d), whereas the nonleaf node corresponds to the (nonconvex) concavity shown in (e). Similarly, the nodes
in levels two and three correspond to the meta-concavities highlighted in (f) and (g),
respectively. Typically, each node in a concavity tree stores information pertinent
to the part of the object the node is describing (a feature vector for example), in
addition to tree meta-data (like the level of the node; the height, number of nodes,
and number of leaves in the sub-tree rooted at the node).
Figure 2-1: An object (a), its convex hull and concavities (b), the corresponding
concavity tree (c), and contour sections corresponding to concavities (d-g). Courtesy
of Badawy and Kamel (2005).
The process of extracting the concavity tree from a 2D image includes the following
steps:
1. Compute the convex hull of the image.
2. Start from the leftmost bottom end of the figure.
3. Follow the contour points starting from this point.
4. Detect concavities:
• For each concavity, apply the algorithm recursively until a convex shape is
reached.
• If a concavity has meta-concavities, they will be represented as nodes in the second
or nth level of the tree, according to their order (meta-concavity of order 2
or n).
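The recursive step above can be sketched in a few lines. The thesis implementation is in Matlab; the following Python sketch is purely illustrative, assumes the shape is given as an ordered list of contour points starting at a hull point, and records only the size and depth of each node:

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices of a 2-D point set."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def concavity_tree(contour, depth=0, max_depth=5):
    """Each node describes one contour piece; its children are the concavities,
    i.e. the maximal runs of contour points strictly inside the convex hull."""
    hull = set(convex_hull(contour))
    node = {"size": len(contour), "depth": depth, "children": []}
    if depth >= max_depth:
        return node
    run = []
    for p in contour:
        if p in hull:
            if len(run) > 1:  # a concavity: recurse to find meta-concavities
                node["children"].append(concavity_tree(run, depth + 1, max_depth))
            run = []
        else:
            run.append(p)
    if len(run) > 1:
        node["children"].append(concavity_tree(run, depth + 1, max_depth))
    return node
```

For a square with a rectangular notch, the root gets a single child (the notch), and that child is a leaf because the notch itself is convex.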
Figure 2-2: The same object as in Fig. 2-1 with its corresponding concavities, along
with their position in the tree.
After introducing the way concavity trees are extracted from images, we now explain
how images are matched using their concavity tree representation.
2.2
Tree Edit Distance Matching
When we talk about matching, we mean finding the best match or matches to an
image from a set of given images. This can be directly done by defining a distance
between any two images. The image in the set at the least distance to the given
one is then the best match we are looking for. Two methods have been suggested to
match concavity trees. The first one has been implemented by Badawy and Kamel
(2004), and uses mappings from nodes in the first tree to nodes in the second. The second uses
the concept of Tree Edit Distance in the matching process (Fadlallah et al., 2005). We
will focus on the latter method because it is more efficient and accurate, especially for
sets of images that exhibit translations, rotations and flipping (Fadlallah et al., 2005).
The Tree Edit Distance algorithm aims to match two concavity trees based on the
shape of the trees and a set of elementary operations used to transform one tree into
another. The tree edit distance algorithm can be used to compare any type of trees.
The tree transformation is a set of elementary operations associated with different
weights, and the tree edit distance is the cost of the least-cost sequence of operations
transforming one tree into another. The simple edit distance algorithm is motivated
below. Refer to Valiente (2002) for a detailed analysis of the algorithm.
Let T = (V, E1 ) and S = (W, E2 ) be two ordered trees. An elementary edit operation
on T and S is one of the following operations:
• Deletion: (v, λ) or v → λ where v ∈ V .
• Substitution: (v, w) or v → w where v ∈ V and w ∈ W .
• Insertion: (λ, w) or λ → w where w ∈ W .
These elementary operations on T and S have different costs, constrained by a cost
function γ satisfying (Valiente, 2002):
• γ(v, w) ≥ 0.
• γ(v, w) = 0 if and only if v = w.
• γ(v, w) = γ(w, v).
• γ(v, w) ≤ γ(v, z) + γ(z, w).
As a result, the cost of a transformation E of T into S is given by:

γ(E) = Σ(v,w)∈E γ(v, w)    (2.1)

And the edit distance between the ordered trees T and S becomes:

Dist(T, S) = min{γ(E) : E is a valid transformation of T into S}    (2.2)
First, the two concavity trees should be stored in a pre-order traversal. Then we
define the tree edit graph of T and S as the graph whose vertices take the form vi wj
for each pair of nodes v ∈ {v0 } ∪ V and w ∈ {w0 } ∪ W , where v0 ∉ V and w0 ∉ W
are two dummy nodes. The following conditions determine the edges of the tree edit
graph:
• if depth[vi+1 ] ≥ depth[wj+1 ], then the edge (vi wj , vi+1 wj ) exists (deletion).
• if depth[vi+1 ] = depth[wj+1 ], then the edge (vi wj , vi+1 wj+1 ) exists (substitution).
• if depth[vi+1 ] ≤ depth[wj+1 ], then the edge (vi wj , vi wj+1 ) exists (insertion).
In the above, 0 ≤ i ≤ n1 and 0 ≤ j ≤ n2 , where n1 and n2 are the numbers of nodes in
T and S respectively, with the nodes numbered according to a pre-order traversal.
To illustrate, let T1 = (V1 , E1 ) and T2 = (V2 , E2 ) be two ordered trees, as shown in
Fig. 2-3. The resulting tree edit graph of the trees is shown in Fig. 2-4 and the
algorithm’s implementation can be performed through the pseudocode shown in Alg.
1.
Figure 2-3: Two trees T1 (a) and T2 (b) numbered according to a pre-order traversal.
Figure 2-4: The figure shows the tree edit graph transforming T1 into T2 . Given a
Substitution Cost of 0, a Deletion Cost of 1 and an Insertion Cost of 1, the shortest
path is highlighted in red and has a cost of 2 whereas one of the possible paths is
highlighted in green and has a cost of 6.
2.3
Complexity Analysis
The most time-consuming step in Alg. 1 is Dijkstra’s algorithm. Again, let us consider
T1 = (V1 , E1 ) and T2 = (V2 , E2 ), define n to be the number of nodes in T1 , m to be the
number of nodes in T2 and assume without loss of generality that m ≥ n. We know
that Dijkstra's algorithm operates on the n × m matrix in which the tree edit graph information
is stored. Hence the number of vertices in the directed graph will be the product of
the two dimensions, i.e. mn. Note that the complexity of Dijkstra’s algorithm using
binary heaps is TDijkstra (V, E) = O((E + V ) log V ), where E is the number of edges in
the graph and V the number of vertices. Since each vertex can at most be connected
to three other vertices, the worst-case number of edges is 3mn, and the running
time of the algorithm will thus be:
TTED (n, m) = O{(3mn + mn) log mn} = O{4mn log mn}
(2.3)
But since m ≥ n and given the limitation on the number of nodes, we can write:
TTED (n, m) = O{(4m2 ) log m2 } = O{8m2 log m}
(2.4)
Now, since using a Fibonacci heap improves the running time of Dijkstra's algorithm
to O(E + V log V ) for graphs having far fewer edges than V 2 , the algorithm's
computational complexity will therefore be:

TTED (n, m) = O(E + V log V )
= O(3mn + mn log mn)
= O(3m2 + m2 log m2 )
= O(m2 (3 + 2 log m))
Usually, the number of nodes m is very small compared to the number of contour
points, which favors our algorithm in many applications of image matching. Furthermore,
our method has many features that can vary the way matching is conducted.
Varying the area parameter yields different behaviors when comparing a set of
images. If you are comparing images that belong to the same object, decreasing the
area parameter helps reveal the differences between the images, because it increases
the weight of the operations in the low levels relative to the root substitution. If
instead you are trying to match different objects, it is better to increase the area
parameter, giving a higher weight to the general shapes represented by the roots.
2.4
Further Elaboration
After building the tree edit graph, the different operations must be assigned variable
costs according to the attributes of the nodes involved. Starting with the substitution
cost, it is a function of the area of the concavity relative to the area of the image and
the level of the nodes in the concavity tree. It also depends on the different parts of
the SCX metrics (Solidity, eCcentricity, eXtent) and the number of children of the
two nodes (Badawy and Kamel, 2005). This cost was tested on different shapes, on
which it showed great strength in calculating the distances between different kinds of
concavities. As for the insertion and deletion costs, they must be equal in order
to preserve the symmetry property of the tree edit distance. This cost is directly
proportional to the area of the concavity relative to the area of the root (object)
and to the number of nodes present in the tree rooted at this node and inversely
proportional to the level of the node in the concavity tree. Furthermore, we have
multiplied the different costs by an area parameter raised to the power of the ratio
of the area of the concavity to the area of the convex hull of the object, which gives
greater emphasis to the area factor in our cost analysis. Let IC and SC denote the
insertion and substitution costs, respectively:
IC(N ) = (Area Parameter)^(Area(N )/Area(Object)) × f (Number of nodes, Level)    (2.5)

SC(N1 , N2 ) = (Area Parameter)^(Area(N1 )/Area(Object1 ) + Area(N2 )/Area(Object2 )) × f (Level, SCX, Children)    (2.6)
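As a numeric illustration only: the weighting function f in Eqs. (2.5) and (2.6) is not fully specified here, so the Python sketch below substitutes simple placeholders for it, and the node fields (area, num_nodes, level) are likewise assumed names rather than the thesis data structure:

```python
AREA_PARAMETER = 2.0  # illustrative value; the thesis tunes this parameter

def insertion_cost(node, object_area):
    """Eq. (2.5) with a placeholder for f(Number of nodes, Level)."""
    ratio = node["area"] / object_area
    f = node["num_nodes"] / (1.0 + node["level"])
    return (AREA_PARAMETER ** ratio) * f

def substitution_cost(n1, n2, object_area1, object_area2):
    """Eq. (2.6) with a placeholder for f(Level, SCX, Children)."""
    ratio = n1["area"] / object_area1 + n2["area"] / object_area2
    f = 1.0 / (1.0 + abs(n1["level"] - n2["level"]))
    return (AREA_PARAMETER ** ratio) * f
```

Note how the area ratio appears in the exponent, so a large concavity relative to its object raises the cost multiplicatively, as the text describes.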
2.5
Limitations
Although it shows noticeable improvements in many cases, the algorithm developed
still has several limitations. First, the choice of the weights is empirical. This limits
the accuracy of the method, since empirical choices cannot suit all cases even if they
cover a wide spectrum of them. Another source of inaccuracy in the weights is their
dependency on features already computed in the concavity tree construction phase.
Chapter 3 outlines some possible solutions to these limitations, where we introduce
alternatives that build upon the concept we have just seen.
Algorithm 1: Matching Concavity Trees Using Tree Edit Distance (TED)
Input: Two Trees T1 and T2
Output: Estimated distance d between T1 and T2
Notation:
• T1 and T2 are two ordered concavity trees.
• T1 = (V1 , E1 ) and T2 = (V2 , E2 ).
• T1 is the tree with the minimum number of root children.
• V1 and V2 are two arrays to store the traversed nodes of T1 and T2 (in a
pre-order fashion).
- Fix T1 and rearrange T2 in the second level to have greater similarity.
- V1t = Pre-order Traversal (T1 ).
- V2t = Pre-order Traversal (T2 ).
for i ∈ {1...size(V1t )} do
for j ∈ {1...size(V2t )} do
if depth[vi+1 ] ≥ depth[wj+1 ] then
edge(vi wj , vi+1 wj ) exists.
else if depth[vi+1 ] = depth[wj+1 ] then
edge(vi wj , vi+1 wj+1 ) exists.
else
edge(vi wj , vi wj+1 ) exists.
- Transform the edit graph to a directed connected graph.
- Give weight for each edge according to operation (del, ins, sub).
- Use Dijkstra’s algorithm to find shortest path between V1t (1)V2t (1) and
V1t (size of V1t )V2t (size of V2t ).
- Set d to the weight of the shortest path.
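A compact, hypothetical Python rendering of Alg. 1 (the thesis code itself is Matlab/MEX) takes the pre-order depth sequences of the two trees, generates the three kinds of edit-graph edges on the fly, and runs Dijkstra's algorithm with unit deletion/insertion costs and zero substitution cost, as in the example of Fig. 2-4:

```python
import heapq

def tree_edit_distance(d1, d2, del_cost=1, ins_cost=1, sub_cost=0):
    """d1, d2: pre-order depth sequences of the two trees.
    Shortest path in the tree edit graph from vertex (0, 0) to (n, m)."""
    n, m = len(d1), len(d2)
    dist = {(0, 0): 0}
    heap = [(0, (0, 0))]
    done = set()
    while heap:
        c, (i, j) = heapq.heappop(heap)
        if (i, j) == (n, m):
            return c
        if (i, j) in done:
            continue
        done.add((i, j))
        moves = []
        # from (i, j) the next unprocessed nodes are v_{i+1} = d1[i], w_{j+1} = d2[j]
        if i < n and (j == m or d1[i] >= d2[j]):
            moves.append(((i + 1, j), del_cost))      # delete v_{i+1}
        if i < n and j < m and d1[i] == d2[j]:
            moves.append(((i + 1, j + 1), sub_cost))  # substitute v_{i+1} -> w_{j+1}
        if j < m and (i == n or d1[i] <= d2[j]):
            moves.append(((i, j + 1), ins_cost))      # insert w_{j+1}
        for nxt, w in moves:
            nc = c + w
            if nc < dist.get(nxt, float("inf")):
                dist[nxt] = nc
                heapq.heappush(heap, (nc, nxt))
    return dist[(n, m)]
```

Identical depth sequences give distance 0, and each unmatched node contributes one deletion or insertion.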
Chapter 3
Research Goals
The goals set for this thesis can be summarized as follows:
(a) Expanding the concavity tree concept for representation and retrieval of binary
images to incorporate multi-objects with holes.
(b) Expanding the scope of the Tree Edit Distance algorithm in order to match the
upgraded concavity trees and increase its performance.
(c) Designing and Implementing algorithms for the representation, retrieval and
matching of multi-images with holes using the new data structure.
(d) Enhancing the running time of the written algorithms using MEX Files that
interface C++ with Matlab.
(e) Designing and Implementing a friendly GUI incorporating the above concepts.
(f) Creating new datasets for testing purposes.
(g) Designing and Implementing a testing framework to validate the results over
the images datasets.
3.1
Representing and Matching Multi-Object Images with Holes
The Concavity Tree representation will be modified to allow for multi-object images
with (or without) holes to be represented in a compressed form. The TED algorithm
will then be adjusted so as to take the holes and the multi-object representation into account
during the matching process. The modifications will affect the structure of the tree
by appending trees of each sub-shape to the original tree structure, and varying
accordingly the parameters of the root node. Note that information will be conveyed
to each sub-tree to specify whether it is a sub-shape or a hole or a hole in a sub-shape
for retrieval purposes. The main steps are shown below:
(a) Construct the tree of the main shape, chosen to be the shape with the largest
dimensions.
(b) Detect the remaining sub-shapes and append their tree representation to the
original.
(c) For each of the sub-shapes, extract trees for the contained holes (if they exist)
and append them to the corresponding sub-shape representation.
(d) Normalize the parameters of the root node according to the weights of each
sub-shape/node.
Note that the detailed design of the process will be described in Chapter 4.
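As a toy illustration of steps (a) to (d), sub-shape trees can be gathered under a common root whose parameters are normalized by sub-shape area. The names and fields below are assumptions for the sketch, not the thesis data structure:

```python
def merge_subshape_trees(subtrees):
    """subtrees: list of dicts with 'area', 'kind' ('shape' or 'hole'), 'children'.
    Returns a root whose children are the sub-shape trees, largest first,
    each annotated with its normalization weight."""
    total = sum(t["area"] for t in subtrees)
    root = {"kind": "root", "area": total, "children": []}
    # the largest sub-shape plays the role of the main shape; the rest are appended
    for t in sorted(subtrees, key=lambda t: t["area"], reverse=True):
        t["weight"] = t["area"] / total  # per-sub-shape normalization factor
        root["children"].append(t)
    return root
```

The `weight` field stands in for the normalization factors mentioned in step (d); hole sub-trees are kept distinguishable through the `kind` tag.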
3.2
Enhancing the TED Algorithm Performance
and Efficiency
Several alternatives exist for improving the tree edit distance algorithm, namely:
(a) Checking whether one of the two trees is, or is close to, a sub-tree of the other.
This will improve the matching rate since, at present, the cost of matching a
tree against a larger tree that contains it is higher than it should be.
(b) Since the running time of the algorithm is a function of the number of nodes,
performance can be improved by limiting the number of nodes while still
preserving as much useful information about the image as possible. This limit
can be achieved in one of two ways: either by considering only a limited number
of levels, or by adjusting the concavity-size-determining variable within the
concavity tree creation algorithm.
(c) Optimizing the weights further to improve the accuracy of the results of our
matching.
(d) Substituting an in-order traversal for the pre-order traversal of the concavity
trees. This may improve performance on some 2-D shapes. Experimental
results may suggest a combination of traversals, used selectively according to
the shape of the image.
(e) Since for-loops require very long execution times in Matlab, and since the Matlab
implementation of the algorithm depends heavily on a large number of relatively
long (and often nested) for-loops, the algorithm suffers a large decrease in
performance and increase in running time. The running time's dependency on
the programming platform can be largely reduced by rewriting all the
time-consuming for-loops in C/C++, and calling them when necessary from
Matlab using MEX files.
3.3
Implementing a GUI and a Testing Framework
The database we will use for testing purposes is a subset of the MPEG7 dataset, a
standard dataset used universally for matching purposes. We take a subset because
the full MPEG7 database is very large. As already mentioned, a GUI will be designed
to incorporate the concepts illustrated. Basically, the GUI has the following
characteristics:
(a) Two user-input images.
(b) Extraction and plot of the concavity trees of the two images.
(c) Extraction and plot of the recovered images from the compressed trees.
(d) Ability to change the compression-decompression parameters.
(e) Matching result of the two images using the new Tree Edit Distance Algorithm.
Another GUI is also designed to compute the best match to a given input image among
the dataset. This works by extracting the tree representation of the input image and
then computing its distance to all the others in the dataset. The best match will be
the image whose distance to the input image is smallest.
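The best-match search can be sketched as a one-line minimization; `extract_tree` and `tree_distance` below are hypothetical stand-ins for the thesis routines:

```python
def best_match(query_image, dataset, extract_tree, tree_distance):
    """Return the dataset image whose concavity tree is closest to the query's.
    extract_tree: image -> tree; tree_distance: (tree, tree) -> non-negative float."""
    query_tree = extract_tree(query_image)
    return min(dataset, key=lambda img: tree_distance(query_tree, extract_tree(img)))
```

In practice the dataset trees would be extracted once and cached (the "DB Rebuild" step of the GUI), so only the query tree is computed per search.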
3.4
Assessing Results and Comparing with other
Methods
Using the results obtained through the testing framework, a detailed analysis will
be performed describing the strong points as well as the weaknesses of this approach.
A comparison will also be conducted with other methods; the main criterion will be
the success rate over the whole dataset.
Chapter 4
Analysis and Design
4.1
Incorporating Multi-Object Images with Holes
in Concavity Trees
Designing a tree representation for an image with multiple objects containing holes
must take into consideration the fact that two dimensions are to be handled, since
an image can have several sub-images, each of which can have several holes.
The designed algorithm proceeds as follows:
(a) Read the whole image as a binary matrix.
(b) Locate the main sub-image corresponding to the major surface in the image.
(c) Locate the other sub-images.
(d) Locate holes in each sub-image.
(e) Extract the CT representation of the main sub-image with its holes.
(f) Extract the concavity tree representation of each sub-image with its holes.
(g) Append the tree representation of sub-images to the main tree.
An overview of the procedure can be found in Alg. 2:
Retrieving the initial image proceeds in the same way as for images with no sub-shapes or holes: the contour points of all nodes in the tree are plotted
and the area inside the envelope is filled with pixel values of 1. At this stage it
should be noted that the main parameters embedded in each node of the tree are:
Algorithm 2: Extract Concavity Tree for Multi-Images with Holes (CTHI)
Input: Image file "Image.bmp", resolution parameter κ
Output: Data structure T containing concavity trees.
Initialize:
• I = Read Image("Image.bmp").
• NumberSubImages = Number of sub-images in the image.
• NumberHoles[1:NumberSubImages] = Array holding the number of holes of each sub-image.
M = Locate Sub-Images(I)
H = Locate Holes(M)
T = Extract Concavity Tree(I, κ)
RM = Represent Sub-Images(M)
RH = Represent Holes(H, RM)
F = null
for i ∈ {1...NumberSubImages} do
    for k ∈ {1...NumberHoles[i]} do
        E[i, k] = Extract Info(H[i, k])
        RH[i, k] = RH[i, k] + E[i, k]
        T[i] = T[i] + RH[i, k]
        F[i] = Update F[i]
T[Root] = T[Root] + F
T = (T, isDotted T)
4.2
Relative Tree Edit Distance Matching Algorithm
This algorithm consists of comparing the data structures obtained in the previous
section in order to match any two images with multi-objects and holes. The trees generated by the new algorithm are the input to the matching process.
For the sake of example, let A and B be respectively the concavity trees with
sub-images and holes for Images A and B. If the two trees have the same number
of sub-images, we can apply a novel version of the Tree Edit Distance algorithm to
Table 4.1: Main concavity tree (CT) structure parameters

Parameter   Type                Description
Contour     Vector of doubles   Points along the envelope of the concavity
Hull        Vector of doubles   Coordinates corresponding to convex hulls
Leaves      Integer             Number of nodes at the last level
Height      Integer             Max number of nodes from root to leaves
Children    Integer             Number of attached nodes
Level       Integer             Level number in the tree
Depth       Integer             Height level
Hole        Boolean             Type of nodes
Colour      RGB                 Node color in the tree
Attribute   Vector of doubles   SCX vector of three parameters
RelArea     Relative area       Relative area of the given concavity
handle comparison of the sub-images one by one while giving weights to each of them.
The weights will have to comply with important criteria, such as the area of each
sub-image. Another alternative would be to compare sub-images in the same manner but not ad hoc, i.e. comparing the best sub-image match in B with that of A
instead of simply pairing the first sub-images of A and B. This is likely to consume more time, but
the results will be more accurate.
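The "best sub-image match" alternative can be sketched as a greedy pairing over a precomputed distance matrix. This is only an illustrative sketch, not the thesis implementation: cost[i][j] stands for a hypothetical tree edit distance between the i-th sub-image of A and the j-th sub-image of B, and a greedy scan is used rather than an optimal assignment:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Pair each sub-image of A with the closest still-unused sub-image of B
// (instead of the ad-hoc i-th-with-i-th pairing) and return the total cost.
double greedyPairCost(const std::vector<std::vector<double>>& cost) {
    const std::size_t nB = cost.empty() ? 0 : cost[0].size();
    std::vector<bool> used(nB, false);
    double total = 0.0;
    for (const auto& row : cost) {
        double best = std::numeric_limits<double>::infinity();
        std::size_t bestJ = nB;                       // sentinel: none found
        for (std::size_t j = 0; j < nB; ++j)
            if (!used[j] && row[j] < best) { best = row[j]; bestJ = j; }
        if (bestJ < nB) { used[bestJ] = true; total += best; }
    }
    return total;
}
```

With cost {{0.9, 0.1}, {0.2, 0.8}}, greedy pairing yields 0.1 + 0.2 = 0.3, while the ad-hoc index pairing would give 0.9 + 0.8 = 1.7, illustrating why reordered matching can be more accurate.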
A trickier case arises when we match an image with n sub-images with another
with m sub-images where n > m. In this case we propose to compare the sub-images
left of A with dummy sub-images, where we define a dummy sub-image as a special
concavity tree made of a unique node. In other words, we are computing the distance
between the sub-image and a standard invariant shape to which we refer by the name
dummy sub-image. Therefore, a large or complex-shaped sub-image will exhibit a
bigger distance from the dummy sub-image than a small or simple one, and this will
influence the final distance for the image with more sub-images.
Comparing two sub-images with holes proceeds the same way as comparing two
images with sub-images. Let A be a sub-image with n holes and B another with m
holes such that n > m. We propose to first match the two sub-image shapes, then
the m holes one by one, and finally the remaining n − m holes to dummy holes.
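The dummy-padding scheme just described can be sketched as follows. This is an illustrative stand-in, not the thesis code: each hole is abstracted to its relative area, |a − b| stands in for the real tree edit distance between two hole trees, and the dummy-hole area of 0 is an assumption made for the example:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

const double kDummyArea = 0.0;  // assumed relative area of a dummy hole

// Match two lists of holes (given by relative area): the shorter list is
// padded with dummy holes so every hole contributes, weighted by area.
double holeListDistance(std::vector<double> a, std::vector<double> b) {
    if (a.size() < b.size()) a.swap(b);   // ensure a is the longer list
    b.resize(a.size(), kDummyArea);       // pad b with dummy holes
    double d = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        d += a[i] * std::fabs(a[i] - b[i]);  // weight by relative area
    return d;
}
```

A large hole with no counterpart is thus matched against the dummy hole and pushes the total distance up, exactly the behavior described above.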
An overview of the designed algorithm can be found in Alg. 3:
4.3
Random Lines Intersection Matching Algorithm
The Relative Tree Edit Distance Algorithm does not take into consideration the location of the hole or sub-shape in the image. This problem does not appear when
comparing two images with no holes or sub-shapes, as the only concern in that case
is the similarity between the images and not the location of the shape within the bitmap.
With images that contain holes or sub-shapes, however, the location of the hole or
sub-shape is very meaningful, and ignoring it in the comparison process
can lead to inaccurate results.
To see how the location of the hole can affect the comparison of two bi-level images
with holes, consider the two images shown in Fig. 4-1:
Figure 4-1: Identical shapes with different hole locations. Note that the CT extraction
of the two images results in identical trees and hence a zero matching distance.
The two images in Fig. 4-1 consist of two objects each with a hole. The objects
are two ellipses with exactly the same size and shape, and the holes are two circles
with the same area. Hence, the only difference between the two images is the location
of the holes. This location implies a difference between the two figures, and therefore they
are not exactly similar. This, however, is not reflected in the concavity tree representation of the images.
The best solution to this problem is to use dissecting lines to cut the bi-level images, and then use the dissected parts of each image in the comparison to get a clear
idea about the location of the holes. The dissecting line is chosen in a random direction,
and it cuts each image with holes into two images. We start by
selecting a line, then dissecting the two images to obtain four new images. Next,
the right portion of the first image is compared with the right portion of the second
image, and the same is done for the left portions. This comparison is sufficient to account
for the placement of holes in the original images.
To see the effectiveness of the dissecting lines, consider the previous figure. By
applying the dissecting line algorithm to the two figures, we select a random direction
and cut the figures along this direction. The result of the process is shown in Fig.
4-2. The line transforms the two images into four. By comparing the left portion
of the first image with the left portion of the second, and doing the same for the
right portions, the difference between the two images is revealed and a certain cost is
added. The distance between the two images is therefore no longer equal to 0, and the
two images are not exactly similar.
Usually, any number of lines can be used to compare two figures, and it might be
possible to figure out the similarity between two figures using only one dissecting line.
However, since the algorithm deals with different numbers of holes or sub-shapes, an
Figure 4-2: The Dissecting Line Procedure.
arbitrary number of random lines is chosen, which means that the process of dissecting and comparing the images is repeated an arbitrary number of times. By
choosing the dissecting lines randomly, lines are taken in all possible directions,
and by selecting a sufficient number of lines the locations of all holes are identified,
further improving the matching of bi-level images. The pseudocode of the
algorithm is shown in Alg. 4:
4.4
MEX Files
4.4.1
What are MEX Files?
MEX stands for MATLAB Executable. MEX-files are a way to call custom C/C++
routines directly from MATLAB as if they were MATLAB built-in functions. They
are dynamically linked subroutines produced from C/C++ source code that, after
being compiled, can be run from within MATLAB in the same way as MATLAB M-files or built-in functions. The external interface functions provide functionality to
transfer data between MEX-files and MATLAB, and the ability to call MATLAB
functions from C/C++ code.
In MATLAB, all variables are stored as a single type of structure called the mxArray. The mxArray declaration corresponds to the internal data structure that MATLAB uses to represent arrays; it is the C representation of all MATLAB
arrays. If the variable contains complex numbers as elements, the MATLAB array
structure contains two 1-dimensional arrays of double-precision numbers
called pr (containing the real data) and pi (containing the imaginary data) Mathworks
(2006a,b).
In C/C++, mxArrays are declared as follows:
mxArray *x;
The values inside a newly created mxArray are undefined when it is declared, and it
should be initialized with an mx* routine before use. Data inside the array is
stored in column-major order (i.e. the values are read down and then across the array).
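This storage convention can be checked outside of MATLAB: element (row, col) of an m-by-n matrix sits at linear offset row + col·m in the buffer returned by mxGetPr. A small sketch of that addressing rule:

```cpp
#include <cassert>
#include <cstddef>

// Linear offset of element (row, col) in an m-row column-major buffer,
// matching the layout of mxArray data (0-based indices).
std::size_t columnMajorIndex(std::size_t row, std::size_t col, std::size_t m) {
    return row + col * m;
}
```

For the 2-by-3 matrix [1 2 3; 4 5 6], mxGetPr would expose the buffer {1, 4, 2, 5, 3, 6}: each column is stored contiguously.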
To access the data inside an mxArray, the API functions shown in Table 4.2 are used.
The MEX API provides several functions that allow us to determine the various states
of an mxArray. These functions are used to check the inputs to the MEX-file, to make
sure that they are of correct type and number.
The interfacing between MATLAB and C++ in MEX-files is done using the external interface functions, which provide functionality for transferring data between MEX-files and MATLAB, as well as the ability to call MATLAB functions from C++ Mathworks (2006b). The diagram in Fig. 4-3 illustrates the interfacing between MATLAB
and C++ using the MEX methodology.
Every C/C++ MEX-file must include the header file mex.h, which is necessary in
order to use the mx* and mex* routines. The code from which a MEX-file is composed
consists of:
Figure 4-3: The Matlab/C++ interface.
The Computational Routine
This routine contains the code for performing the computations that we want to implement as a MEX-file Mathworks (2006b).
The Gateway Routine
This routine interfaces the computational routine with MATLAB and calls it as a
subroutine. The gateway routine of every MEX-file is called mexFunction; this
function is the entry point MATLAB uses to access the DLL. The mexFunction
definition is as follows:

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])

where nlhs is the number of expected output mxArrays, plhs is an array of pointers to
mxArrays (expected outputs), nrhs is the number of inputs, and prhs is an array
of pointers to mxArrays (the input data, which is read-only and not altered by the
mexFunction) Mathworks (2006b).
In the gateway routine, one can access the data in the mxArray structure and then
manipulate it in the C/C++ computational subroutine. After calling the
C/C++ computational routine from the gateway, one can set a pointer of type mxArray to the data it returns. This enables MATLAB to recognize the output of the
computational routine as the output of the MEX-file. Fig. 4-4 shows a graphical
representation of the MEX cycle Mathworks (2006b), which proceeds along the following steps:
(a) The ‘func.c’ gateway routine uses the mxCreate* functions to create the MATLAB arrays for the output arguments.
(b) It sets plhs[0], plhs[1], ... to the pointers to the newly created MATLAB arrays.
(c) It uses the mxGet* functions to extract the input data from prhs[0], prhs[1], ...
(d) It calls the C/C++ subroutine, and passes the input and output data pointers
as function parameters.
Troubleshooting the methodologies for MEX Files can be explained through the diagram in Fig. 4-4.
4.4.2
Why MEX Files?
The main advantage we hope to gain from the use of MEX files is speed. We
can rewrite the bottleneck computations, such as for-loops, as MEX files for increased
efficiency Mathworks (2006a,b). The most time-consuming function in the algorithm
is the Dijkstra function, which is composed of numerous lengthy for-loops. Rewriting
it as a C++ MEX file largely reduces its running time and, as a result, increases
the overall performance of the algorithm.
In order to demonstrate the speed advantage of writing time-consuming for-loops in C++ via MEX files, instead of Matlab, we wrote a function that contains
several nested for-loops both as an M-file and as a MEX file, and then compared
their relative performance.
function k = just_for_test(sentinel)
total = 0;
i = 0; j = 0; k = 0;
for i = 0:sentinel,
    for j = 0:sentinel,
        for k = 0:sentinel,
            total = total + 1/6; total = total + 1/6;
            total = total + 1/6; total = total + 1/6;
            total = total + 1/6; total = total + 1/6;
            if total > 4000
                total = total - 98;
            end
        end
    end
end
k = total;
The corresponding MEX file would look like:

#include "mex.h"

void mexFunction(int nlhs, mxArray *plhs[], int nrhs,
                 const mxArray *prhs[])
{
    double *total;
    int i = 0, j = 0, k = 0, sentinel;

    plhs[0] = mxCreateDoubleMatrix(1, 1, mxREAL);
    total = (double *)mxGetPr(plhs[0]);
    sentinel = (int)mxGetScalar(prhs[0]);
    *total = 0.0;
    for (i = 0; i <= sentinel; i++)
        for (j = 0; j <= sentinel; j++)
            for (k = 0; k <= sentinel; k++)
            {
                *total = *total + 1.0/6.0; *total = *total + 1.0/6.0;
                *total = *total + 1.0/6.0; *total = *total + 1.0/6.0;
                *total = *total + 1.0/6.0; *total = *total + 1.0/6.0;
                if (*total > 4000)
                    *total = *total - 98.0;
            }
}
Upon running both functions for various input values, we obtain the
results shown in Table 4.3, which show an average improvement of 45.336%. This
clearly shows the advantage of converting time-consuming for-loops to C++ instead
of running them in their native Matlab code.
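As a quick sanity check, the quoted 45.336% is simply the mean of the four per-run improvements listed in Table 4.3, where each per-run improvement is (t_Matlab − t_MEX)/t_Matlab × 100:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Mean of per-run improvement percentages.
double meanImprovement(const std::vector<double>& improvements) {
    double sum = 0.0;
    for (double p : improvements) sum += p;
    return sum / static_cast<double>(improvements.size());
}
```

Averaging the four tabulated improvements (44.380, 45.635, 45.637, 45.693) indeed gives 45.336%.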
4.5
Datasets of Images
A subset of the MPEG7 dataset is taken as the reference database to test the above
algorithms. A sample of the MPEG7 dataset is shown in Fig. 4-5. Note at this stage that
the designed algorithms can handle .gif as well as .bmp image files, and that they are invariant
to whether the background of the image consists of pixels of ones or zeros.
The subset used in this work, shown in Fig. 4-6, consists of 36 classes of images.
The testing framework designed in the next section takes each of the
dataset’s images and finds its closest match among the images of the dataset.
The optimum would be to classify all images correctly, i.e. to be able to separate
the images of each class.
4.6
Interface Design
In order to facilitate the testing of the above algorithms, a GUI was
designed. The presence of a GUI renders the algorithms more accessible and user-friendly. The diagram in Fig. 4-7 shows how the program works from a top-level
perspective.
The GUI offers the possibility to change the parameters of the CTHI algorithm, permitting the user to specify the image compression ratio, the image recovery ratio, and the pixel sensitivity for detecting concavities. A Help section
is also present in the GUI to illustrate the main concepts detailed in the previous
sections. A top-level description of the image matcher program is shown in Fig. 4-8.
Algorithm 3: Matching Concavity Trees with Holes using the Relative Comparison
Algorithm (TEDHI)
Input: Two images I1 and I2 containing shapes with holes
Output: Distance d denoting the computed distance between the two trees.
Initialize:
• Set T1 = CTHI{I1} and T2 = CTHI{I2}.
• Let DummyHole refer to a dummy node, i.e. a structure representing a
spherical shape with a fixed area.
• Let DummyImage refer to a dummy sub-image, i.e. a special concavity tree
made of a unique node.
• Let Weight refer to a weight function that returns the relative area of a hole’s
concavity tree.
function d = TEDHI{T1, T2}
    d = 0
    SubImagesMax = max{NumberSubImages(T1), NumberSubImages(T2)}
    SubImagesMin = min{NumberSubImages(T1), NumberSubImages(T2)}
    for i ∈ {1...SubImagesMax} do
        if i ≤ SubImagesMin then
            d = d + Weight(SubImage[i]) × TEDH{T1[i], T2[i]}
        else
            d = d + Weight(SubImage[i]) × TEDH{T1[i], DummyImage}
    Normalize d
function d = TEDH{T1, T2}
    d = 0
    HolesMax = max{NumberHoles(T1), NumberHoles(T2)}
    HolesMin = min{NumberHoles(T1), NumberHoles(T2)}
    for i ∈ {1...HolesMax} do
        if i ≤ HolesMin then
            d = d + Weight(Hole[i]) × TED{T1[i], T2[i]}
        else
            d = d + Weight(Hole[i]) × TED{T1[i], DummyHole}
    Normalize d
Algorithm 4: Matching Concavity Trees with Holes using Random Intersecting
Lines (TEDHIR)
Input: Two images I1 and I2 containing shapes with holes, and a user
parameter κ defining the number of lines
Output: Distance d denoting the computed distance between the two trees.
function d = TEDHIR{I1, I2}
    d = 0
    for i ∈ {1...κ} do
        Generate the equation of a random line.
        Dissect I1 and I2 with the generated line.
        T1(i) = CTHI(I1)
        T2(i) = CTHI(I2)
        Norm = NormalizeDistance{TEDH{T1[i], T2[i]}}
        d = d + Norm
Table 4.2: The MEX API functions. Source: Mathworks (2006), “MEX-files Guide”.

Operation           API Function
Array creation      mxCreateNumericArray, mxCreateCellArray, mxCreateCharArray
Array access        mxGetPr, mxGetPi, mxGetData, mxGetCell
Array modification  mxSetPr, mxSetPi, mxSetData, mxSetField
Memory management   mxMalloc, mxCalloc, mxFree, mexMakeMemoryPersistent,
                    mexAtExit, mxDestroyArray, memcpy
Table 4.3: Running time comparison: original code versus MEX code

Sentinel  Matlab (s)  MEX (s)  Improvement (%)
900       11.30       -        44.380
1000      15.01       -        45.635
1200      25.73       -        45.637
1500      49.68       -        45.693
Figure 4-4: Troubleshooting Methodologies For MEX Files.
Figure 4-5: Sample of the MPEG7 dataset.
Figure 4-6: Used Subset of the MPEG7 dataset.
Figure 4-7: Interface Top-Level Design.
Figure 4-8: Top-Level Image Matcher Program.
Chapter 5
Implementation
5.1
Implementation of Sub-Images/Hole Representation
The concavity tree structure and the Tree Edit Distance algorithm were updated to
include representation and matching of bi-level images containing sub-images with
holes. The changes made to these algorithms, as well as their pseudocodes, were
depicted in Chapter 4. Next we present some examples showing how sub-images as well as holes are handled.
Example 1: Representation of multi-shape images

Consider the image in Fig. 5-1. The image contains three sub-images or sub-objects, namely each of the three
buildings. The original tree is that of the building occupying the most space, which
is the double-tower building. As no sub-image contains holes, the tree can also be
split into three sub-trees. The tree is shown in Fig. 5-2, where each of the sub-trees
is colored differently.
Note that the main tree is the one to the left representing the major sub-image. The
Figure 5-1: Bitmap file representing the shapes of three buildings in a city.
Figure 5-2: Tree representation of the bmp image in Fig. 5-1.
two other sub-images are represented by the sub-trees in red and green. Next, the
image recovered from this tree with the standard input parameter vector is shown in Fig.
5-3:
Mappings between convex hulls and sub-trees can be seen in Fig. 5-4.
Example 2: Representation of multi-hole images
Consider the image shown in Fig. 5-5. As can be seen, this image contains two holes.
The original tree represents the overall shape with no holes, whereas the remaining
two sub-trees each represent one of the holes. In Fig. 5-6, the sub-tree in
Figure 5-3: Recovered image from the tree in Fig. 5-2.
Figure 5-4: Reconstructed sub-shape with corresponding extracted sub-shape.
red represents the bottom hole, which has two meta-concavities, and the sub-tree in green
represents the upper hole, which has four meta-concavities.
The image recovered from this tree with the standard input parameter vector is shown in
Fig. 5-7.
Figure 5-5: Bitmap Image representing a shape containing holes.
Figure 5-6: Tree representation of the bmp image in Fig. 5-5.
Figure 5-7: Recovered image from the tree in Fig. 5-6.
Example 3: Representation of multi-shape images with holes
Consider the image shown in Fig. 5-8.
Mappings between convex hulls and sub-trees can be seen in Fig. 5-10.
Figure 5-8: Bitmap Image representing a shape containing two sub-shapes with their
holes.
Figure 5-9: Tree representation of the bmp image in Fig. 5-8.
5.2
Implementation of Relative Tree Edit Distance
Matching Algorithm
Consider the images in Fig. 5-11. Note that the two images are nearly identical and
only differ by some discrepancy in the lower part of the main object. As a result, we
expect to obtain a low distance when matching the two images, say around 0.1 on a
[0 1] scale.
Figure 5-10: Reconstructed sub-shape with corresponding extracted sub-shape.
Figure 5-11: Two images to be matched.
The matching algorithm extracts the tree representation for both images (Fig. 5-12).
As can be seen, only one difference exists between the two tree structures. This
corresponds to the cost of one operation in the tree edit graph, which is an Insert
operation to move from T1 to T2 or a Delete operation to move from T2 to T1 . In this
case, and since all other nodes match, the cost of transforming Image1 into Image2
Figure 5-12: Corresponding tree representations for the images in Fig. 5-11.
would be the cost of inserting or deleting the concerned node (highlighted in Fig.
5-12). Applying the TEDHI algorithm gives a distance of 0.0847 (around 8%) which
is anticipated according to the above explanation. To illustrate a case where we have
substantial differences between the input images, consider the images shown in Fig.
5-13.
Figure 5-13: Two input images. One representing a bottle, the other a city!
Obviously the two images are very dissimilar in structure as well as shape. The
first has a unique shape with no holes, whereas the second has three sub-shapes, one of
which contains holes. As a result, we expect a high distance when matching
the two images, say around 0.7 or 0.8 on a [0 1] scale. The matching algorithm extracts
the tree representation of both images.
Figure 5-14: Corresponding tree representations for the images in Fig. 5-13.
Many differences clearly exist between the two tree structures.
Performing the edit distance algorithm incurs the cost of several Insert,
Delete, and Substitution operations. In this case, the costs of each operation add up,
and the cost of transforming the first image into the second is the sum
of all costs. Applying the TEDHI algorithm gives a distance of 0.813, which is also
anticipated according to the above explanation.
5.3
Implementation of the Random Lines Intersection Algorithm
In order to implement the dissecting line idea, a function is written to cut the image
matrix by a number of horizontal, vertical, or positively sloped random lines. The
number of random lines is specified by an input parameter. For each slice, we compare
the right portion of the first image to that of the second image, and the left portion
of the first image to that of the second image (in the case of a horizontal cut, we compare
the upper slices of the two images together, and do the same for the lower slices). To
administer the comparison, we use the standard TEDHI algorithm on the
corresponding portions of each image, after finding the concavity tree representation
of each of the four resulting ”pseudo-images”. The costs of each trial of the random
lines are normalized and added to obtain the cost of transforming the first image into
the second image.
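The dissection step can be sketched as follows, assuming a bi-level image stored as a 0/1 matrix and a cutting line written as a·x + b·y = c (the line parameters would normally be drawn at random); pixels strictly on one side of the line go to the first output image and the rest to the second:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Image = std::vector<std::vector<int>>;  // 0/1 pixel matrix

// Split img into two images along the line a*x + b*y = c; pixels with
// a*x + b*y < c keep their value in the first image, the others in the
// second, and all remaining pixels are zeroed out.
std::pair<Image, Image> dissect(const Image& img, double a, double b, double c) {
    const std::size_t rows = img.size();
    const std::size_t cols = rows ? img[0].size() : 0;
    Image first(rows, std::vector<int>(cols, 0));
    Image second = first;
    for (std::size_t y = 0; y < rows; ++y)
        for (std::size_t x = 0; x < cols; ++x)
            (a * x + b * y < c ? first : second)[y][x] = img[y][x];
    return {first, second};
}
```

The four resulting pseudo-images (two per input image) are then matched side by side with TEDHI, as described above.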
5.4
Converting for-Loops to C++ Via MEX Files
All time-consuming for-loops in the Matlab code were transformed into C++ MEX
files. Since the function that contains the largest number of time-consuming for-loops
is the Dijkstra function, it was the first function to be converted. The
loops containing complex data structures, as in the functions CTHI and TEDHI, were not
converted, as this proved costly from a running-time perspective; all other
for-loops that do not involve these structures were transformed to MEX. The effort of
rewriting code as MEX files was worthwhile, since performance increased drastically upon
testing the running time of the new algorithms, especially for shapes involving many
concavities. More details on the performance improvement due to the use of MEX files
will be presented in the next chapter.
5.5
GUI Implementation
As designed in Chapter 4, the Image Concavity Tree Matcher is started by
typing ImageConcavityTreeMatcher in the Matlab command window. A prompt then
asks whether or not to rebuild the database of the testing dataset. The selection is set
to No by default (Fig. 5-15).
Figure 5-15: Initial prompt of the ImageConcavityTreeMatcher.
Upon selecting Yes, the database starts rebuilding. Construction usually takes
around 8 minutes for the selected dataset of 114 images, i.e. an average of 4
seconds for extracting, processing, and storing the results of one image.
Figure 5-16: Prompts in the GUI for DB Rebuild.
After the DB rebuild, the GUI prompts the user for a program choice:
Figure 5-17: Prompts in the GUI for program choice. ImageMatcher matches an input
image with the structures stored in the database. The proximity program extracts
CT plots, recovers concavities and matches two user-input images.
At this stage, note that the input can be obtained in several other ways, according to whether we interface the program to mouse or pad drawing input. If we select
the input bell-3.gif, for example, the GUI proceeds as in Fig. 5-19:
The best match is then posted on the screen as in Fig. 5-20:
Figure 5-18: Prompt to File Input.
Figure 5-19: Image Matching.
Figure 5-20: Best Match Return.
The GUI then asks whether we would like to input another image (Fig. 5-21):
Figure 5-21: Recursion Start.
Selecting Yes starts a recursion in the ImageMatcher. Upon clicking No, the initial
prompt returns (Fig. 5-22).
Choosing Proximity opens the Proximity program; its splash and initial screens
are shown in Figs. 5-23 and 5-24.
The corresponding concavity trees are shown in Fig. 5-26.
This GUI was used to test the algorithms over the subset of the MPEG7
dataset; testing results will be presented in the next chapter.
Figure 5-23: Welcome screen of the Proximity software.
5.6
Applying Concavity Trees in the Illiteracy Project
The purpose of the Illiteracy project is to design and implement an efficient method
for helping illiterate people in the Arab world overcome illiteracy. One way
to do so is to recognize and correct the handwriting of students, which
can be implemented by applying the concavity tree concept.
The Arabic alphabet consists of 28 basic letters, some of which admit customized
forms, such as the aleph, which admits the forms shown in Fig. 5-29.
In what follows, and for neatness purposes, the standard 28-letter representation of
the Arabic alphabet will be used, knowing that the described method can also be
applied to the remaining characters. The resulting database can be seen in
Fig. 5-30:
Figure 5-24: Initial screen of the Proximity software.
The algorithms were tested against a variety of input datasets. A sample input
dataset can be seen in Fig. 5-31:
Example 1: Loading the figure ”input ye” from the above input dataset into the GUI
gives the image preview shown in Fig. 5-32. The image is contrasted against the
reference in Fig. 5-33.
The output of the CTHI function is shown in Fig. 5-34:
Figure 5-25: Loaded Images to the Proximity software.
The blue nodes correspond to the concavities and meta-concavities of the contour
of the body of the input letter “ye”, while the red node represents the unique
concavity of the dash beneath it. The remaining nodes correspond to the
concavities and meta-concavities of the contours of the holes inside the image. This
is illustrated in Fig. 5-35, where we show the correspondence between the holes and
their representation in the tree.
Example 2: An example of a letter with holes is the ”sad”, shown in Fig. 5-36.
The convex hull reconstruction of the letter is shown in Fig. 5-37 and its tree
representation in Fig. 5-38:
Figure 5-26: Corresponding concavity trees representation of the input images shown
in Fig. 5-25.
As can be noted, the single hole is represented by the red node in the tree. Running
the Relative Tree Edit Distance Matching Algorithm on the input letter “ye” gives a
distance of 0.0617, as shown in Fig. 5-39. For the algorithm to
function properly, this value should be the smallest among the 28 distance values to
the alphabet letters.
Fig. 5-40 displays the array of distances of the input “ye” to each of the
alphabet’s letters. It can be seen that min(Dist) = Dist[28], which corresponds to
the last letter, “ye”.
Figure 5-27: Corresponding convex hulls retrieved for the input images shown in Fig.
5-25.
Figure 5-28: Matching results for the images in Fig. 5-25. The difference between the
two consists mainly in the upper sub-object, accounting for around a 30% difference between the
figures, which is close to the obtained distance (0.28876).
Figure 5-29: Different forms for “aleph” in the Arabic alphabet.
Figure 5-30: Arabic Alphabet. It consists of 28 letters. It is the main dataset to be
used in the Illiteracy project.
Figure 5-31: Sample Input Dataset.
Figure 5-32: Loading Input versus Reference Images.
Figure 5-33: Convex hull representation of the input images in Fig. 5-32.
Figure 5-34: Concavity tree of the original image in Fig. 5-32. In red, we can see the
representation of the dash, considered as a sub-image within the image.
Figure 5-35: Correspondence between objects and tree nodes.
Figure 5-36: Sample letter with a hole.
Figure 5-37: Convex hull representation of the input image in Fig. 5-36.
Figure 5-38: Concavity tree representation of the input image in Fig. 5-36.
Figure 5-39: Matching result of input letter “ye” and the reference one.
Figure 5-40: Array of distances of input “ye” to each of the alphabet’s letters.
Chapter 6
Testing and Assessment
6.1
Testing the Algorithms for Image Matching
In order to test the algorithms, we ran 36 classes of eight images each and recorded
how many of the images were matched correctly against the reference database. This was
repeated two more times, after making minor modifications to the images, to make
sure the results are stable. The results ranged between 72% and 98%, with an average
of 86.8% for the Relative Tree Edit Distance Algorithm and 86.05% for the Random
Lines Intersection Algorithm.
Results can be seen in Table 6.1:
A graph displaying the results of the TEDHI algorithm can be seen in Fig. 6-1:
Hence, we can say that the matching algorithms achieved a success rate of around
86% over the dataset. The histogram below shows that, in general, classes of images have different distance ranges from a given class of images. For example, although the images of classes “Bottle” and “Cellular” are somewhat similar, the range
Figure 6-1: Percent matching for different sets of Input images.
Figure 6-2: Example for a given class.
of the eight closest images to class “Brick” from class “Bottle” is different from the
range of the eight closest images to class “Brick” from class “Cellular”. Defining A
to be the range of the “Bottle” class and B that of the “Cellular” class, we have
Range(A) = [0.064, 0.123] and Range(B) = [0.214, 0.312].
Since A ∩ B = ∅ (the empty set), there is no overlap in the ranges and consequently a clean separation between the two image classes. The optimum case would be to
have non-overlapping ranges for all combinations of image classes, which would result
in a 100% matching rate for the given class. Obviously this is difficult because of
the high number of classes. An illustration of this case can be seen in the histogram
shown in Fig. 6-3.
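The separability test used here is just an interval-overlap check on closed ranges [lo, hi]; the values in the test below reproduce the “Bottle” and “Cellular” ranges given above:

```cpp
#include <cassert>

// Two distance ranges are cleanly separated iff they do not overlap.
// Ranges are closed intervals [lo1, hi1] and [lo2, hi2].
bool rangesOverlap(double lo1, double hi1, double lo2, double hi2) {
    return lo1 <= hi2 && lo2 <= hi1;
}
```

Applied to Range(A) = [0.064, 0.123] and Range(B) = [0.214, 0.312], the check reports no overlap, so the two classes are separable with respect to the reference class.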
Figure 6-3: Classes “Bottle” and “Cellular” are linearly separable since their distance
ranges do not intersect.
On the other hand, some image classes can interfere with others. This is the case for classes “Car1” and “Car2”. An illustration of this case is shown in Fig. 6-4, where the reference class with respect to which distances are computed is class “Face”.
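This separability criterion amounts to a simple interval-overlap test on the per-class distance ranges. The sketch below (a minimal illustration; the helper name and the use of Python are my own, with the interval values taken from the text) makes the check explicit:

```python
def ranges_overlap(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

# Distance ranges of the eight closest images to class "Brick",
# using the values quoted above.
range_bottle = (0.064, 0.123)    # A: class "Bottle"
range_cellular = (0.214, 0.312)  # B: class "Cellular"

# Disjoint ranges (A intersect B empty) mean the two classes
# are cleanly separated with respect to the reference class.
print(ranges_overlap(range_bottle, range_cellular))  # False
```

Classes such as “Car1” and “Car2”, whose ranges do intersect, would make this test return True, which is exactly the interference case described below.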
6.2
Testing the Running Time
Upon incorporating MEX files into the algorithms, the running time was reduced by 38%. This is a considerable gain: as already mentioned, it reduced the database rebuilding time from around 12 minutes to 8 minutes.
Figure 6-4: Classes “Car1” and “Car2” are not linearly separable since their distance ranges do intersect.
6.2.1
Complexity of the CTHI algorithm
Let n = the number of contour pixels of all sub-shapes.
Let h = the height of the concavity tree.
Let γ = the number of sub-images.
Let d = the number of holes in all sub-shapes.
Let α = the number of concavity-tree nodes.
Let β = the user input parameter for the Random Lines Intersection algorithm.
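For concreteness, h and α can be read off a minimal tree representation such as the following (a hypothetical sketch for illustration only, not the thesis implementation):

```python
class CTNode:
    """Minimal concavity-tree node: each child represents a (meta-)concavity."""
    def __init__(self, children=None):
        self.children = children or []

def height(node):
    """h: height of the concavity tree (a leaf has height 0)."""
    if not node.children:
        return 0
    return 1 + max(height(c) for c in node.children)

def node_count(node):
    """alpha: total number of concavity-tree nodes."""
    return 1 + sum(node_count(c) for c in node.children)

# A root shape with two concavities, one of which has a nested concavity.
tree = CTNode([CTNode([CTNode()]), CTNode()])
print(height(tree), node_count(tree))  # 2 4
```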
Since the running time of the concavity tree algorithm is O(nh), we can write the running time of the CTHI function as:

T(n) = O(γdnh) = δ²·O(nh), where δ = O(1)   (6.1)
In the worst case, we have:

h = n/k, where k > 4 for the smallest concavity to be considered as a node   (6.2)
In this case, the running time becomes:

T(n) = O(γdnh) = δ²·O(n·(n/k)) = (δ²/k)·O(n²)   (6.3)

For other cases, h varies according to k·2^h ≈ n ⟹ h ≈ log₂(n/k).
For these cases, the running time becomes:

T1(n) = O(γdnh) = δ²·O(n log₂(n/k)) = δ²·O(n log₂ n)   (6.4)

For an image with one shape and holes (assuming h ≈ log₂(n/k)), the running time for CTHI becomes:

T1(n) = O(dnh) = δ·O(n log₂(n/k)) = δ·O(n log₂ n)   (6.5)
This is the same expected running time as for an image with multiple shapes but no holes.
6.2.2
Complexity of the TEDHI algorithm
Here the Tree Edit Distance process is executed γd times, which is why the running time for TEDHI is expressed as:
T2(n) = γ·d·T_Dijkstra = γ·d·O(E + V log V)
      = γ·d·O(α²{3 + 2 log α})
      = O(γ log₂(n/k)·α²{3 + 2 log α})
      = O(3γα² log α·log₂ n)
      = O(3γα³ log₂ n)
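The T_Dijkstra term above is the cost of a shortest-path computation over the edit graph, with V = α nodes and E ≈ α² edges. As a reference point, a heap-based Dijkstra is sketched below on a generic weighted graph (the graph and helper names are illustrative only, not the thesis's edit-graph construction); note that a binary-heap queue gives O((E + V) log V), while the tighter O(E + V log V) bound quoted above assumes a Fibonacci-heap queue:

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source.
    graph: {node: [(neighbor, weight), ...]} with non-negative weights."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, already settled with a shorter path
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

g = {0: [(1, 2), (2, 5)], 1: [(2, 1)], 2: []}
print(dijkstra(g, 0))  # {0: 0, 1: 2, 2: 3}
```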
6.2.3
Complexity of the TEDHIR algorithm
The Tree Edit Distance process is executed β times when the TEDHIR algorithm runs, hence:

T3(n) = O(βα²{3 + 2 log α}) = O(3βα² log α)   (6.6)

6.2.4
Complexity of the General Case
As a result, we can express the running time of the procedure that represents and
matches two images as follows:
T(n) = T1(n) + max{T2(n), T3(n)}
     = δ·O(n log₂ n) + O(3γα³ log₂ n)
     = O(n log n)
Therefore the overall complexity is polynomial in n. In practice, however, the running time of the matching algorithms is much smaller than this bound, since they depend on the number of nodes in a tree, which is considerably smaller than the number of contour pixels.
6.3
Results and Analysis
The per-class results are listed in Table 6.1. Assessing the matching performance over the dataset, we can say the following:
• Concavity trees yield excellent results over most of the dataset's classes. Concavity trees have several competitive advantages over other methods, particularly in that they are:
– Scale-invariant: the distance of an image to any scaled version of itself is zero.
– Rotation-invariant: the distance of an image to any rotated version of itself is zero.
– Shape-oriented: the estimated distance between two images similar in shape but dissimilar in size is most likely to be small.
• Average results were obtained over some classes, which highlights the drawbacks of using CTs:
– Since CTs are shape-oriented and invariant under scaling, two dissimilar entities represented by similar shapes with only tiny differences are difficult to distinguish, since tiny shape differences lead to tiny matching distances.
– Although holes and sub-shapes are represented differently in the CT data structure, both contribute to matching, which can affect the matching distance negatively. To illustrate this, an image with one sub-shape and two holes may have a small distance to an image with two sub-shapes and no holes.
It is also important to mention that image retrieval was shown to be reliable and stable for any image, independent of its size, shape, and structure.
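The scale-invariance property noted in the first point can be illustrated with a concavity-style feature: the ratio of a shape's area to that of its convex hull is unchanged under uniform scaling. The sketch below (a toy example with a hard-coded hull, not the thesis's matching code) verifies this on an L-shaped polygon:

```python
def polygon_area(pts):
    """Shoelace formula for a simple polygon given as (x, y) vertices."""
    n = len(pts)
    s = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
            for i in range(n))
    return abs(s) / 2.0

def scale(pts, f):
    """Scale every vertex by a uniform factor f."""
    return [(f * x, f * y) for x, y in pts]

# An L-shaped (non-convex) polygon and its convex hull, listed explicitly.
shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
hull = [(0, 0), (2, 0), (2, 1), (1, 2), (0, 2)]

ratio = polygon_area(shape) / polygon_area(hull)
ratio_scaled = polygon_area(scale(shape, 3)) / polygon_area(scale(hull, 3))
print(abs(ratio - ratio_scaled) < 1e-12)  # True: the feature is scale-invariant
```

Both areas scale by the same factor (here 9), so the ratio, and hence any distance built on such ratios, is unaffected by resizing the image.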
6.4
Comparison with other methods
Comparing the upgraded concavity tree method for extraction and matching with other methods reveals its strengths. Against the conformal mapping method proposed by Badawy and Kamel (2004), the current algorithm's performance exceeds the one described in that paper by around 5%.
To elaborate on this point, it is worth noting that the best matching rates achieved over binary images with no holes or sub-shapes reach around 91–92% on the MPEG7 dataset, while the proposed algorithms achieve a rate of 87% on more complex types of images.
Table 6.1: Success rate for each class of the synthetic dataset

Class     TEDHI Success Rate (%)    TEDHIR Success Rate (%)
1         89                        82
2         87                        81
3         90                        94
4         91                        82
5         88                        81
6         87 ≈ µ                    86
7         86                        86 ≈ µ
8         90                        92
9         86                        88
10        85                        81
11        84                        87
12        87                        75
13        89                        72
14        92                        96
15        87                        85
16        83                        89
17        74                        90
18        91                        91
19        85                        80
20        98                        83
21        80                        85
22        84                        84
23        85                        89
24        79                        91
25        92                        88
26        90                        90
27        83                        82
28        98                        95
29        82                        80
30        84                        89
31        76                        78
32        89                        93
33        90                        91
34        87                        90
35 (36)   86 (92)                   82 (90)
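The averages quoted in Section 6.1 can be recovered directly from Table 6.1; the snippet below (the lists are simply the two table columns, with the parenthesized class-36 values included) reproduces them:

```python
# TEDHI and TEDHIR success rates, copied column-wise from Table 6.1.
tedhi = [89, 87, 90, 91, 88, 87, 86, 90, 86, 85, 84, 87, 89, 92, 87, 83,
         74, 91, 85, 98, 80, 84, 85, 79, 92, 90, 83, 98, 82, 84, 76, 89,
         90, 87, 86, 92]
tedhir = [82, 81, 94, 82, 81, 86, 86, 92, 88, 81, 87, 75, 72, 96, 85, 89,
          90, 91, 80, 83, 85, 84, 89, 91, 88, 90, 82, 95, 80, 89, 78, 93,
          91, 90, 82, 90]

mean_tedhi = sum(tedhi) / len(tedhi)      # ~86.83, quoted as 86.8
mean_tedhir = sum(tedhir) / len(tedhir)   # ~86.06, quoted as 86.05
print(round(mean_tedhi, 2), round(mean_tedhir, 2))
```

Up to rounding, this matches the reported averages of 86.8% (TEDHI) and 86.05% (TEDHIR).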
Table 6.2: Success rates per image class, with comments interpreting the performance of the algorithms

Class            Success Rate (%)   Description
apple1           80                 Similarity with apple2
apple2           81                 Similarity with apple1
bell             90
bottle           91
brick            88
camel            87 ≈ µ
car1             86
car2             90
carriage         86
cellular phone   85
cellular         84
cup              87
children         89
chopper          92
classic          87
device3          83
device4          78
device5          91
device8          93
face             91
fork             80
fountain         84
hammer           85
jar              79
key              92
octopus          90
pencil           83
personal car     98
rat              82
sea snake        89
shoe             76
spoon            89
spring           90
stef             87
teddy            86
watch            95

Other comments:
Similarity with bell
Resemblance with hammer as CTs have the same structure
Similarity with face as the CT is rotation-invariant
Similarity with hammer
Confusion with key as the CT is size-invariant
Chapter 7
Further Research
In this thesis, the concavity tree concept has been upgraded to cover images containing multi-shape objects with holes. Algorithms for extracting and plotting the new concavity trees in an efficient fashion have been implemented. Further, two alternative image matching algorithms have been introduced and shown to achieve high success rates over subsets of the MPEG7 dataset. It was shown how to apply the concept in real applications such as the Illiteracy project. A GUI was designed to illustrate the concepts introduced, and a testing framework was implemented to measure the success rates over the designed datasets.
The research area targeted in this thesis is a field still open to much effort and investigation. The main prospective improvement is the extension of this concept to colored images. Colored images can be transformed to binary images, but this obviously involves a loss of information that depends on the type of image. While this seems feasible for grayscale or RGB images, it raises many problems when applied to other types of color images.
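As a first step toward such an extension, a grayscale image can be reduced to a binary one by thresholding, which makes the accompanying information loss explicit. The sketch below (plain Python on a toy intensity matrix; the threshold value is arbitrary, not part of the thesis code) illustrates the idea:

```python
def to_binary(gray, threshold=128):
    """Threshold a grayscale image (rows of 0-255 intensities) to 0/1."""
    return [[1 if px >= threshold else 0 for px in row] for row in gray]

gray = [[ 10, 200, 250],
        [130, 127,  40],
        [255,   0, 128]]
print(to_binary(gray))
# [[0, 1, 1], [1, 0, 0], [1, 0, 1]]
```

Every intensity at or above the threshold collapses to 1 and everything below to 0, so distinct gray levels become indistinguishable: this is the information loss referred to above.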
Bibliography
Badawy, O. E. and Kamel, M. (2004). Matching concavity trees. Proceedings of
the Joint IAPR International Workshops on Structural, Syntactic, and Statistical
Pattern Recognition, SSPR 2004 and SPR 2004, Lisbon, Portugal, 3138:556–564.
Badawy, O. E. and Kamel, M. (2005). Compressing 2-d shapes using concavity trees.
pages 559–566.
Batchelor, B. (1980a). Hierarchical shape description based upon convex hulls of
concavities. Cybernetics and Systems, 10(1-3):205–210.
Batchelor, B. (1980b). Shape descriptors for labeling concavity trees. Cybernetics
and Systems, 10(1-3):233–237.
Borgefors, G. and di Baja, G. S. (1992). Methods for hierarchical analysis of concavities. Pattern Recognition, Proceedings of the 11th IAPR International Conference on Image, Speech and Signal Analysis, III(1):171–175.
Borgefors, G. and di Baja, G. S. (1996). Analyzing nonconvex 2d and 3d patterns.
Computer Vision and Image Understanding, 63(1):145–157.
Fadlallah, B., Hayek, H., Badawy, O. E., and Kamel, M. (2005). Matching concavity
trees using tree edit distance. Technical Report, Pattern Analysis and Machine
Intelligence Laboratory, University of Waterloo.
Mathworks (2006a). The components of a c mex-file.
Mathworks (2006b). Mex-files guide.
Sklansky, J. (1972). Measuring concavity on a rectangular mosaic. IEEE Trans.
Comput., 21(12):1355–1364.
Valiente, G. (2002). Algorithms on Trees and Graphs. Springer.
Xu, J. (1997). Hierarchical representation of 2-d shapes using convex polygons: A
morphological approach. Pattern Recogn. Lett., 18(10):1009–1017.