Isograph: Neighbourhood Graph Construction Based on Geodesic Distance for
Semi-Supervised Learning
Marjan Ghazvininejad, Mostafa Mahdieh, Hamid R. Rabiee, Parisa Khanipour Roshan and Mohammad Hossein Rohban
DML Research Lab, Department of Computer Engineering, Sharif University of Technology
Tehran, Iran
Contact Email: [email protected]
Abstract—Semi-supervised learning based on manifolds has been the focus of extensive research in recent years. Appropriate neighbourhood graph construction is a key component of a successful semi-supervised classification method. Previous graph construction methods fail when there are pairs of data points that have a small Euclidean distance but are far apart on the manifold. To overcome this problem, we start with an arbitrary neighbourhood graph and iteratively update the edge weights using estimates of the geodesic distances between points. Moreover, we provide theoretical bounds on the values of the estimated geodesic distances. Experimental results on real-world data show significant improvement compared to previous graph construction methods.
Keywords—Semi-supervised Learning, Manifold, Geodesic Distance, Graph Construction
I. INTRODUCTION
A. Semi-supervised Learning
The costly and time-consuming process of data labeling, and the abundance of relatively cheap unlabeled data, are two features of real-world applications that have caused recent interest in semi-supervised learning (SSL) methods. Text and image mining are common examples of applications in which SSL plays an important role ([1], [2], [3]). SSL methods utilize both labeled and unlabeled data to improve the generalization ability of the learner in such applications. Using the unlabeled data, one can estimate the distribution of the data in feature space, which can greatly improve classification.
In order to use the unlabeled data for label inference more effectively, certain assumptions must be made about the general geometric properties of the data. In many applications, high-dimensional data points are actually samples from a low-dimensional subspace of the feature space. In these cases, we can make use of the Manifold/Cluster assumption, which is among the most practical assumptions in SSL [4]. The Manifold/Cluster assumption holds in many real-world datasets in general, and image datasets in particular [5].
Weighted discrete graphs are a suitable representation of manifolds. A manifold can be represented by sampling a finite number of points from it as the graph vertices and putting edges between nearby points on the manifold. Since the underlying manifold is unknown for real data, and smoothness estimation of the labeling function heavily relies on the manifold model, graph construction plays an important role in this problem. Therefore, several methods have been proposed to construct a suitable graph representing the manifold. These graph construction methods output a weighted graph in which each data point is a vertex and each edge weight represents the distance between its endpoints. The constructed graph is then used to infer the labels of the unlabeled data points. Hence, appropriate construction of the neighbourhood graph plays a key role in manifold-based SSL. This argument is further discussed in Subsection II-C.
Some recent work in the semi-supervised learning literature has focused on proposing graph construction methods that best represent the manifold structure. k-NN and ε-ball are two classical methods of graph construction [6]. Several schemes have been proposed to improve the k-NN graph construction method. Moreover, Jebara et al. proposed the b-matching algorithm [7], which, unlike the k-NN method, produces a balanced graph, i.e. a graph in which all nodes have the same number of neighbours. The effectiveness of this method has been corroborated by both theoretical and experimental justifications [8].
However, most graph construction methods have used the Euclidean distance between data points as the distance measure. Unfortunately, this approach is misleading at times, since two points with a small Euclidean distance may be situated far apart on the manifold. In this case, two points are connected even though they are in fact distant from each other on the manifold. Such edges are called shortcut edges [9]. When shortcut edges are present, the graph does not represent the manifold structure correctly. This situation can be prevented by using a distance measure that reflects the true distance between data points more faithfully. Approximating the true distance between data points more closely enables us to determine the neighbourhood of a data point more precisely.
Cukierski et al. have proposed a method to identify shortcut edges via the Betweenness Centrality measure [9]. Betweenness Centrality is related to the number of graph shortest paths (between any two vertices) that pass through a specific edge. They intuitively argue that shortcut edges are likely to have high Betweenness Centrality, and use this observation to remove such edges. However, to the best of our knowledge, this argument has not been justified theoretically.
In this paper, we introduce a novel algorithm to detect shortcut edges and adjust their weights with the aim of approaching the real distance between the points on the manifold. The graph constructed by this algorithm is based on the intrinsic distance between points, hence we name it Isograph. We provide a solid theoretical foundation for our work; indeed, the algorithm follows directly from the theorems. Experiments on benchmark datasets show promising results compared to previous state-of-the-art work. Notably, our method can be initialized from any arbitrary graph, so it can easily be combined with many previous graph construction methods.
Finally, note that the well-known Isomap algorithm also estimates the geodesic distances between points, with considerable success [10]. However, Isomap cannot properly estimate the geodesic distance between nearby points; it concentrates on finding the geodesic distance between far-apart points, and therefore cannot be used directly in graph construction methods.
The remainder of the paper is organized as follows. In Section II we introduce the notation used throughout the paper and provide basic definitions in this field. This section can be safely skipped by the experienced reader. Next, in Sections III and IV, we explain the motivation for our algorithm and the basic idea of geodesic distance. This is followed by the precise problem setting in Section V. In Section VI we present our proposed method and give theoretical justifications for it. Finally, in Section VII, the experimental results of applying our method to both synthetic and real-world datasets are presented.
Figure 1. Four of five given data points are labeled. The goal is to predict the label of the unlabeled data point A) without any prior knowledge, B) knowing the Manifold/Cluster assumption [4].
II. BASICS AND NOTATIONS
A. Classification Setting
Consider the set of possible classes {−1, +1}, and let the feature space be the m-dimensional real-valued space ℝ^m. We denote the labeled data by X_L = {x_1, ..., x_l} with corresponding labels y = (y_1, ..., y_l), where x_i ∈ ℝ^m and y_i ∈ {−1, +1}. The unlabeled dataset is given as X_U = {x_{l+1}, ..., x_{l+u}}, where x_i ∈ ℝ^m.
In this setting X_L, X_U and y are given, and the goal is to find an estimate f = (f_1, ..., f_{l+u}) of the labels of all the points. The value of f_i is a real number between −1 and 1, and larger values of f_i correspond to stronger membership in class +1. In practice these values are mapped to {−1, +1} after inference is done. This classification problem can be generalized to the case of possible classes {1, 2, ..., c}, where c ≥ 2, using the one-against-all method.
B. Manifold/Cluster assumption
In Figure 1 A the goal is to predict the label of the unlabeled point. Without any other prior knowledge, the best prediction is class −, as the two points nearest the unknown point are from this class. Now suppose we somehow know that the data points are distributed only on the curve shown in Figure 1 B, and we expect the labels of adjacent points on this curve to be similar. In this new setting the label + is a better candidate, as the two points adjacent to the unlabeled point are both from class +. The assumption just mentioned is called the Manifold/Cluster assumption and generalizes to m-dimensional spaces. The Manifold/Cluster assumption in fact consists of two parts: the Manifold assumption and the Cluster assumption. The Manifold assumption states that the data points lie on a d-dimensional manifold (denoted by ℳ) in the m-dimensional feature space (d ≪ m). The Cluster assumption states that the labels of the points vary smoothly on ℳ. We will use the term "Manifold assumption" instead of Manifold/Cluster assumption in the rest of this paper.
Suppose f is a labeling function on ℳ, i.e. a function from ℳ to ℝ. The smoothness of f is formally defined as [11]:

S(f) = ∫_ℳ ‖∇f‖² dV_ℳ    (1)

Actually S(f) captures the concept of roughness rather than smoothness, but we will call it smoothness as previous authors have done. The Manifold assumption also states that S(f) must have a small value.
C. Neighbourhood graph
In SSL algorithms, we must use a discrete representation of the manifold, as we only have a finite number of data points. Graphs are therefore a suitable representation for manifolds. To build a graph from the given data points, we consider one vertex for each data point and add edges between data points that are adjacent on the manifold; hence we call it a neighbourhood graph. If the underlying manifold is known, constructing such a graph is straightforward. The challenging problem occurs when we do not have the manifold, which is the case in real-world problems. This problem is called "graph construction".
An example of a complex manifold together with the neighbourhood graph constructed on it is shown in Figure 2.
Figure 2. A curved 2D manifold in the 3D feature space. The data points are shown as black dots, and the neighbourhood graph edges are shown as lines connecting data points [12].
We denote the constructed neighbourhood graph by G = (V, E), where V = V(G) is the set of vertices and E = E(G) is the set of edges of the graph. Each edge of the graph represents a neighbourhood relationship, and the edge weights are the distances between the corresponding endpoints. The weight of an edge e = (u, v) is denoted by d(e) or d(u, v) throughout the paper. For simplicity we assume d(u, v) = ∞ if no edge exists between u and v in G.
We choose the neighbours of each vertex and the weights of the edges so that they approximate the manifold structure. Several methods have been proposed for graph construction, each trying to provide an appropriate approximation of the manifold structure. We introduce some of these methods in the following.
1) Classical graph construction methods: k-NN and ε-ball are two classical methods of graph construction [6]. In the k-NN graph construction method, each data point is connected to its k nearest neighbours, and the weights are the Euclidean distances between the endpoints. As we always add the reverse edges to make the graph symmetric, the degree of some vertices may become much greater than k.
In the ε-ball method, each data point is connected to the data points whose distance from it is less than ε, and the weight of each such edge equals the distance between the endpoints. There is no constraint on the degree of vertices in this method. If ε is too small, the resulting graph will be too sparse, and a large ε will produce too many irrelevant edges; therefore finding a suitable ε is a hard task. Hence k-NN is used more frequently in practice.
2) b-matching: b-matching is a well-known state-of-the-art graph construction method that has seen active research in recent years [8], [7]. b-matching creates a balanced graph in which all vertices have the same degree k. This method works well when the samples are distributed non-uniformly in the feature space.
Theoretical foundations for this method have been presented, and the method is reported to improve on k-NN graph construction when tested on digit recognition and text classification tasks. In the experimental results section, we will show that Isograph can improve the graph generated by b-matching.
D. Label Inference
We have used distances as edge weights, but a related concept, namely similarity, is needed for semi-supervised label inference. Similarity is the converse of distance: when the distance between two data points is low, their similarity is high, and vice versa. Similarity can be derived from distance in a few ways, among them Gaussian similarity. Let W be the similarity matrix corresponding to graph G, that is, W_ij is the similarity between vertices i and j. Then Gaussian similarity is defined as

W_ij = exp(−d(i, j)² / σ²)    (2)

We mentioned that smoothness is defined as

S(f) = ∫_ℳ ‖∇f‖² dV_ℳ

As we only have a finite number of points on the manifold and need to infer f just on these points, we can approximate the smoothness restricted to these points as [11]:

Ŝ(f) = Σ_{i,j=1}^{l+u} W_ij (f_i − f_j)²    (3)
The label inference process is based on finding an f which minimizes a mixture of both Ŝ(f) and the error of f on the labeled data.
It is easy to show that Ŝ(f) can be written in the following quadratic form:

Ŝ(f) = fᵀLf    (4)

with L = D − W, where D is the diagonal degree matrix (i.e. D_ii = Σ_{j=1}^{l+u} W_ij). L is known as the graph Laplacian.
The inference minimization problem is formally defined in the following form:

f* = argmin_f ‖Cf − y‖² + γ fᵀLf    (5)

C = (I_{l×l} 0_{l×u}) is a selection matrix, i.e. Cf keeps only the labeled indices of f, so ‖Cf − y‖² represents the difference between y and f. An algorithm with running time O(n³) can compute the solution to this equation, where n is the number of data points.
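As an illustration of Equations 2 through 5, here is a minimal sketch of this inference step (our own code, not the authors' implementation; it assumes the distance-matrix convention of the earlier sketch):

```python
import numpy as np

def infer_labels(dist, y_labeled, sigma, gamma=0.02):
    """Closed-form solution of Equation 5.

    dist      : (n, n) matrix of graph distances, np.inf where no edge exists.
    y_labeled : (l,) array of labels in {-1, +1} for the first l points.
    """
    n, l = dist.shape[0], len(y_labeled)
    W = np.where(np.isfinite(dist), np.exp(-dist**2 / sigma**2), 0.0)  # Eq. 2
    L = np.diag(W.sum(axis=1)) - W                # graph Laplacian L = D - W
    C = np.hstack([np.eye(l), np.zeros((l, n - l))])    # selection matrix
    # setting the gradient of ||Cf - y||^2 + gamma * f^T L f to zero gives
    # (C^T C + gamma * L) f = C^T y, an O(n^3) linear solve
    f = np.linalg.solve(C.T @ C + gamma * L, C.T @ y_labeled)
    return np.sign(f)                             # map back to {-1, +1}
```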
III. MOTIVATION
As previously mentioned, shortcut edges connect points of the graph which are close to each other according to the Euclidean distance, but have a large geodesic distance on the manifold. An example of such an edge and the underlying manifold is shown in Figure 3. These edges may be disastrous to the label inference process. According to the Manifold assumption, we expect close data points on the manifold to have similar labels. This condition may be violated in the case of shortcut edges, since the adjacent data points are actually far from each other on the manifold. Therefore, it is crucial to find such edges and reduce their impact on the inference process.

Figure 3. Part of a one-dimensional manifold showing the shortcut edge between u and v.

We expect a graph with fewer shortcut edges to perform better in classification; therefore, shortcut edge detection is a key problem in neighbourhood graph construction. In fact, this paper aims at detecting such edges and removing them or adjusting their weights in an appropriate manner.

IV. GEODESIC DISTANCE
In the plane, the shortest path between two points is the straight line connecting them, but on general manifolds, such as the sphere, this line does not lie on the manifold. Therefore, we need a new concept to define the distance between points on manifolds. Geodesic curves are curves lying on the manifold that connect points along the shortest path (Figure 4).

Figure 4. Geodesic curve between two points on a manifold [13].

Definition 1. For any two points p and q on the manifold ℳ, we define d_ℳ(p, q) as the length of the shortest curve between p and q lying on ℳ.

Proposition 1. For any p, q ∈ ℳ: d(p, q) ≤ d_ℳ(p, q), where d(p, q) is the metric in the ambient space.

This is intuitively clear, but can be proven rigorously using straight line segments for length estimation.

V. PROBLEM SETTING
In this section, we introduce the assumptions on which we base our algorithms. These comprise the Manifold assumption and a sampling condition. The Manifold assumption is a popular assumption in semi-supervised learning, and the sampling condition is a reasonable condition which is common in the manifold learning literature [14].
In our problem setting, we have the following assumptions:
1) The data points lie on a d-dimensional manifold, denoted by ℳ.
2) Sampling condition: The manifold ℳ is sampled as follows: there exists δ ∈ ℝ such that for any point p ∈ ℳ, there exists a data point q among the labeled or unlabeled data points such that d_ℳ(p, q) ≤ δ. We refer to the least such δ as δ(ℳ).¹

¹ℳ is assumed to be bounded. This is reasonable because usually, in a machine representation, the feature space is finite and ℳ is a subset of the feature space.

VI. PROPOSED METHOD
In this section, we first convey the intuition behind the proposed algorithm by means of a basic algorithm. We then add more details and practical modifications to it and introduce our final algorithm. One major improvement of the final algorithm over the baseline is that it adjusts the weights of shortcut edges, instead of naively removing all edges suspected of being shortcuts.

A. The baseline algorithm
In the baseline algorithm, we mainly try to detect the shortcut edges. An edge (u, v) of the neighbourhood graph is a shortcut edge if and only if d(u, v) ≪ d_ℳ(u, v).
Looking back at Figure 3, we can observe an important feature of the edge (u, v), which is a shortcut edge: when we remove this edge, the shortest path between u and v that contains only small edges must pass nearly along the curved manifold, and therefore this path in fact has many edges. This is the key intuition behind our algorithm, which we explain more precisely in the following.
Suppose we start with an initial graph G, obtained by any graph construction method. For any edge e = (u, v) ∈ E(G) with weight d, we consider the subgraph containing edges with weights less than d (the small edges previously mentioned). Assume that the shortest path between u and v is a long path in this subgraph. All edges on this path have smaller weights than the edge e, so we expect this path to represent the geodesic curve between u and v better than the edge e does. As a result, a better estimate of the geodesic distance may be obtained using this path. If the number of edges in such a path is large enough, e.g. greater than two, it is probable that (u, v) connects points that are far apart on the manifold, and therefore (u, v) is probably a shortcut edge. In the following, we prove that a threshold length of two is an appropriate criterion for detecting shortcuts.
This procedure is not applied to the edges of the Minimum Spanning Tree (MST) of the initial graph G (denoted MST(G)), because preserving the edges of MST(G) is necessary for graph construction in the proposed algorithm: a disconnected graph is disastrous to the process of label inference. To ensure the connectivity of G we do not remove any of the edges in MST(G). The MST is chosen because it prefers smaller edges, which are less likely to be shortcut edges.
Require: An initial graph G built with a graph construction method (e.g. k-NN)
Ensure: Shortcuts of graph G are removed
1: Let G_f be the full graph on the sampling, i.e. the graph which contains edges e = (u, v) for all u, v ∈ V(G)
2: for all e = (u, v) ∈ E(G) ∖ E(MST(G)) in ascending order of distance do
3:   G_{u,v} ← the subgraph of G_f with edge weights less than d(u, v)
4:   l ← the number of edges of the shortest path in G_{u,v} between u and v
5:   if l > 2 then
6:     Remove edge e from E(G)
7:   end if
8: end for
Algorithm 1: The baseline algorithm
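A compact sketch of Algorithm 1 under the same matrix conventions as before (our own illustration; for simplicity it searches for detours in the current graph rather than in the full graph G_f, which is a simplification of the algorithm as stated):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, shortest_path

def baseline(W):
    """Remove suspected shortcut edges from the weight matrix W in place."""
    n = W.shape[0]
    mst = minimum_spanning_tree(csr_matrix(np.where(np.isinf(W), 0.0, W))).toarray()
    protected = (mst > 0) | (mst.T > 0)       # MST edges are never removed
    edges = sorted((W[u, v], u, v) for u in range(n) for v in range(u + 1, n)
                   if np.isfinite(W[u, v]) and not protected[u, v])
    for d, u, v in edges:                     # ascending order of distance
        # unit-weight subgraph of edges shorter than d, so path length = hops;
        # the edge (u, v) itself is excluded since its weight is not < d
        sub = csr_matrix(np.where(W < d, 1.0, 0.0))
        hops = shortest_path(sub, directed=False, indices=[u])[0, v]
        if hops > 2:                          # long detour exists => shortcut
            W[u, v] = W[v, u] = np.inf
    return W
```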
To justify the correctness of our algorithm, we should show that the baseline algorithm preserves an edge (u, v) ∈ E(G) if d(u, v) is close enough to d_ℳ(u, v), and removes it otherwise.
We already know from Proposition 1 that d(u, v) ≤ d_ℳ(u, v) always holds. In the following theorem, we first show that if d_ℳ(u, v) is not too much larger than d(u, v), the edge is not removed by the baseline algorithm.
Theorem 1. If d_ℳ(u, v) < 2d(u, v) − 2δ(ℳ), where δ(ℳ) is defined in Section V, then the baseline algorithm will preserve edge (u, v).
Proof: This theorem is a special case of Theorem 3, which is proved in Appendix A.
To complete the justification, we further show that if d_ℳ(u, v) is much larger than d(u, v), the edge (u, v) will be removed by the baseline algorithm. To do so, we first need to define some concepts.
Definition 2.
1) Consider all unit-speed geodesic curves C lying completely on ℳ. The minimum radius of curvature r₀ = r₀(ℳ) is defined by

1/r₀ = max_{C,t} ‖C̈(t)‖

where C̈(t) denotes the second derivative of C with respect to t [14].
2) The minimum branch separation s₀ = s₀(ℳ) is defined as the largest positive number for which d(x, y) < s₀ implies d_ℳ(x, y) ≤ πr₀ for every x, y ∈ ℳ, where r₀ is the minimum radius of curvature [14].

Figure 5. The geodesic path between the two endpoints of the edge e = (u, v) is shown by the dashed line. The geodesic paths between the pairs u, w and w, v are shown by solid curves.
Definition 3. A manifold ℳ is called geodesically convex if there exists a mathematically geodesic curve C between any two arbitrary points x, y ∈ ℳ with length d_ℳ(x, y) [14]. A mathematically geodesic curve C on a manifold ℳ is a curve whose geodesic curvature is zero at all points of the curve [15].
This condition is needed only for the next theorem.
Theorem 2. If ℳ is a geodesically convex manifold and there exist u, v ∈ ℳ with d(u, v) < s₀ and d_ℳ(u, v) ≥ (2/(1 − λ₀)) d(u, v), then the baseline algorithm removes the edge e = (u, v), where λ₀ is a constant for a given manifold ℳ and can be computed as λ₀ = π²s₀(ℳ)² / (96 r₀(ℳ)²).
Proof: Suppose the baseline algorithm does not remove the edge e. Then, according to the baseline algorithm, the length of the shortest path between u and v in G_{u,v} equals two (it cannot be one because we omit (u, v)), and hence there exist edges e₁ = (u, w) and e₂ = (w, v) such that d(u, w) < d(u, v) and d(w, v) < d(u, v) (Figure 5).
From [14], we know that for any arbitrary 0 < λ < 1, if the points x, y of a geodesically convex manifold ℳ satisfy the conditions

d(x, y) < s₀  and  d(x, y) ≤ (2/π) r₀ √(24λ)    (6)

then we have d_ℳ(x, y) ≥ d(x, y) ≥ (1 − λ) d_ℳ(x, y).
Taking x = u, y = v and λ = λ₀, it is easily verified that the conditions in Equation 6 are satisfied in our case. As d(u, w) < d(u, v), the conditions in Equation 6 also hold for x = u, y = w, λ = λ₀. Therefore we have d(u, w) ≥ (1 − λ₀) d_ℳ(u, w). Combining this result with the previously known relation d(u, v) > d(u, w), we conclude that d(u, v) > d(u, w) ≥ (1 − λ₀) d_ℳ(u, w). A similar conclusion follows for x = v, y = w and λ = λ₀: d(u, v) > d(w, v) ≥ (1 − λ₀) d_ℳ(w, v). Summing these two relations and using the triangle inequality on ℳ, we reach the following conclusion:

d(u, v) > (1/2)(1 − λ₀)(d_ℳ(w, u) + d_ℳ(w, v)) ≥ ((1 − λ₀)/2) d_ℳ(u, v)

This contradicts the assumption that d(u, v) ≤ ((1 − λ₀)/2) d_ℳ(u, v). Therefore, the supposition is false, the baseline algorithm removes the edge e = (u, v), and the proof ends here.
B. Shortcomings
The baseline algorithm has two drawbacks. First, if d_ℳ(u, v) is small (i.e. close to 2δ(ℳ)), Theorem 1 cannot guarantee that the edge (u, v) is not removed, even when d(u, v) ≅ d_ℳ(u, v). If d_ℳ(u, v) < 2δ(ℳ), the δ(ℳ) term in the inequality of Theorem 1 has greater influence than d_ℳ(u, v). Consequently, the algorithm may remove the wrong edges, because the precondition of the theorem is at risk of being violated. An example of this situation occurs on a plane-shaped manifold, where d(u, v) is exactly equal to d_ℳ(u, v) for all u, v ∈ V(G). Even though no shortcut edge exists in this case, the baseline algorithm may remove some of the edges of G.
Secondly, although the baseline algorithm is able to detect the large difference between d(u, v) and d_ℳ(u, v) for a shortcut edge, it naively removes such edges. The classification result improves if these edges instead merely have a very small effect on inference; that is, adjusting the edge weights in an appropriate manner is a better solution. This way, we can estimate the structure of the manifold more accurately.
These shortcomings are overcome in the proposed algorithm, namely Isograph, which is described in the next section.
C. An improved algorithm: Isograph
We now propose the Isograph algorithm to overcome the shortcomings described in the previous section. This algorithm is a modified version of the baseline algorithm with two improvements:
• To overcome the problem with small values of d_ℳ(u, v), Isograph leaves all edges with d(u, v) ≤ ε unchanged. If we choose ε such that ε > 2δ(ℳ), then, since d_ℳ(u, v) ≥ d(u, v), every edge that is considered satisfies d_ℳ(u, v) > 2δ(ℳ), which solves the first problem.
• To overcome the second shortcoming, Isograph maintains an estimate d̂_ℳ(u, v) for each edge (u, v) ∈ E(G), and if d̂_ℳ(u, v) is too far from d_ℳ(u, v), instead of removing the shortcut edge it increases the edge weight d̂_ℳ(u, v) so that it becomes a better estimate of the geodesic distance. Thus the same graph structure is kept with better edge weights, which may in turn cause other edges to be updated in the next iterations. In the theorems below, we show that updating over multiple iterations increases the edge weights towards the geodesic distance, so that a more accurate estimate of the geodesic distance is achieved.
As previously mentioned in Theorem 1, it can be proven that for any edge (u, v) which is detected as a shortcut by the baseline algorithm, we have:

d_ℳ(u, v) ≥ 2(d(u, v) − δ(ℳ))

Therefore, we may use the following update rule for the edge weights:

d̂_ℳ(u, v) ← 2(d(u, v) − δ(ℳ))

Later, in Theorem 4, we will show that this is indeed an appropriate update rule that yields a better estimate of d_ℳ(u, v). Using the ε constraint and updating the edge weights iteratively with this rule, we arrive at Isograph.
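As a quick worked example (ours, with made-up numbers): if δ(ℳ) = 0.1 and an edge starts at d̂¹_ℳ(u, v) = 1, successive applications of the rule give 2(1 − 0.1) = 1.8, then 2(1.8 − 0.1) = 3.4, then 2(3.4 − 0.1) = 6.6; each iteration roughly doubles the estimate, which is why only a few iterations are needed in practice (cf. Section VII).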
Require: An initial graph G built with a graph construction method (e.g. k-NN)
Ensure: Adjusted edge weights d̂ᵗ_ℳ(u, v) for all (u, v) ∈ E(G)
1: for all e = (u, v) ∈ E(G) do
2:   d̂¹_ℳ(u, v) ← d(u, v)
3: end for
4: for t = 1 ... numberOfIterations do
5:   for all e = (u, v) ∈ E(G) ∖ E(MST(G)) do
6:     if d̂ᵗ_ℳ(u, v) ≥ ε then
7:       G_{u,v} ← the subgraph of G with edge weights less than d̂ᵗ_ℳ(u, v)²
8:       l ← the number of edges of the shortest path in G_{u,v} between u and v
9:       if l > 2 then
10:        d̂ᵗ⁺¹_ℳ(u, v) ← 2(d̂ᵗ_ℳ(u, v) − δ(ℳ))
11:      end if
12:    end if
13:  end for
14: end for
Algorithm 2: Isograph (the proposed algorithm)
²In fact we also add any (x, y) ∉ E(G) such that d(x, y) ≤ d̂ᵗ_ℳ(u, v) to G_{u,v}.
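Again as our own illustration (not the authors' code), a minimal sketch of Algorithm 2 under the same conventions; delta and eps must be supplied, e.g. via the estimate discussed in Section VI-D, and for simplicity the detour search uses the current graph rather than adding the extra pairs of footnote 2:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, shortest_path

def isograph(W, delta, eps, n_iterations=3):
    """Adjust edge weights towards geodesic distance estimates (Algorithm 2)."""
    n = W.shape[0]
    mst = minimum_spanning_tree(csr_matrix(np.where(np.isinf(W), 0.0, W))).toarray()
    protected = (mst > 0) | (mst.T > 0)        # MST edges are never updated
    D = W.copy()                               # D[u, v] plays the role of d_hat
    for _ in range(n_iterations):
        for u in range(n):
            for v in range(u + 1, n):
                if not np.isfinite(D[u, v]) or protected[u, v] or D[u, v] < eps:
                    continue                   # skip non-edges, MST and short edges
                # unit-weight subgraph of edges below the current estimate
                sub = csr_matrix(np.where(D < D[u, v], 1.0, 0.0))
                hops = shortest_path(sub, directed=False, indices=[u])[0, v]
                if hops > 2:                   # shortcut detected: grow estimate
                    D[u, v] = D[v, u] = 2 * (D[u, v] - delta)
    return D
```

By Lemma 1 each such update strictly increases the weight, and by Theorem 4 it never overshoots d_ℳ(u, v).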
In the following theorems, we prove that the following loop invariant holds throughout the procedure of Isograph:

∀(u, v) ∈ E(G):  d(u, v) ≤ d̂ᵗ_ℳ(u, v) ≤ d_ℳ(u, v)    (7)

In addition, we show that the difference between the real and the estimated values of d_ℳ(u, v) decreases as the edge weights are updated in each iteration. Together, these results show that the estimated geodesic distance always lies between the Euclidean distance and the real geodesic distance. Therefore, we may increase the edge weights iteratively without worrying about exceeding the true distance.

Theorem 3. Assuming the loop invariant (Equation 7) holds at some time instance, if d_ℳ(u, v) < 2d̂ᵗ_ℳ(u, v) − 2δ(ℳ), then Isograph will not update the edge e = (u, v) (line 10 of Algorithm 2).
Theorem 4. At any point throughout the procedure of Isograph, Equation 7 holds.
The proofs of these theorems are included in Appendix A.

Lemma 1. If the edge e = (u, v) is updated at iteration t, then

d̂ᵗ⁺¹_ℳ(u, v) > d̂ᵗ_ℳ(u, v)    (8)

Proof: Edge e is updated, so

d̂ᵗ⁺¹_ℳ(u, v) = 2(d̂ᵗ_ℳ(u, v) − δ(ℳ))

We know that d̂ᵗ_ℳ(u, v) ≥ ε > 2δ(ℳ). Adding d̂ᵗ_ℳ(u, v) to both sides of this relation we get

2 d̂ᵗ_ℳ(u, v) > 2δ(ℳ) + d̂ᵗ_ℳ(u, v)

Therefore

2(d̂ᵗ_ℳ(u, v) − δ(ℳ)) > d̂ᵗ_ℳ(u, v)  ⇒  d̂ᵗ⁺¹_ℳ(u, v) > d̂ᵗ_ℳ(u, v)
D. Practical modifications
Isograph explicitly uses δ(ℳ) and ε. From a sampling of the manifold ℳ, we cannot exactly determine its underlying geometry. There are many manifolds passing through the same sample points, each with a different value of δ(ℳ); therefore computing δ(ℳ) is inherently an ill-posed problem. However, Lemma 2 gives a lower bound on δ(ℳ).

Lemma 2. If the maximum-weight edge in the minimum spanning tree (MST) of the neighbourhood graph has weight d_mst, then δ(ℳ) ≥ d_mst/2, where δ(ℳ) is defined in Section V.

The proof of this lemma is given in Appendix A. As Lemma 2 indicates, δ(ℳ) ≥ max_{e∈MST(G)} d(e) / 2. However, as we argued, estimating δ(ℳ) is an ill-posed problem; therefore, in order to estimate δ(ℳ), we must suppose that the data provided to our algorithms lies on a manifold satisfying some intuitively reasonable constraints, i.e. we must assume some prior knowledge about δ(ℳ).
Another issue is that in many cases the sampling might be sparse in only a small region of ℳ. As δ(ℳ) is defined as a global parameter of the sampling, this results in a large value of δ(ℳ), while the local value of δ may be much smaller in many regions. In Theorem 3, where δ(ℳ) enters our formulation, we do not need a global bound on δ(ℳ), so we can use a local bound instead. We assume that the local value of δ has the same order of magnitude as d_mst in Lemma 2, i.e. δ = α · d_mst/2.
Sparsity can vary across different parts of the manifold, so an adaptive method for estimating δ would clearly help Isograph. The α · d_mst/2 estimate of δ is not adaptive, so we address this problem in an indirect manner: edges of the k_l-NN graph, where k_l ≪ k, are rarely shortcut edges, so it is reasonable not to modify these edges at all. This proved effective in practice.
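A short sketch of this δ estimate (α is the paper's parameter; the function name and conventions are ours):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

def estimate_delta(W, alpha=0.5):
    """Estimate delta as alpha * d_mst / 2 (Section VI-D)."""
    mst = minimum_spanning_tree(csr_matrix(np.where(np.isinf(W), 0.0, W)))
    d_mst = mst.data.max()        # weight of the heaviest MST edge
    return alpha * d_mst / 2
```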
VII. EXPERIMENTAL RESULTS
In this section we present the experimental results and demonstrate the effectiveness of our neighbourhood graph construction method when applied to the k-NN graph and to the output of b-matching.
A. Synthetic Datasets
To evaluate our proposed method, we generated three synthetic datasets. We can evaluate the effectiveness of our algorithm by illustrating the edges that are detected as shortcuts. Each dataset lies on a different 2D manifold shape embedded in 3D space: Swiss roll, Step and Ellipsoid (Figure 6). The data points (200 points) are generated by uniform i.i.d. sampling on the manifold surface, and each point is translated by an independent random offset. The k-NN method is used to construct the neighbourhood graph, where k is selected as the smallest value for which a considerable number of shortcut edges emerge. The parameter k_l (Section VI-D) is set to 5 for all graphs.
An effective shortcut edge detection algorithm eliminates the edges connecting two irrelevant points (i.e. edges with d_ℳ ≫ d), while maintaining edges lying on the manifold, no matter how long such edges may be. These properties follow from Theorems 2 and 3, respectively. In Figure 6, it is easy to observe that our algorithm has both of these properties, and is therefore effective. For better illustration, the graph edges of each manifold are partitioned and shown in two separate figures: edges which are detected as shortcuts, and those preserved.
B. Real World Experiments
In order to evaluate the proposed method, we selected four standard datasets that are consistent with the Manifold assumption: MNIST, USPS, Caltech 101 and Corel. USPS and MNIST are digit recognition datasets, and the other two are image categorization datasets. For the Caltech and Corel datasets, a subset of classes was selected and the CEDD feature set introduced by [16] was extracted; for MNIST and USPS the feature vector is the (low-resolution) image itself. Principal Component Analysis (PCA) was applied to all datasets for noise removal.
For each dataset, ten random samplings containing 2000 of the data points were generated, and cross-validation was used to partition each sampling into labeled and unlabeled points, such that there were ten labeled points per class on average. The values of γ, k_l, α and numberOfIterations were set to 0.02, 3, 0.5 and 3, respectively, for all experiments. In the first experiment, we applied Isograph to the 10-NN graph.
Figure 6. Shortcut detection in k-NN graphs of three noisy synthetic datasets: Ellipsoid (k = 20), Step (k = 22) and Swiss roll (k = 13). Figures in the right column illustrate the edges detected as shortcuts, and therefore updated by the Isograph algorithm; figures in the left column show the edges that are maintained.

Figure 7. Shortcut edges detected by Isograph and the paths found by the algorithm.

Figure 8. Charts comparing the accuracy of Isograph applied on the k-NN graph with plain k-NN graph construction, for MNIST, USPS, Caltech and Corel over a range of k (settings: 0-1 kNN, 0-1 kNN+Isograph, kNN+ML, kNN+ML+Isograph).

To illustrate the
effectiveness of Isograph in detecting the shortcut edges, we selected some of the updated edges and plotted the path found by Isograph between their endpoints (Figure 7). The first and the last pictures in each row represent the endpoints of an edge. Although each such edge was present in the 10-NN graph, its endpoints belong to two different classes. Therefore, our algorithm improves the graph structure by updating the edge between them.
In the second experiment we applied Isograph to the k-NN graph and measured the accuracy of the classifier built using the resulting neighbourhood graph. The results are presented in Figure 8 for the datasets mentioned above. We ran our algorithm in two settings:
1) Binary: In this setting we use only unit weights for the edges. The k-NN approach to graph construction in the binary setting is to connect each vertex to its k nearest neighbours with unit weight. We call this the "0-1 k-NN graph" to distinguish it from the weighted k-NN graph.
Isograph can be applied to binary graph construction as follows: we build the weighted k-NN graph, run Isograph on it, and, after all iterations have finished, remove from the graph any edges that were updated by Isograph. We call the result "0-1 k-NN+Isograph". Note that this is different from using the baseline algorithm: edges are not removed during Isograph, so they can influence the geodesic distance estimates of other edges, and hence potentially more shortcut edges are updated.
2) Weighted: We compare Isograph with the "k-NN+ML" graph in this setting. The k-NN+ML graph is constructed by creating the weighted k-NN graph and applying the similarity of Equation 2.
To build the "k-NN+ML+Isograph" graph, we applied Isograph on the weighted k-NN graph and then used Equation 2. In both weighted methods, Marginal Likelihood (ML) was used to find the best σ for creating the similarity matrix W.
All four graph constructions above can be repeated using any arbitrary graph construction method instead of k-NN. For instance, we combined Isograph with b-matching and show that the classification accuracy is superior to plain b-matching on all four datasets mentioned before (Figure 9).
Figure 9. b-matching combined with Isograph in the 0-1 and ML settings (0-1 bMatching, 0-1 bMatching+Isograph, bMatching+ML, bMatching+ML+Isograph), for MNIST, USPS, Caltech and Corel over a range of k.

Figure 10. The shortest curve lying on ℳ connecting u and v; m is the midpoint of the curve.
These figures show a steady improvement from Isograph in all the settings presented, and the improvements are robust to k. As we can see, on MNIST and USPS, k-NN+Isograph considerably improves the results, which shows that Isograph detects the shortcut edges effectively. On these two datasets, k-NN+ML+Isograph works better for small values of k, but performance degrades slightly for larger values of k. This phenomenon can be explained by the fact that for small values of k, the k-NN graph uses shorter edges, which probably have a smaller difference between their d and d_ℳ, so a maximum of three iterations is enough to reach their correct weights. However, when k increases, even though we detect the shortcut edges correctly, we cannot update their weights sufficiently within the given number of iterations, as the change in each iteration is limited to a factor of roughly two.
On the Corel dataset, ML improves the results of 0-1 k-NN and 0-1 k-NN+Isograph. In this setting, weights play an important role in inferring labels correctly; therefore, the difference between the weighted and 0-1 variants is considerable. In contrast, on Caltech, ML weighting does not have a positive effect, so we see the best results with 0-1 k-NN+Isograph. This might be due to a possibly unequal number of labeled data points from each class; note, however, that Isograph still improves the 0-1 graph.
Furthermore, we combined Isograph with b-matching and show that the classification accuracy is superior to plain b-matching on all datasets, with robustness with respect to the parameter k (Figure 9).
VIII. CONCLUSIONS
In this paper, we showed that using geodesic distance instead of Euclidean distance improves the neighbourhood graph. To this end, we proposed an unsupervised method (Isograph) to estimate the geodesic distances between points. We have provided bounds on the values of the geodesics estimated by Isograph. As Isograph can be combined with other graph construction methods, we combined it with k-NN and b-matching and presented results on real-world datasets, which show the steady effectiveness of Isograph. The effectiveness of using geodesic distance in the graph construction procedure and the convergence of the Isograph algorithm are subjects of future theoretical analysis. Better local estimation of δ may lead to better geodesic distance estimation. Furthermore, labeled data may be employed to improve the shortcut detection procedure.
IX. APPENDIX A: PROOFS OF SOME OF THE THEOREMS
Theorem 3. Assuming the loop invariant (Equation 7) holds at some time instance, if d_ℳ(u, v) < 2d̂_ℳ(u, v) − 2δ(ℳ), then Isograph will preserve edge e = (u, v).
Proof: Consider a shortest curve C on ℳ starting at u and ending at v (Figure 10). Let m be the midpoint of curve C, that is, the point that halves the length of C. From the sampling condition we know that there exists a point w in the sampling such that d_ℳ(m, w) ≤ δ(ℳ).
We first show that w cannot coincide with u or v. From the loop invariant assumption we have d̂_ℳ(u, v) ≤ d_ℳ(u, v). As we have assumed d_ℳ(u, v)/2 + δ(ℳ) < d̂_ℳ(u, v), we get

δ(ℳ) < d_ℳ(u, v)/2 = d_ℳ(u, m) = d_ℳ(m, v)

This means that w can be neither u nor v, because d_ℳ(m, w) ≤ δ(ℳ).
Now, by the triangle inequality we have

d_ℳ(u, w) ≤ d_ℳ(u, m) + d_ℳ(m, w)

Using d_ℳ(u, m) = d_ℳ(u, v)/2 and d_ℳ(m, w) ≤ δ(ℳ), we get

d_ℳ(u, w) ≤ d_ℳ(u, m) + d_ℳ(m, w) ≤ d_ℳ(u, v)/2 + δ(ℳ)

Finally, plugging the last inequality into the assumption d_ℳ(u, v)/2 + δ(ℳ) < d̂_ℳ(u, v), we reach

d̂_ℳ(u, w) ≤ d_ℳ(u, w) < d̂_ℳ(u, v)

In a similar way we have d̂_ℳ(v, w) < d̂_ℳ(u, v). Therefore, the edges (u, w) and (v, w) are both in E(G_{u,v}), and Isograph will preserve the edge (u, v) due to the point w.
Theorem 4. At any point throughout the procedure of Isograph, Equation 7 holds.
Proof: We show that Equation 7 is a loop invariant; that is, we must show that:
1) Equation 7 is true when d̂_ℳ is initialized at the beginning of the algorithm.
2) Under the assumption that Equation 7 holds at some time instance, it still holds after an edge is updated.
Item one is true because d̂¹_ℳ(u, v) = d(u, v), and by Proposition 1 we have d(u, v) ≤ d_ℳ(u, v).
We now prove item two. Suppose that at some time t the loop invariant holds. We must show that d̂ᵗ⁺¹_ℳ(u, v) ≤ d_ℳ(u, v). According to Theorem 3, if an edge is updated we must have d̂ᵗ_ℳ(u, v) ≤ d_ℳ(u, v)/2 + δ(ℳ), so

d̂ᵗ⁺¹_ℳ(u, v) = 2(d̂ᵗ_ℳ(u, v) − δ(ℳ)) ≤ d_ℳ(u, v)

Moreover, by Lemma 1 the update only increases the estimate, so d(u, v) ≤ d̂ᵗ_ℳ(u, v) ≤ d̂ᵗ⁺¹_ℳ(u, v), and the lower bound of Equation 7 is preserved as well.
Lemma 2. If the maximum-weight edge in the minimum spanning tree (MST) of the neighbourhood graph has weight d_mst, then δ(ℳ) ≥ d_mst/2, where δ(ℳ) is defined in Section V.
Proof: Let (u, v) be the edge with maximum weight in the MST. Removing the edge (u, v) splits the tree into two connected components C₁ and C₂. Define d_ℳ(x, C₁) to be the minimum distance of a point x to the points of C₁, i.e. d_ℳ(x, C₁) = min_{y∈C₁} d_ℳ(x, y); d_ℳ(x, C₂) is defined in a similar way.
Now, let C be any curve on ℳ between u and v. For any point x on this curve, compute f(x) = d_ℳ(x, C₁) − d_ℳ(x, C₂). We know f(u) < 0 and f(v) > 0, and f is continuous, so by the intermediate value theorem there exists a point x* on the curve C such that f(x*) = 0.
Let x₁ be the point of C₁ with minimum distance from x*, and let x₂ be defined in a similar way. By the sampling condition, δ(ℳ) ≥ d_ℳ(x*, x₁) = d_ℳ(x*, x₂), so we have

2δ(ℳ) ≥ d_ℳ(x*, x₁) + d_ℳ(x*, x₂) ≥ d_ℳ(x₁, x₂) ≥ d̂_ℳ(x₁, x₂)

As (u, v) is the maximum-weight edge of the MST, d̂_ℳ(u, v) is the smallest edge weight between any vertex of C₁ and any vertex of C₂ (otherwise the MST could be improved by exchanging edges). Therefore

2δ(ℳ) ≥ d̂_ℳ(x₁, x₂) ≥ d̂_ℳ(u, v) = d_mst
REFERENCES
[1] R. Ando and T. Zhang, "A high-performance semi-supervised learning method for text chunking," in Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pp. 1–9, 2005.
[2] S. Basu, M. Bilenko, and R. Mooney, "A probabilistic framework for semi-supervised clustering," in Proceedings of the Tenth ACM International Conference on Knowledge Discovery and Data Mining, pp. 59–68, 2004.
[3] S. Hoi and M. Lyu, "A semi-supervised active learning framework for image retrieval," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 302–309, 2005.
[4] O. Chapelle, B. Schölkopf, and A. Zien, Semi-Supervised Learning, vol. 2. MIT Press, Cambridge, MA, 2006.
[5] M. Belkin and P. Niyogi, "Semi-supervised learning on Riemannian manifolds," Machine Learning, vol. 56, no. 1, pp. 209–239, 2004.
[6] X. Zhu, J. Lafferty, and R. Rosenfeld, Semi-Supervised Learning with Graphs. PhD thesis, 2005.
[7] T. Jebara, J. Wang, and S. Chang, "Graph construction and b-matching for semi-supervised learning," in Proceedings of the 26th Annual International Conference on Machine Learning, pp. 441–448, ACM, 2009.
[8] B. Huang and T. Jebara, "Loopy belief propagation for bipartite maximum weight b-matching," Artificial Intelligence and Statistics, 2007.
[9] W. Cukierski and D. Foran, "Using betweenness centrality to identify manifold shortcuts," in IEEE International Conference on Data Mining Workshops, pp. 949–958, 2008.
[10] J. Tenenbaum, V. de Silva, and J. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, vol. 290, no. 5500, p. 2319, 2000.
[11] M. Belkin and P. Niyogi, "Problems of learning on manifolds," The University of Chicago, 2003.
[12] M. Hein and U. von Luxburg, "Introduction to graph-based semi-supervised learning."
[13] J. Odegard, "Dimensionality reduction methods for molecular motion."
[14] M. Bernstein, V. de Silva, J. Langford, and J. Tenenbaum, "Graph approximations to geodesics on embedded manifolds," tech. rep., Department of Psychology, Stanford University, 2000.
[15] M. do Carmo, Riemannian Geometry. Birkhäuser, 1992.
[16] Y. Chen and J. Wang, "Image categorization by learning and reasoning with regions," The Journal of Machine Learning Research, vol. 5, pp. 913–939, 2004.