Isograph: Neighbourhood Graph Construction Based on Geodesic Distance for Semi-Supervised Learning

Marjan Ghazvininejad, Mostafa Mahdieh, Hamid R. Rabiee, Parisa Khanipour Roshan and Mohammad Hossein Rohban
DML Research Lab, Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
Contact Email: [email protected]

Abstract - Semi-supervised learning based on manifolds has been the focus of extensive research in recent years. Appropriate neighbourhood graph construction is a key component of a successful semi-supervised classification method. Previous graph construction methods fail when there are pairs of data points that have a small Euclidean distance but are far apart on the manifold. To overcome this problem, we start with an arbitrary neighbourhood graph and iteratively update the edge weights using estimates of the geodesic distances between points. Moreover, we provide theoretical bounds on the values of the estimated geodesic distances. Experimental results on real-world data show significant improvement over previous graph construction methods.

Keywords - Semi-supervised Learning, Manifold, Geodesic Distance, Graph Construction

I. INTRODUCTION

A. Semi-supervised Learning

The costly and time-consuming process of data labeling, together with the large amount of relatively cheap unlabeled data at hand, are two characteristics of real-world applications that have caused a recent surge of interest in Semi-Supervised Learning (SSL) methods. Text and image mining are common examples of applications in which SSL plays an important role ([1], [2], [3]). SSL methods utilize both labeled and unlabeled data to improve the generalization ability of the learner in such applications. Using the unlabeled data, one can estimate the distribution of the data in feature space, which can greatly improve classification.
In order to use the unlabeled data for label inference more efficiently, certain assumptions must be made about the general geometric properties of the data. In many applications, high-dimensional data points are actually samples from a low-dimensional subspace of the feature space. In these cases, we can make use of the Manifold/Cluster assumption, which is among the most practical assumptions in SSL [4]. The Manifold/Cluster assumption holds in many real-world datasets in general, and in image datasets in particular [5].

Weighted discrete graphs are a suitable representation of manifolds. A manifold can be represented by sampling a finite number of points from it as the graph vertices and putting edges between nearby points on the manifold. Since the underlying manifold is unknown in real data, and smoothness estimation of the labeling function relies heavily on the manifold model, graph construction plays an important role in this problem. Therefore, several methods have been proposed to construct a suitable graph representing the manifold. These graph construction methods output a weighted graph in which each data point is a vertex and each edge weight represents the distance between its endpoints. The constructed graph is then used to infer the labels of the unlabeled data points. Hence, appropriate construction of the neighbourhood graph plays a key role in manifold-based SSL. This argument is further discussed in Subsection II-C.

Some recent work in the Semi-Supervised Learning literature has focused on proposing graph construction methods that best represent the manifold structure. k-NN and eps-ball are two classical methods of graph construction [6]. Several schemes have been proposed to improve the k-NN graph construction method. Moreover, Jebara et al. proposed the b-matching algorithm [7] which, unlike the k-NN method, produces a balanced graph, i.e. a graph in which all nodes have the same number of neighbours.
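For concreteness, the two classical constructions just mentioned can be sketched as follows. This is a minimal illustration, not code from the cited work; the function and parameter names (`build_knn_graph`, `build_eps_ball_graph`, `k`, `eps`) are our own.

```python
# Minimal sketches of the two classical neighbourhood graph
# constructions; infinity marks "no edge", matching the paper's
# convention w(u, v) = inf for missing edges.
import numpy as np

def build_knn_graph(X, k):
    """Symmetric k-NN graph: w[i, j] is the Euclidean distance if j is
    among the k nearest neighbours of i (or vice versa), else infinity."""
    n = X.shape[0]
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    w = np.full((n, n), np.inf)
    for i in range(n):
        # skip the point itself (distance 0) via indices 1..k of the sorted row
        nbrs = np.argsort(d[i])[1:k + 1]
        w[i, nbrs] = d[i, nbrs]
    # add reverse edges so the graph is symmetric, as the paper notes;
    # this is why some vertices can end up with degree greater than k
    return np.minimum(w, w.T)

def build_eps_ball_graph(X, eps):
    """eps-ball graph: connect every pair of points closer than eps."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return np.where((d < eps) & (d > 0), d, np.inf)
```

Both constructions use only Euclidean distances, which is exactly the weakness that shortcut edges exploit later in the paper.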
The effectiveness of b-matching has been corroborated by both theoretical and experimental justifications [8]. However, many graph construction methods use the Euclidean distance between data points as their distance measure. Unfortunately, this approach is misleading at times, since two points with a small Euclidean distance may be situated far apart on the manifold. In this case the two points are connected even though they are in fact distant from each other on the manifold. Such edges are called shortcut edges [9], and a graph containing them does not represent the manifold structure correctly. This situation can be prevented by using a distance measure that reflects the true distances between data points more faithfully. Approximating the correct distances enables us to determine the neighbourhood of a data point more precisely.

Cukierski et al. have proposed a method to identify shortcut edges via the Betweenness Centrality measure [9]. The Betweenness Centrality of an edge is related to the number of graph shortest paths (between any two vertices) that pass through it. They intuitively argue that shortcut edges are likely to have high Betweenness Centrality, and use this fact to remove such edges. However, to the best of our knowledge, this argument has not been justified theoretically.

In this paper, we introduce a novel algorithm that detects shortcut edges and adjusts their weights with the aim of approaching the true distances between points on the manifold. Since the graph constructed by this algorithm is based on the intrinsic distances between points, we name it Isograph. We provide a solid theoretical foundation for our work; in fact, the algorithm follows directly from the theorems. Experiments on benchmark datasets show promising results compared to previous state-of-the-art work.
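The betweenness-centrality heuristic of [9] can be sketched roughly as follows, assuming a networkx graph whose edges carry a `weight` attribute. The top-quantile threshold is our own simplification for illustration, not the exact rule used in [9].

```python
# Hedged sketch of the prior betweenness-centrality heuristic:
# edges that many shortest paths pass through are suspected shortcuts.
import networkx as nx

def remove_suspected_shortcuts(G, quantile=0.99):
    """Drop edges whose betweenness centrality is above the given quantile."""
    # missing 'weight' attributes default to 1 in networkx
    bc = nx.edge_betweenness_centrality(G, weight="weight")
    cutoff = sorted(bc.values())[int(quantile * (len(bc) - 1))]
    H = G.copy()
    H.remove_edges_from([e for e, c in bc.items() if c > cutoff])
    return H
```

On a graph of two dense clusters joined by a single bridge, the bridge carries all cross-cluster shortest paths and is the edge removed, which is the intuition the heuristic relies on.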
It is worth noting that our method can be initialized from any arbitrary graph; therefore it can easily be combined with many previous graph construction methods. Finally, note that the well-known Isomap algorithm also tries to estimate the geodesic distances between points, with considerable results [10]. However, Isomap cannot find the geodesic distance between nearby points properly; it concentrates on finding the geodesic distance between far-apart points, and therefore cannot be used in graph construction methods.

The remainder of the paper is organized as follows. In Section II we introduce the notation used throughout the paper and provide basic definitions; this section can be safely skipped by the experienced reader. Next, in Sections III and IV, we explain the motivation of our algorithm and the basic idea of geodesic distance. This is followed by the precise problem setting in Section V. In Section VI we present our proposed method and give theoretical justifications for it. Finally, in Section VII, the experimental results of applying our method on both synthetic and real-world datasets are presented.

II. BASICS AND NOTATIONS

A. Classification Setting

Consider the set of possible classes {-1, +1}, and let the feature space be the D-dimensional real-valued space R^D. We denote the labeled data as X_L = {x_1, ..., x_l} with corresponding labels y = (y_1, ..., y_l), where x_i in R^D and y_i in {-1, +1}. The unlabeled dataset is given as X_U = {x_{l+1}, ..., x_{l+u}}, where x_i in R^D. In this setting X_L, X_U and y are given, and the goal is to find an estimate f = (f_1, ..., f_{l+u}) of the labels of all the points. Each f_i is a real value between -1 and 1, and larger values of f_i correspond to stronger membership in class +1. In practice these values are mapped to {-1, +1} after inference is done. This classification problem can be generalized to the case of possible classes {1, 2, ..., m}, where m >= 2, using the one-against-all method.

B. Manifold/Cluster Assumption

In Figure 1 A the goal is to predict the label of the unlabeled point. (Figure 1. Four of five given data points are labeled. The goal is to predict the label of the unlabeled data point A) without any prior knowledge, B) knowing the Manifold/Cluster assumption [4].) Without any other prior knowledge, the best prediction is class -, as the two points nearest to the unknown point are from this class. Now suppose we somehow know that the data points are distributed only on the curve shown in Figure 1 B, and we expect the labels of adjacent points on this curve to be similar. In this new setting the label + is a better candidate, as the two adjacent points of the unlabeled point are both from class +. The assumption just mentioned is called the Manifold/Cluster assumption and is generalized to d-dimensional spaces. The Manifold/Cluster assumption in fact consists of two parts: the Manifold assumption and the Cluster assumption. The Manifold assumption states that the data points lie on a d-dimensional manifold (denoted by M) in the D-dimensional feature space (d << D). The Cluster assumption states that the labels of the points vary smoothly on M. We will use the term "Manifold assumption" instead of Manifold/Cluster assumption in the rest of this paper.

Suppose f is a labeling function on M, i.e. a function from M to R. The smoothness of f is formally defined as [11]:

S(f) = \int_M \|\nabla f\|^2 \, dM    (1)

Actually, S(f) captures the concept of roughness rather than smoothness, but we will call it smoothness, as previous authors have done. The Manifold assumption also states that S(f) must have a small value.

C. Neighbourhood Graph

In SSL algorithms, we must use a discrete representation of the manifold, as we only have a finite number of data points. Graphs are therefore a suitable representation of manifolds.
To build a graph from the given data points, we consider one vertex for each data point and add edges between data points that are adjacent on the manifold; hence we call it a neighbourhood graph. If the underlying manifold is known, constructing such a graph is straightforward. The challenging problem occurs when we do not know the manifold, which is the case in real-world problems. This problem is called "graph construction". An example of a complex manifold together with the neighbourhood graph constructed on it is shown in Figure 2. (Figure 2. A curved 2D manifold in the 3D feature space. The data points are shown as black dots, and the neighbourhood graph edges are shown as lines connecting data points [12].)

We denote the constructed neighbourhood graph by G = (V, E), where V = V(G) is the set of vertices and E = E(G) is the set of edges of the graph. Each edge of the graph represents a neighbourhood relationship, and the edge weights are the distances between the corresponding endpoints. The weight of edge e = (u, v) is denoted by w(e) or w(u, v) throughout the paper. For simplicity we assume w(u, v) = inf if no edge exists between u and v in G. We choose the neighbours of each vertex and the weights of the edges so that they approximate the manifold structure. Several methods have been proposed for graph construction, each trying to present an appropriate approximation of the manifold structure. We introduce some of these methods in the following.

1) Classical graph construction methods: k-NN and eps-ball are two classical methods of graph construction [6]. In the k-NN graph construction method, each data point is connected to its k nearest neighbours, and the edge weights are the Euclidean distances between the endpoints. As we always add the reverse edges to make the graph symmetric, the degree of some vertices may become much greater than k. In the eps-ball method, each data point is connected to all data points at distance less than eps, and the weight of each edge is again the distance between its endpoints. There is no constraint on the degree of vertices in this method. If eps is too small the resulting graph will be too sparse, and a large eps will result in too many irrelevant edges; finding a suitable eps is therefore a hard task. Hence k-NN is used more frequently in practice.

2) b-matching: b-matching is a well-known state-of-the-art graph construction method that has seen active research in recent years [8], [7]. b-matching creates a balanced graph in which all vertices have equal degree b. This method works well when samples are distributed non-uniformly in the feature space. Theoretical foundations for this method have been presented, and it is reported to improve on k-NN graph construction when tested on digit recognition and text classification tasks. In the experimental results section, we will show that Isograph can improve the graph generated by b-matching.

D. Label Inference

We have used distances as edge weights, but a related concept, namely similarity, is needed for semi-supervised label inference. Similarity is the converse of distance: when the distance between two data points is low, their similarity is high, and vice versa. Similarity can be derived from distance in a few ways, among them the Gaussian similarity. Let W be the similarity matrix corresponding to graph G, that is, W_ij is the similarity between vertices i and j.
Then the Gaussian similarity is defined as

W_ij = exp(-d(i, j)^2 / \sigma^2)    (2)

We mentioned that smoothness is defined as S(f) = \int_M \|\nabla f\|^2 \, dM. As we only have a finite number of points on the manifold and need to infer f only on these points, we can approximate the smoothness restricted to these points as [11]:

\hat{S}(f) = \sum_{i,j=1}^{l+u} W_ij (f_i - f_j)^2    (3)

The label inference process is based on finding an f which minimizes a mixture of \hat{S}(f) and the error of f on the labeled data. It is easy to show that \hat{S}(f) can be written in the quadratic form

\hat{S}(f) = f^T L f    (4)

with L = D - W, where D is the diagonal degree matrix (i.e. D_ii = \sum_{j=1}^{l+u} W_ij). L is known as the graph Laplacian. The inference minimization problem is formally defined as

f^* = argmin_f \|Cf - y\|^2 + \gamma f^T L f    (5)

where C = (I_{l x l}  0_{l x u}) is a selection matrix, i.e. Cf contains only the labeled indices of f, so \|Cf - y\|^2 represents the difference between y and f on the labeled points. Setting the gradient of this objective to zero shows that f^* solves the linear system (C^T C + \gamma L) f = C^T y; an algorithm of running time O(n^3) can compute the solution, where n is the number of data points.

III. MOTIVATION

As previously mentioned, shortcut edges connect points of the graph that are close to each other according to the Euclidean distance but have a large geodesic distance on the manifold. An example of such an edge and the underlying manifold is shown in Figure 3. (Figure 3. Part of a one-dimensional manifold showing the shortcut edge between u and v.) These edges may be disastrous to the label inference process. According to the Manifold assumption, we expect close data points on the manifold to have similar labels. This expectation is violated in the case of shortcut edges, since the adjacent data points are actually far from each other on the manifold. Therefore, it is crucial to find such edges and reduce their impact on the inference process. We expect a graph with fewer shortcut edges to perform better in classification, so shortcut edge detection is a key problem in neighbourhood graph construction. This paper aims at detecting such edges and removing them or adjusting their weights in an appropriate manner.

IV. GEODESIC DISTANCE

In the plane, the shortest path between two points is the straight line connecting them, but on general manifolds such as the sphere, this line does not lie on the manifold. Therefore, we need a new concept to define the distance between points on a manifold. Geodesic curves are curves lying on the manifold that connect points with the shortest path (Figure 4). (Figure 4. Geodesic curve between two points on a manifold [13].)

Definition 1. For any two points a and b on the manifold M, we define d_M(a, b) as the length of the shortest curve between a and b lying on M.

Proposition 1. For any a, b in M: d(a, b) <= d_M(a, b), where d(., .) is the metric of the ambient space. This is intuitively clear, but can be proven rigorously using straight line segments for length estimation.

V. PROBLEM SETTING

In this section, we introduce the assumptions on which we have based our algorithms. These consist of the Manifold assumption and a sampling condition. The Manifold assumption is a popular assumption in Semi-Supervised Learning, and the sampling condition is a reasonable condition which is common in the manifold learning literature [14]. In our problem setting, we have the following assumptions:

1) The data points lie on a d-dimensional manifold, denoted by M. M is assumed to be bounded; this is reasonable because, in a machine representation, the feature space is usually finite and M is a subset of the feature space.

2) Sampling condition: the manifold M is sampled as follows. There exists delta in R such that for any point p in M, there exists a data point q among the labeled or unlabeled data points such that d_M(p, q) <= delta. We refer to the least such delta as delta(M).

VI. PROPOSED METHOD

In this section, we first convey the intuition behind the proposed algorithm through a basic algorithm. We then add details and practical modifications and introduce our final algorithm. One major improvement of the final algorithm over the baseline is that it adjusts the weights of shortcut edges instead of naively removing every edge suspected of being a shortcut.

A. The Baseline Algorithm

In the baseline algorithm, we mainly try to detect the shortcut edges. An edge (u, v) of the neighbourhood graph is a shortcut edge if and only if d(u, v) << d_M(u, v). Looking back at Figure 3, we observe an important feature of the shortcut edge (u, v): when we remove this edge, the shortest path between u and v that contains only small edges must pass near the curved manifold, and therefore this path has many edges. This is the key intuition behind our algorithm, which we now explain more precisely.

Suppose we start with an initial graph G obtained by any graph construction method. For any edge e = (u, v) in E(G) with weight w, we consider the subgraph containing only the edges with weights less than w (the small edges mentioned above). Suppose the shortest path between u and v in this subgraph is a long path. All edges on this path have smaller weights than the edge e, so we expect this path to represent the geodesic curve between u and v better than e does. As a result, a better estimate of the geodesic distance may be obtained from this path. Moreover, if the number of edges on such a path is large enough, e.g. larger than two, it is probable that (u, v) connects points that are far apart on the manifold, and therefore (u, v) is probably a shortcut edge. In the following, we prove that a threshold length of two is an appropriate criterion for detecting shortcuts.

This procedure is not performed on the edges of the Minimum Spanning Tree of the initial graph G, denoted MST(G), because preserving the edges of MST(G) is necessary for graph construction in the proposed algorithm: a disconnected graph is disastrous to the label inference process. To ensure the connectivity of G we do not remove any of the edges in MST(G). The MST is a suitable choice because it prefers smaller edges, which are less likely to be shortcut edges.

Require: An initial graph G built with a graph construction method (e.g. k-NN)
Ensure: Shortcuts of graph G are removed
1: Let G_f be the full graph on the sampling, i.e. the graph containing the edge e = (u, v) for all u, v in V(G)
2: for all e = (u, v) in E(G) - E(MST(G)) in ascending order of weight do
3:   G_{u,v} <- the subgraph of G_f with edge weights less than w(u, v)
4:   P <- length of the shortest path between u and v in G_{u,v}
5:   if P > 2 then
6:     Remove edge e from E(G)
7:   end if
8: end for
Algorithm 1: The baseline algorithm

To justify the correctness of our algorithm, we should show that the baseline algorithm preserves an edge (u, v) in E(G) if d(u, v) is close enough to d_M(u, v), and removes it otherwise. We already know from Proposition 1 that d(u, v) <= d_M(u, v) always holds. In the following theorems, we first show that if d_M(u, v) is not much larger than d(u, v), the edge is not removed by the baseline algorithm.

Theorem 1. If d_M(u, v) < 2 d(u, v) - 2 delta(M), where delta(M) is defined in the sampling condition of Section V, then the baseline algorithm preserves the edge (u, v).

Proof: This theorem is a special case of Theorem 3, which is proved in Appendix A.

To complete the justification, we further show that if d_M(u, v) is much larger than d(u, v), the edge (u, v) is removed by the baseline algorithm. To do so, we need to define some concepts first.
Definition 2.
1) Consider all unit-speed geodesic curves C completely lying on M. The minimum radius of curvature r_0 = r_0(M) is defined by 1/r_0 = max_{C,t} \|\ddot{C}(t)\|, where \ddot{C}(t) denotes the second derivative of C with respect to t [14].
2) The minimum branch separation s_0 = s_0(M) is defined as the largest positive number for which d(x, y) < s_0 implies d_M(x, y) <= \pi r_0 for every x, y in M, where r_0 is the minimum radius of curvature [14].

(Figure 5. The geodesic path between the two endpoints of edge e = (u, v) is shown by the dashed line. The geodesic paths between the pairs u, w and w, v are shown by solid curves.)

Definition 3. A manifold M is called geodesically convex if there exists a mathematical geodesic curve C between any two arbitrary points x, y in M with length d_M(x, y) [14]. A mathematical geodesic curve C on a manifold M is a curve whose geodesic curvature is zero at all points of the curve [15]. This condition is only needed for the next theorem.

Theorem 2. If M is a geodesically convex manifold and there exist u, v in M with d(u, v) < s_0 and d_M(u, v) >= (2 / (1 - \lambda_0)) d(u, v), then the baseline algorithm removes the edge e = (u, v), where \lambda_0 is a constant for a given manifold M and can be computed as \lambda_0 = \pi^2 s_0(M)^2 / (96 r_0(M)^2).

Proof: Suppose the baseline algorithm does not remove edge e. Then, according to the baseline algorithm, the length of the shortest path between u and v in G_{u,v} equals two (it cannot be one, because we omit (u, v) itself), and hence there exist edges e_1 = (u, w) and e_2 = (w, v) such that d(u, w) < d(u, v) and d(w, v) < d(u, v) (Figure 5).

From [14], we know that for any 0 < \lambda < 1, if the points x, y of a geodesically convex manifold M satisfy the conditions

d(x, y) <= (2/\pi) r_0 \sqrt{24 \lambda},    d(x, y) < s_0    (6)

then we have

d_M(x, y) >= d(x, y) >= (1 - \lambda) d_M(x, y).

Taking x = u, y = v and \lambda = \lambda_0, it is easily verified that the conditions in Equation 6 are satisfied in our case. Since d(u, w) < d(u, v), the conditions in (6) also hold for x = u, y = w, \lambda = \lambda_0. Therefore we have d(u, w) >= (1 - \lambda_0) d_M(u, w). Combining this with the relation d(u, v) > d(u, w), we conclude that

d(u, v) > d(u, w) >= (1 - \lambda_0) d_M(u, w).

A similar conclusion holds for x = w, y = v and \lambda = \lambda_0:

d(u, v) > d(w, v) >= (1 - \lambda_0) d_M(w, v).

Summing these two relations and using the triangle inequality d_M(u, w) + d_M(w, v) >= d_M(u, v), we obtain

d(u, v) > (1/2)(1 - \lambda_0)(d_M(u, w) + d_M(w, v)) >= ((1 - \lambda_0)/2) d_M(u, v).

This contradicts the assumption that d(u, v) <= ((1 - \lambda_0)/2) d_M(u, v). Therefore the supposition is false, the baseline algorithm removes edge e = (u, v), and the proof is complete.

B. Shortcomings

The baseline algorithm has two drawbacks. First, if d_M(u, v) is small (i.e. close to 2 delta(M)), Theorem 1 cannot guarantee that the edge (u, v) is preserved, even when d(u, v) is approximately equal to d_M(u, v). If d_M(u, v) < 2 delta(M), the delta(M) term in the inequality of Theorem 1 has greater influence than d_M(u, v); consequently, the algorithm may remove correct edges, because the precondition of the theorem is at risk of being violated. An example of this situation occurs on a plane-shaped manifold, where d(u, v) is exactly equal to d_M(u, v) for all u, v in V(G). Even though no shortcut edge exists in this case, the baseline algorithm may remove some of the edges of G. Secondly, although the baseline algorithm is able to detect the large difference between d(u, v) and d_M(u, v) for a shortcut edge, it naively removes such edges.
The classification result will improve if these edges have a very small effect on inference, rather than being removed; that is, adjusting the edge weights in an appropriate manner is a better solution. This way, we can estimate the structure of the manifold more accurately. These shortcomings are overcome in the proposed algorithm, named Isograph, which is described in the next section.

C. An Improved Algorithm: Isograph

We now propose the Isograph algorithm to overcome the shortcomings described in the previous section. This algorithm is a modified version of the baseline algorithm with two improvements:

- To overcome the problem with small values of d_M(u, v), Isograph leaves all edges with d(u, v) <= eps unchanged. If we choose eps such that eps > 2 delta(M), then since d_M(u, v) >= d(u, v), we have d_M(u, v) > 2 delta(M), which solves the first problem.
- To overcome the second shortcoming, Isograph maintains an estimate \hat{d}_M(u, v) for each edge (u, v) in E(G), and if \hat{d}_M(u, v) is too far from d_M(u, v), instead of removing the shortcut edge it increases the edge weight \hat{d}_M(u, v) to make it a better estimate of the geodesic distance. The same graph structure is thus preserved with better edge weights, which may in turn cause other edges to be updated in subsequent iterations. In Theorem 3 we show that updating over multiple iterations increases the edge weights towards the geodesic distances, yielding increasingly accurate estimates.

As mentioned in Theorem 1, it can be proven that for any edge (u, v) detected as a shortcut by the baseline algorithm, we have

d_M(u, v) >= 2 (d(u, v) - delta(M)).

Therefore, we may use the following update rule for the edge weights:

\hat{d}_M(u, v) <- 2 (d(u, v) - delta(M)).

Later, in Theorem 4, we show that this is indeed an appropriate update rule which gives a better estimate of d_M(u, v).
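The per-edge detection test that Isograph inherits from the baseline (a shortest path of more than two edges in the subgraph of strictly smaller edges) can be sketched as follows. The identifiers are our own, and the graph is assumed to be a networkx graph with weighted edges.

```python
# Sketch of the detection test applied to a single edge (u, v):
# build the subgraph of edges strictly shorter than w(u, v) and check
# whether the shortest u-v path there has more than two edges.
import networkx as nx

def is_suspected_shortcut(G, u, v):
    w_uv = G[u][v]["weight"]
    H = nx.Graph()
    H.add_edges_from((a, b, d) for a, b, d in G.edges(data=True)
                     if d["weight"] < w_uv)
    try:
        path = nx.shortest_path(H, u, v, weight="weight")
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        return False  # no alternative path of smaller edges exists
    return len(path) - 1 > 2  # number of edges on the path
```

On a chain 0-1-2-3 of unit edges with an extra edge (0, 3) of weight 2.5, the extra edge is flagged (the small-edge path has three edges) while the chain edges are not.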
Using the eps constraint and updating the edge weights iteratively with the above update rule, we arrive at Isograph.

Require: An initial graph G built with a graph construction method (e.g. k-NN)
Ensure: Adjusted edge weights \hat{d}_M^t(u, v), for all (u, v) in E(G)
1: for all e = (u, v) in E(G) do
2:   \hat{d}_M^{(1)}(u, v) <- d(u, v)
3: end for
4: for t = 1 ... NumberOfIterations do
5:   for all e = (u, v) in E(G) - E(MST(G)) do
6:     if \hat{d}_M^{(t)}(u, v) >= eps then
7:       G_{u,v} <- the subgraph of G with edge weights less than \hat{d}_M^{(t)}(u, v) *
8:       P <- length of the shortest path between u and v in G_{u,v}
9:       if P > 2 then
10:        \hat{d}_M^{(t+1)}(u, v) <- 2 (\hat{d}_M^{(t)}(u, v) - delta(M))
11:      end if
12:    end if
13:  end for
14: end for
Algorithm 2: Isograph (the proposed algorithm)

* In fact, we also add to G_{u,v} any pair (x, y) not in E(G) such that d(x, y) <= \hat{d}_M^{(t)}(u, v).

In the following theorems, we prove that the loop invariant

d(u, v) <= \hat{d}_M^t(u, v) <= d_M(u, v),  for all (u, v) in E(G)    (7)

holds throughout the procedure of Isograph. In addition, we show that the difference between the true and estimated values of d_M(u, v) decreases as the edge weights are updated in each iteration. The theorems show that the estimated geodesic distance always lies between the Euclidean distance and the true geodesic distance; therefore, we may increase the edge weights iteratively without worrying about exceeding the true distance.

Theorem 3. Assuming the loop invariant (Equation 7) holds at some time instance, if d_M(u, v) < 2 \hat{d}_M(u, v) - 2 delta(M), then Isograph will not update edge e = (u, v) (line 10 of Algorithm 2).

Theorem 4. At any point throughout the procedure of Isograph, Equation 7 holds.

The proofs of these theorems are included in Appendix A.

Lemma 1.
If edge e = (u, v) is updated at iteration t, then

\hat{d}_M^{(t+1)}(u, v) > \hat{d}_M^t(u, v)    (8)

Proof: Edge e is updated, so \hat{d}_M^{(t+1)}(u, v) = 2 (\hat{d}_M^t(u, v) - delta(M)). We know that \hat{d}_M^t(u, v) >= eps > 2 delta(M). Adding \hat{d}_M^t(u, v) to both sides of this relation gives 2 \hat{d}_M^t(u, v) > 2 delta(M) + \hat{d}_M^t(u, v). Therefore 2 (\hat{d}_M^t(u, v) - delta(M)) > \hat{d}_M^t(u, v), i.e. \hat{d}_M^{(t+1)}(u, v) > \hat{d}_M^t(u, v).

D. Practical Modifications

Isograph explicitly uses delta(M) and eps. From a sampling of the manifold M, we cannot exactly determine its underlying geometry: there are many manifolds passing through the same sample points, each with a different delta(M) value, so computing delta(M) is inherently an ill-posed problem. However, Lemma 2 gives a lower bound on delta(M).

Lemma 2. If the maximum-weight edge in the minimum spanning tree (MST) of the neighbourhood graph has weight w_mst, then delta(M) >= w_mst / 2, where delta(M) is defined in Section V.

The proof of this lemma is given in Appendix A. As Lemma 2 indicates, delta(M) >= max_{e in MST(G)} w(e) / 2. However, as argued above, estimating delta(M) is an ill-posed problem; therefore, in order to estimate delta(M), we must suppose that the data provided to our algorithms lies on a manifold satisfying some intuitively reasonable constraints, i.e. we must assume some prior knowledge about delta(M).

Another issue is that in many cases the sampling may be sparse in only a small region of M. As delta(M) is defined as a global parameter of the sampling, this results in a large value of delta(M), even though the local value of delta may be much smaller in many regions. In Theorem 3, where delta(M) enters our formulation, we do not need a global bound on delta(M), so we can use a local bound instead. We assume that the local value of delta has the same order of magnitude as w_mst in Lemma 2, i.e. delta = alpha * w_mst / 2.
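Combining Algorithm 2 with the practical estimate delta = alpha * w_mst / 2 just described gives the following condensed sketch. All identifiers are our own; MST edges are skipped as in the text, and the footnote's extra point pairs are omitted for brevity.

```python
# Condensed sketch of Isograph (Algorithm 2) with the practical
# delta estimate; assumes a networkx graph with 'weight' edge attributes.
import networkx as nx

def isograph(G, eps, alpha=0.5, n_iterations=3):
    H = G.copy()
    T = nx.minimum_spanning_tree(H, weight="weight")
    mst_edges = {frozenset(e) for e in T.edges()}
    w_mst = max(d["weight"] for _, _, d in T.edges(data=True))
    delta = alpha * w_mst / 2  # local-delta heuristic from the text
    for _ in range(n_iterations):
        for u, v in list(H.edges()):
            if frozenset((u, v)) in mst_edges:
                continue  # MST edges are preserved to keep H connected
            w_uv = H[u][v]["weight"]
            if w_uv < eps:
                continue  # short edges are left unchanged
            # subgraph of edges strictly smaller than the current estimate
            S = nx.Graph()
            S.add_edges_from((a, b, d) for a, b, d in H.edges(data=True)
                             if d["weight"] < w_uv)
            try:
                hops = len(nx.shortest_path(S, u, v, weight="weight")) - 1
            except (nx.NetworkXNoPath, nx.NodeNotFound):
                continue
            if hops > 2:
                # suspected shortcut: apply the update rule
                # d_hat <- 2 * (d_hat - delta)
                H[u][v]["weight"] = 2 * (w_uv - delta)
    return H
```

On a five-point chain of unit edges with a shortcut edge (0, 4) of weight 2.0, one iteration raises the shortcut's weight to 2 * (2.0 - 0.25) = 3.5, moving it towards the geodesic length of 4, while the chain edges stay untouched.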
Sparsity at some parts of the manifold can be more than other parts, and an adaptive method of estimating πΏ, will clearly help Isograph. The πΌ π2mst estimation of πΏ is not adaptive, therefore to overcome this problem we will try to solve it in an indirect manner. We already know that all edges in the ππ -NN graph, where ππ βͺ π are rarely shortcut edges. Therefore its reasonable if we do not modify these edges at all. This showed to be effective in practice. VII. E XPERIMENTAL R ESULTS In this section we present the experimental results and demonstrate the effectiveness of our neighbourhood graph construction method when applied on the π-NN graph and on the output of b-matching. A. Synthetic Datasets To evaluate our proposed method, we generated three synthetic datasets. We are able to evaluate the effectiveness of our algorithm by illustrating the edges which are detected as shortcuts. Each dataset lies on a different 2D manifold shape embedded in a 3D space: Swiss roll, Step and Ellipsoid (Figure 6). The data points are generated (200 points) by a uniform i.i.d sampling on the manifold surface and each point is translated by a independent random offset. The πNN method is used to construct the neighbourhood graph where π is selected as the smallest value for which a considerable number of shortcut edges emerge. The parameter ππ (section VI-D), is set to 5 for all graphs. An effective shortcut edge detection algorithm eliminates the edges connecting two irrelevant points (i.e. edges with πβ³ β« π), while maintaining edges lying on the manifold, no matter how long the length of such edges may be. These properties are pursuant to Theorem 2 and 3 respectively. In these figures, it is easy to observe that our algorithm has both of these properties, and therefore is effective. For a better illustration, the graph edges of each manifold are partitioned and shown in two separate figures: edges which are detected as shortcuts, and those preserved. B. 
B. Real World Experiments

To evaluate the proposed method, four standard datasets consistent with the manifold assumption are selected: MNIST, USPS, Caltech 101 and Corel. USPS and MNIST are digit recognition datasets; the other two are image categorization datasets. For the Caltech and Corel datasets, a subset of classes was selected and the CEDD feature set introduced by [16] was extracted; for MNIST and USPS, the (low-resolution) image itself is the feature vector. Principal Component Analysis (PCA) is applied to all datasets for noise removal. For each dataset, ten random samples of 2000 points were drawn from the whole data, and cross-validation was used to partition each sample into labeled and unlabeled points such that there were ten labeled points per class on average. The values of $\gamma$, $k_0$, $\alpha$ and the number of iterations were set to 0.02, 3, 0.5 and 3, respectively, for all experiments.

In the first experiment, we applied Isograph on the 10-NN graph.

Figure 8. Charts comparing the accuracy of Isograph applied on the $k$-NN graph with plain $k$-NN graph construction on MNIST, USPS, Caltech and Corel (methods: 0-1 kNN, 0-1 kNN+Isograph, kNN+ML, kNN+ML+Isograph; accuracy (%) versus $K$).

Figure 6. Shortcut detection in $k$-NN graphs of three noisy synthetic datasets: Ellipsoid ($k = 20$), Step ($k = 22$) and Swiss roll ($k = 13$). Figures in the right column illustrate the edges detected as shortcuts, and therefore updated by the Isograph algorithm; edges in the left column are those maintained.

Figure 7. Shortcut edges detected by Isograph and the path found by the algorithm.

To illustrate the effectiveness of Isograph in detecting shortcut edges, we selected some of the updated edges and plotted the path found by Isograph between their endpoints (Figure 7). The first and last pictures in each row represent the endpoints of an edge.
Although this edge was in the 10-NN graph, its endpoints belong to two different classes; our algorithm therefore improves the graph structure by updating the edge between them.

In the second experiment, we applied Isograph on the $k$-NN graph and measured the accuracy of the classifier built from the resulting neighbourhood graph. The results are presented in Figure 8 for the datasets mentioned above. We ran our algorithm in two settings:

1) Binary: In this setting we use only unit edge weights. The $k$-NN approach to graph construction in the binary setting connects each vertex to its $k$ nearest neighbours with unit weight; we call this the "0-1 $k$-NN graph" to distinguish it from the weighted $k$-NN graph. Isograph can be applied to binary graph construction as follows: build the weighted $k$-NN graph, run Isograph on it, and after all iterations have finished, remove every edge that Isograph updated. We call the result "0-1 $k$-NN+Isograph". Note that this differs from the baseline algorithm: edges are not removed during Isograph itself, so they can still influence the geodesic distance estimates of other edges, and hence potentially more shortcut edges are updated.

2) Weighted: In this setting we compare Isograph with the "$k$-NN+ML" graph. The $k$-NN+ML graph is constructed by creating the weighted $k$-NN graph using the similarity of Equation 2. To build "$k$-NN+ML+Isograph", we applied Isograph to the weighted $k$-NN graph and then used Equation 2. In both weighted methods, Marginal Likelihood (ML) was used to find the best value of the similarity parameter for creating the similarity matrix W.

All four graph constructions above can be repeated with any graph construction method in place of $k$-NN. For instance, we combined Isograph with b-matching and found that the classification accuracy is superior to plain b-matching on all four datasets mentioned above (Figure 9).

Figure 10.
The shortest curve lying on $\mathcal{M}$ connecting $u$ and $v$; $m$ is the midpoint of the curve.

Figure 9. b-matching combined with Isograph in the 0-1 and ML settings on MNIST, USPS, Caltech and Corel (methods: 0-1 bMatching, 0-1 bMatching+Isograph, bMatching+ML, bMatching+ML+Isograph; accuracy (%) versus $K$).

These figures show a steady improvement from Isograph in all the settings presented, and the improvements are robust to $k$. On MNIST and USPS, $k$-NN+Isograph considerably improved the results, showing that Isograph detects the shortcut edges effectively. On these two datasets, $k$-NN+ML+Isograph works better for small values of $k$, but performance degrades slightly for larger values of $k$. This can be explained as follows: for small $k$, the $k$-NN graph uses shorter edges, which tend to have a smaller gap between $d$ and $d_\mathcal{M}$, so a maximum of three iterations suffices to reach their correct weight. When $k$ increases, we still detect shortcut edges correctly, but this number of iterations is no longer sufficient to update their weights fully, since the change per iteration is limited to a factor of roughly two.

On the Corel dataset, ML improves the results of both 0-1 $k$-NN and 0-1 $k$-NN+Isograph. Here the weights play an important role in inferring labels correctly, so the difference between the weighted and 0-1 variants is considerable. In contrast, on Caltech, ML weighting does not have a positive effect, and we see the best results with 0-1 $k$-NN+Isograph. This might be due to an unequal number of labeled points per class; note, however, that Isograph still improved the 0-1 graph. Furthermore, we combined Isograph with b-matching and showed that the classification accuracy is superior to plain b-matching on all datasets, with robustness w.r.t. the parameter $b$ (Figure 9).

VIII.
CONCLUSIONS

In this paper, we showed that using geodesic distance instead of Euclidean distance improves the neighbourhood graph, and we proposed an unsupervised method, Isograph, to estimate the geodesic distance between points. We provided bounds on the values of the geodesic estimates produced by Isograph. As Isograph can be combined with other graph construction methods, we combined it with $k$-NN and b-matching and presented results on real-world datasets, which show the steady effectiveness of Isograph. The effectiveness of using geodesic distance in the graph construction procedure and the convergence of the Isograph algorithm are subjects of future theoretical analysis. Better local estimation of $\delta$ may lead to better geodesic distance estimation. Furthermore, labeled data may be employed to improve the shortcut detection procedure.

IX. APPENDIX A: PROOF OF SOME OF THE THEOREMS

Theorem 3. Assuming the loop invariant (Equation 7) holds at some time instance, if $d_\mathcal{M}(u, v) < 2\hat{d}_\mathcal{M}(u, v) - 2\delta(\mathcal{M})$, then Isograph will preserve edge $e = (u, v)$.

Proof: Consider a shortest curve $C$ on $\mathcal{M}$ from $u$ to $v$ (Figure 10), and let $m$ be the midpoint of $C$, that is, the point that halves the length of $C$, so that
$d_\mathcal{M}(u, m) = d_\mathcal{M}(m, v) = d_\mathcal{M}(u, v)/2.$
By the sampling condition there exists a point $w$ in the sampling such that $d_\mathcal{M}(m, w) \le \delta(\mathcal{M})$. We first show that $w$ can coincide with neither $u$ nor $v$. The loop invariant gives $\hat{d}_\mathcal{M}(u, v) \le d_\mathcal{M}(u, v)$, and combining this with the assumption $d_\mathcal{M}(u, v)/2 + \delta(\mathcal{M}) < \hat{d}_\mathcal{M}(u, v)$ yields $\delta(\mathcal{M}) < d_\mathcal{M}(u, v)/2 = d_\mathcal{M}(u, m) = d_\mathcal{M}(m, v)$. Hence $w$ can be neither $u$ nor $v$, because $d_\mathcal{M}(m, w) \le \delta(\mathcal{M})$.
Now, by the triangle inequality,
$d_\mathcal{M}(u, w) \le d_\mathcal{M}(u, m) + d_\mathcal{M}(m, w).$
Substituting $d_\mathcal{M}(u, m) = d_\mathcal{M}(u, v)/2$ and $d_\mathcal{M}(m, w) \le \delta(\mathcal{M})$ gives
$d_\mathcal{M}(u, w) \le d_\mathcal{M}(u, v)/2 + \delta(\mathcal{M}).$
Finally, plugging this inequality into the assumption $d_\mathcal{M}(u, v)/2 + \delta(\mathcal{M}) < \hat{d}_\mathcal{M}(u, v)$, and using the loop invariant $\hat{d}_\mathcal{M}(u, w) \le d_\mathcal{M}(u, w)$, we reach
$\hat{d}_\mathcal{M}(u, w) \le d_\mathcal{M}(u, w) < \hat{d}_\mathcal{M}(u, v).$
In a similar way, $\hat{d}_\mathcal{M}(v, w) < \hat{d}_\mathcal{M}(u, v)$. Therefore edges $(u, w)$ and $(v, w)$ are both in $E(G_{u,v})$, and Isograph preserves edge $(u, v)$ due to the point $w$.

Theorem 4. At any point throughout the procedure of Isograph, Equation 7 holds.

Proof: We show that Equation 7 is a loop invariant, that is: 1) Equation 7 is true when $\hat{d}_\mathcal{M}$ is initialized at the beginning of the algorithm; 2) assuming Equation 7 holds at some time instance, it still holds after an edge is updated. Item 1) is true because $\hat{d}^{(1)}_\mathcal{M}(u, v) = d(u, v)$, and by Proposition 1 we have $d(u, v) \le d_\mathcal{M}(u, v)$. We now prove item 2). Suppose that at some time $t$ the loop invariant holds; we must show that $\hat{d}^{t+1}_\mathcal{M}(u, v) \le d_\mathcal{M}(u, v)$. According to Theorem 3, an edge is updated only if $\hat{d}^{t}_\mathcal{M}(u, v) \le d_\mathcal{M}(u, v)/2 + \delta(\mathcal{M})$, so
$\hat{d}^{t+1}_\mathcal{M}(u, v) = 2(\hat{d}^{t}_\mathcal{M}(u, v) - \delta(\mathcal{M})) \le d_\mathcal{M}(u, v).$

Lemma 2. If the maximum-weight edge in the minimum spanning tree (MST) of the neighbourhood graph has weight $e_{mst}$, we have $\delta(\mathcal{M}) \ge e_{mst}/2$, where $\delta(\mathcal{M})$ is defined in Section V.

Proof: Let $(u, v)$ be the edge with maximum weight in the MST, and suppose that removing $(u, v)$ splits the tree into two connected components $C_1$ and $C_2$. Define $d_\mathcal{M}(x, C_1) = \min_{y \in C_1} d_\mathcal{M}(x, y)$, the minimum distance from point $x$ to the points of $C_1$, and define $d_\mathcal{M}(x, C_2)$ similarly. Now let $C$ be any curve between $u$ and $v$; for any point $x$ on this curve we compute $f(x) = d_\mathcal{M}(x, C_1) - d_\mathcal{M}(x, C_2)$.
We know that $f(u) < 0$ and $f(v) > 0$ and that $f$ is continuous, so by the intermediate value theorem there exists a point $x^*$ on curve $C$ such that $f(x^*) = 0$. Let $x_1$ be the point of $C_1$ with minimum distance from $x^*$, and define $x_2$ similarly. By the sampling condition and $f(x^*) = 0$ we have $\delta(\mathcal{M}) \ge d_\mathcal{M}(x^*, x_1) = d_\mathcal{M}(x^*, x_2)$, so
$2\delta(\mathcal{M}) \ge d_\mathcal{M}(x^*, x_1) + d_\mathcal{M}(x^*, x_2) \ge d_\mathcal{M}(x_1, x_2) \ge \hat{d}_\mathcal{M}(x_1, x_2).$
Since $(u, v)$ is an edge of the MST, by the cut property $\hat{d}_\mathcal{M}(u, v)$ is the minimum weight among edges crossing the cut between $C_1$ and $C_2$. Therefore
$2\delta(\mathcal{M}) \ge \hat{d}_\mathcal{M}(x_1, x_2) \ge \hat{d}_\mathcal{M}(u, v) = e_{mst}.$

REFERENCES

[1] R. Ando and T. Zhang, "A high-performance semi-supervised learning method for text chunking," in Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pp. 1–9, 2005.
[2] S. Basu, M. Bilenko, and R. Mooney, "A probabilistic framework for semi-supervised clustering," in Proceedings of the Tenth ACM International Conference on Knowledge Discovery and Data Mining, pp. 59–68, 2004.
[3] S. Hoi and M. Lyu, "A semi-supervised active learning framework for image retrieval," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 302–309, 2005.
[4] O. Chapelle, B. Schölkopf, and A. Zien, Semi-Supervised Learning, vol. 2. MIT Press, Cambridge, MA, 2006.
[5] M. Belkin and P. Niyogi, "Semi-supervised learning on Riemannian manifolds," Machine Learning, vol. 56, no. 1, pp. 209–239, 2004.
[6] X. Zhu, J. Lafferty, and R. Rosenfeld, Semi-Supervised Learning with Graphs. PhD thesis, 2005.
[7] T. Jebara, J. Wang, and S. Chang, "Graph construction and b-matching for semi-supervised learning," in Proceedings of the 26th Annual International Conference on Machine Learning, pp. 441–448, ACM, 2009.
[8] B. Huang and T. Jebara, "Loopy belief propagation for bipartite maximum weight b-matching," Artificial Intelligence and Statistics, 2007.
[9] W. Cukierski and D.
Foran, "Using betweenness centrality to identify manifold shortcuts," in IEEE International Conference on Data Mining Workshops, pp. 949–958, 2008.
[10] J. Tenenbaum, V. de Silva, and J. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, vol. 290, no. 5500, p. 2319, 2000.
[11] M. Belkin and P. Niyogi, "Problems of learning on manifolds," The University of Chicago, 2003.
[12] M. Hein and U. von Luxburg, "Introduction to graph-based semi-supervised learning."
[13] J. Odegard, "Dimensionality reduction methods for molecular motion."
[14] M. Bernstein, V. de Silva, J. Langford, and J. Tenenbaum, "Graph approximations to geodesics on embedded manifolds," Technical Report, Department of Psychology, Stanford University, 2000.
[15] M. do Carmo, Riemannian Geometry. Birkhäuser, 1992.
[16] Y. Chen and J. Wang, "Image categorization by learning and reasoning with regions," The Journal of Machine Learning Research, vol. 5, pp. 913–939, 2004.