A clustering-based visualization of spatial patterns

Nazha Selmaoui-Folcher, Frédéric Flouvat, Elise Desmier and Dominique Gay
University of New Caledonia, PPME-ERIM, F-98851, Noumea, New Caledonia
[email protected], [email protected], [email protected], [email protected]

March 9, 2010

Abstract

Extraction of interesting colocations in geo-referenced data is one of the major tasks in spatial pattern mining. Considering a set of spatial Boolean features, the goal is to find relevant subsets of features associated with objects often located together. In this context, the main drawback is the interpretation of extracted patterns by domain experts. Indeed, the common textual representation of colocations loses important spatial information. To overcome this problem, we propose a new clustering-based visualization technique deeply integrated in the colocation mining algorithm. This simple, concise and intuitive cartographic representation considers both spatial information and experts' practice. The whole process has been tested on a real-world geological data set, and the added value of the method has been confirmed by domain experts.

1 Introduction

Spatial data mining refers to the extraction of interesting, useful, unexpected and implicit knowledge from spatial data. It has wide applications in environmental management, public safety, transportation and tourism. One of the classical tasks in spatial pattern mining is the extraction of interesting colocations in geo-referenced data [14, 20, 3, 12, 16, 4, 21, 6, 7, 17].

To deal with this problem, two families of spatial pattern mining approaches may be identified: multi-relational approaches and colocation-based approaches. When spatial data is made of various tables describing objects and spatial relationships between objects, multi-relational data mining techniques may be applied to extract interesting spatial patterns such as association rules [16] or emerging patterns [6]; see also [3, 17]. On the other hand, [12] identified two approaches for colocation mining: transaction-based approaches and event-based approaches. Transaction-based approaches focus on transforming spatial data into transactional data on which classical itemset mining algorithms can be used [14, 4]. In [14], the authors presented an efficient method for mining association rules in geographic information databases. This method enumerates neighbors to "materialize" a set of transactions around instances of the reference spatial feature. The goal is to find colocations of features relevant to the reference feature. [4] extends this work by introducing knowledge constraints in a preprocessing step. The main limitation of these works is that spatial relationships and features are only partially considered. To cope with this limitation, event-based approaches focus on events and their neighbor relationships [20, 12, 21]. Shekhar et al. defined the colocation concept based on Koperski's work. The goal is to find all subsets of spatial features likely to occur together. To filter interesting colocations, two interestingness measures have been proposed. Thanks to the anti-monotonic property of these predicates, a levelwise algorithm has been used to extract interesting colocations. Thus, this approach considers all the features together, and the original data are not transformed.

However, a major problem with these spatial pattern mining techniques is the interpretation of the results by domain experts. Extracted patterns are presented in a textual form, which is not a representation that can be easily understood and directly used by experts. Moreover, a textual representation only partially considers the spatial information of the underlying objects. Indeed, experts can only know which features are generally located together, but they have no information on where these colocations are generally located or on their configuration.

In this context, we propose a new visualization of colocations based on clustering. This solution leads to a simple, concise and intuitive cartographic visualization of colocations, and takes into consideration the spatial nature of the underlying objects and experts' practice. Finally, this proposal has been integrated in a prototype coupled with a Geographic Information System (GIS). Experiments have been carried out on a real geological dataset and validated by a domain expert.

Section 2 presents related work on the interpretation and visualization of data mining results. Section 3 presents the colocation mining problem. Section 4 introduces our work to deliver actionable knowledge to domain experts, i.e. a new spatial representation of colocations. Section 5 presents experiments on a real geological dataset. Finally, section 6 concludes and gives some perspectives.

2 Visualization in data mining: related works

One of the major issues in data mining is the representation of the discovered knowledge so that it can be easily understood and directly used by experts [11]. Nevertheless, most data mining methods return results in a textual form based on an interestingness measure. To our knowledge, no solution has been proposed specifically for the visualization of colocation patterns. However, several visualization systems have been proposed for classical data mining tasks or for spatial data. In the rest of this section, we describe the main approaches.

For classical data mining, several systems have been developed to represent raw data or mining results [5, 13]. For example, MineSet [5] is an interactive system for data mining integrating data visualization. Different kinds of visualizers (statistics, scatter, map, tree) are available according to the type of result to visualize. Each mining algorithm (such as simple-Bayes model, decision tree, or association rules) is coupled with a visualization tool in order to help users interpret the learned models. Figure 1-a shows the visualization of association rules. More recently, [15] dealt with the visualization of frequent itemsets. The authors developed a system, WiFIsViz, for visualizing frequent itemsets based on orthogonal graphs (wiring-type diagrams). Frequent itemsets are shown in a two-dimensional space, where the x-axis shows items and the y-axis shows the frequencies (figure 1-c). An itemset X is represented by a horizontal line connecting nodes, where each node represents an item of X. Moreover, itemsets sharing the same prefix are merged, which improves the visualization. The visualizer provides different levels of detail to represent frequent itemsets. It also integrates features for constrained itemset mining.

Figure 1: a) A Rule visualizer view of supermarket items. b) Visualization of an association rule for South America. c) A snapshot of the proposed WiFIsViz.

For spatial data, a typical system is the one proposed in [2]. The authors were interested in representing spatial data and providing a visualization of classical data mining results on spatial data. Subgroups or clusters are naturally presented on maps using painting or icons on spatial objects. The same technique is applied to decision trees and classification rules by associating a visual feature with objects. For non-geographical information such as mined trees or rules, the system makes a dynamic link between the map and reports. For example, when a cursor is positioned on a tree node or a rule in a report, the corresponding instances are highlighted on the map (and vice versa). Figure 1-b illustrates an application of this system to the visualization of rules for South America.

As far as we know, none of the solutions proposed in the literature were designed to display spatial patterns in a simple, concise and intuitive way for experts. They do not take into consideration the spatial nature of the underlying objects, and only provide non-spatial knowledge.

3 The colocation mining framework

This section recalls the colocation framework proposed in [20, 12, 21]. Let F be a set of boolean features, O be a set of spatial objects, and R be a neighbor relationship over O. An instance of a feature f ∈ F is an object of O having the feature f. We define the function Θ : O → F to formally define the association between objects and features. For example, in figure 2, F = {A, B, C, D, E}, O = {A1, C2, B3, ..., E12}, and A9 is an instance of feature A, i.e. Θ(A9) = A. Note that, in this paper, spatial objects are represented by points in a two-dimensional space.

Figure 2: Spatial objects and their features

A colocation C ⊆ F is a set of features whose instances form a clique using a neighbor relationship R. If the neighbor relationship R is the Euclidean distance with a threshold, two spatial objects are neighbors if their distance is lower than the given threshold. A colocation instance I ⊆ O of a colocation C is a set of objects such that the objects are instances of all the features of C and form a clique w.r.t. R. As a consequence, a colocation instance I of a colocation C satisfies the following properties:

• |{f ∈ C | o ∈ I and Θ(o) = f}| = |C|
• |I| = |C|
• ∀o, p ∈ I, R(o, p) = true

Figure 2 shows that the set of objects {A9, B4, D10} is a colocation instance of the colocation {A, B, D} w.r.t. a fixed Euclidean distance threshold (represented by dotted circles). Conversely, {A1, B4, C7}, {A1, B4, D10} and {A9, B4} are not colocation instances of {A, B, D}. However, not every colocation is interesting. There is only one set of three neighbor objects having the features A, B and D. Thus, we need other concepts to determine the interestingness of a colocation. In this paper, for simplicity, we use the term "instance" to refer to a "colocation instance", and represent the colocation {A, B, D} by ABD in the figures.

A table instance of a colocation C, denoted TI_C, is the set of all its colocation instances. The table instance of {A, B, C} is TI_{A,B,C} = {{A1, B8, C7}, {A5, B6, C2}} and the table instance of {B, D} is TI_{B,D} = {{B4, D10}} (see figure 2). More formally, we have

\[ TI_C = \{ I \subseteq O \mid I \text{ is an instance of } C \text{ w.r.t. } R \} \]

The participation ratio pr(C, f) for a feature f of a colocation C is the ratio of the number of objects of feature f included in the instances of C to the total number of objects of feature f:

\[ pr(C, f) = \frac{|\{ o \in I \mid I \in TI_C \text{ and } \Theta(o) = f \}|}{|TI_{\{f\}}|} \]

In figure 2, pr({A, B, C}, A) = 2/3, pr({A, B, C}, B) = 1/2 and pr({A, B, C}, C) = 1.

Based on the definitions above, [12] proposed the concept of participation index, denoted pi(C), to estimate the frequency of a colocation C. More precisely, it is the minimum, over the features of C, of the probability that an object having a given feature appears in an instance of C:

\[ pi(C) = \min_{f \in C} pr(C, f) \]

Based on these definitions, the problem to solve is:

Colocation mining problem. Given a set of features F, a set of spatial objects O, a neighbor relationship R and a threshold α ∈ [0, 1], find the set of colocations {C ⊆ F | pi(C) ≥ α}.

4 A spatial visualization of colocations integrated in a GIS

The visualization of data mining results is essential to obtain actionable domain knowledge. In domains manipulating geographical data, GIS are classical tools for storing and visualizing spatial data. A main characteristic of GIS is the cartographic visualization of information in thematic layers. In this context, our objective is to find a spatial visualization of the colocation mining results integrated in the GIS.
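Since the visualization is built on the output of the mining step, it may help to make the quantities of section 3 concrete first. The following is only a minimal sketch: it enumerates table instances by brute force rather than with the levelwise algorithm of [12], and the object tuples, the coordinates and the function names (`table_instance`, `participation_ratio`, `participation_index`) are all hypothetical. The toy coordinates are not those of figure 2; they were merely chosen so that the ratios match the values 2/3, 1/2 and 1 quoted for {A, B, C}.

```python
from fractions import Fraction
from itertools import combinations, product
from math import dist

# Each object is (name, feature, (x, y)). Coordinates are invented for illustration.
OBJECTS = [
    ("A1", "A", (0, 0)), ("B1", "B", (1, 0)), ("C1", "C", (0, 1)),
    ("A2", "A", (10, 10)), ("B2", "B", (11, 10)), ("C2", "C", (10, 11)),
    ("A3", "A", (20, 0)), ("B3", "B", (30, 30)), ("B4", "B", (40, 40)),
]


def table_instance(objects, colocation, threshold):
    """Brute-force table instance: one object per feature, all pairs neighbors."""
    pools = [[o for o in objects if o[1] == f] for f in sorted(colocation)]
    return [
        combo
        for combo in product(*pools)
        # Neighbor relationship R: Euclidean distance strictly below the threshold.
        if all(dist(a[2], b[2]) < threshold for a, b in combinations(combo, 2))
    ]


def participation_ratio(objects, instances, feature):
    """Share of the objects of `feature` that occur in at least one instance."""
    participating = {o[0] for inst in instances for o in inst if o[1] == feature}
    total = sum(1 for o in objects if o[1] == feature)
    return Fraction(len(participating), total)


def participation_index(objects, colocation, threshold):
    """Minimum participation ratio over the features of the colocation."""
    instances = table_instance(objects, colocation, threshold)
    return min(participation_ratio(objects, instances, f) for f in colocation)


instances_abc = table_instance(OBJECTS, {"A", "B", "C"}, 1.5)
for f in "ABC":
    print(f, participation_ratio(OBJECTS, instances_abc, f))
print(participation_index(OBJECTS, {"A", "B", "C"}, 1.5))  # 1/2
```

With a threshold α = 0.5 this toy colocation would be kept, and with α = 0.6 it would be discarded, because feature B participates with only half of its objects.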
Two problems arise, however: the potentially high number of colocation instances may lead to an unreadable map, and colocations in textual form lose the spatial information of their objects. To deal with these problems, we propose a new cartographic visualization of colocations in a GIS.

Figure 3: Principle of our approach

Our approach (figure 3) has two steps:

a) extract colocation patterns using a classical colocation mining algorithm
b) use the table instance of each colocation C to construct spatial representations of C

These spatial representations make it possible to see where and how the colocation is generally located. Basically, a spatial representation of a colocation C is a set of points, each one representing a feature of C, linked together by lines. The lines between the points represent the neighbor relationship. In other words, a spatial representation of a colocation is a clique spatially positioned on a map. The position of each point of the clique depends on the position of the colocation instances. For example, in figure 3, the colocation {E, C} has two spatial representations, which shows that this colocation is generally located in the center and in the north-east of the area.

Note that our visualization approach also integrates thematic aspects by painting each feature in the color of the corresponding theme. In the same way, the intensity of the color of the links is proportional to the value of the participation index associated with the colocation.

4.1 A first cartographic representation of colocations

We first consider a very simple approach to construct spatial representations of colocations. It consists in constructing, for each feature f of a colocation C, the centroid of the objects of feature f included in an instance of C (figure 4). In other words, for each feature f ∈ C, the visual representation of f is a spatial object o_f such that:

\[ o_f = (x_f, y_f), \quad x_f = \frac{\sum_{o=(x,y) \in \Omega_{f,C}} x}{|\Omega_{f,C}|}, \quad y_f = \frac{\sum_{o=(x,y) \in \Omega_{f,C}} y}{|\Omega_{f,C}|}, \quad \Theta(o_f) = f \]

where

\[ \Omega_{f,C} = \{ o \in I \mid I \in TI_C \text{ and } \Theta(o) = f \} \]

As a consequence, each feature f of a colocation C is represented by a single spatial object (i.e. a point) on the map. This object corresponds to the "average" location of feature f in the instances of C. Thus, each colocation is represented by a single clique corresponding to the "average" location and configuration of its instances. Figure 4 illustrates the construction of the spatial representations based on the centroid approach (step b of our approach illustrated in figure 3).

However, this method leads to an interpretation problem when the spatial representation of the colocation is located in the middle of the studied area. Indeed, instances of such colocations can be either located in the middle of the area or uniformly distributed all over the area. Moreover, in practice, instances of a colocation are rarely grouped in a single location. Instead, there may be several locations where the colocation frequently appears. In such cases, this method constructs an "average" spatial representation which is not necessarily meaningful for the expert. Figures 3 and 4 illustrate this problem with colocation {E, C}. In figure 3 (top figure), instances of this colocation are frequently located in the central region and in the north-east region. In figure 4 (right figure), its spatial representation using the previous method is located between these two regions, which can be misinterpreted by experts. Nonetheless, note that this spatial representation may give interesting information to experts: such a relation generally does not occur in the south or in the east of the studied area. A solution to this problem is to use clustering in order to have several spatial representations of a colocation w.r.t. the locations of its objects, and thus to have a finer interpretation of the spatial distribution of colocations.

4.2 A clustering-based spatial representation of colocations

When the instances of a colocation are not spatially located in a single location, there should be several spatial representations of this colocation to represent these different spatial distributions. In this context, our proposal is to combine a clustering method with the colocation mining algorithm (figure 5). More precisely, instead of computing centroids on the whole table instance of a colocation C, we partition this table instance into several clusters based on the spatial coordinates of its instances (figure 5, step 1, clustering). Then, each partition (representing a typical location of instances of C) is used to construct a spatial representation of C based on the centroid method described in the previous subsection (figure 5, step 2, centroids). Algorithm 1 gives the details of our method. The main part of the algorithm corresponds to the levelwise colocation mining algorithm proposed in [12]; only lines 2-4 and 9-11 correspond to the construction of the spatial clustering-based colocation patterns.
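The centroid construction of section 4.1, and the reason a single "average" clique can mislead, can be illustrated with a short sketch. It is an illustration under stated assumptions rather than the prototype's code: the instance tuples, the coordinates and the function names are hypothetical, and the partition of instances into two groups is supplied by hand, standing in for a clustering step.

```python
from collections import defaultdict


def centroid_representation(instances):
    """Return {feature: (x_f, y_f)}, averaging the distinct objects of each feature."""
    omega = defaultdict(dict)  # feature -> {object name: (x, y)}, i.e. Omega_{f,C}
    for instance in instances:
        for name, feature, xy in instance:
            omega[feature][name] = xy  # an object shared by instances counts once
    return {
        f: (
            sum(x for x, _ in pts.values()) / len(pts),
            sum(y for _, y in pts.values()) / len(pts),
        )
        for f, pts in omega.items()
    }


def clustered_representations(clusters_of_instances):
    """One centroid-based clique per group of instances."""
    return [centroid_representation(cluster) for cluster in clusters_of_instances]


# Hypothetical instances of colocation {E, C} in two distant regions.
CENTRE = [
    (("E1", "E", (4, 4)), ("C1", "C", (6, 4))),
    (("E2", "E", (4, 6)), ("C2", "C", (6, 6))),
]
NORTH_EAST = [
    (("E3", "E", (18, 18)), ("C3", "C", (20, 18))),
    (("E4", "E", (18, 20)), ("C4", "C", (20, 20))),
]

# A single clique for the whole table instance falls between the two regions.
print(centroid_representation(CENTRE + NORTH_EAST))
# One clique per group stays inside the region it summarizes.
print(clustered_representations([CENTRE, NORTH_EAST]))
```

On this toy data the single representation places E at (11, 12) and C at (13, 12), where no instance exists, whereas the per-group representations stay near (4, 5) and (18, 19) for E.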
Algorithm 1 Spatial clustering-based colocation mining algorithm
Require: a set of spatial objects O, a set of features F, a boolean spatial relationship R, the participation index threshold α
Ensure: the spatial representations of interesting colocations
 1: Cand_1 = F; k = 1
 2: for all f ∈ F do
 3:     CF_f = clusterObjectsFeature(O, f)
 4: end for
 5: while Cand_k ≠ ∅ do
 6:     for all C ∈ Cand_k do
 7:         TI_C = generateTableInstance(O, C)
 8:         if pi(C) ≥ α then
 9:             for all cluster ∈ clusterTIColoc(TI_C, ∪_{f ∈ C} CF_f) do
10:                 Spatial_Coloc_C = generateCentroidsColoc(cluster, C)
11:             end for
12:             Interest_Coloc_k = Interest_Coloc_k ∪ {C}
13:         end if
14:     end for
15:     Cand_{k+1} = {X ⊆ F | ∀ Y ⊂ X, Y ∈ Interest_Coloc_k} \ ∪_{j ≤ k} Cand_j
16:     k = k + 1
17: end while
18: Return ∪_{C ∈ ∪_{0 < i < k} Interest_Coloc_i} Spatial_Coloc_C

The levelwise strategy proposed in [20] for colocation mining is based on the classical Apriori algorithm [1]. Note that a generalization of the Apriori algorithm is also described in [18]. The principle of this strategy is to iteratively generate a set of candidate colocations of size k + 1 (i.e. colocations having k + 1 features), denoted Cand_{k+1}, from the set of interesting colocations of size k, denoted Interest_Coloc_k, and to test their participation index. Thus, this approach alternates candidate generation and evaluation phases. The candidate generation is done in line 15, based on the interesting colocations of size k. For each candidate colocation generated, the evaluation phase is done in line 8, using the table instance computed in line 7.

Figure 4: Simple representation using centroids of table instance objects

Figure 5: Clustering-based representation using table instance objects

For the construction of the spatial representations, the simplest solution would have been to execute a clustering algorithm on the table instance of each colocation. However, this solution would have been time consuming, considering that there may be thousands of colocations. Therefore, we developed a two-step clustering approach integrated in the mining algorithm. The two steps are:

• a clustering of the objects of each feature (lines 2 to 4), run once at the beginning of the algorithm. Let CF_f be the set of clusters obtained with the objects of feature f.
• a clustering of each colocation table instance based on the clusters of each feature, using a merge and split approach (line 9).

First, for each feature f, we partition the objects having feature f based on their coordinates (lines 2 to 4), using the X-means clustering algorithm [19] implemented in Weka [10]. Then, we use these clusters of objects as a basis for the clustering of each table instance of an interesting colocation C (function clusterTIColoc, line 9). Finally, for each cluster of instances generated, the function generateCentroidsColoc (line 10) constructs the corresponding spatial representation of C based on the centroids of the objects of each feature. This approach is illustrated for one interesting colocation in the example of figure 6.

Figure 6: Example of construction of the visual representations of colocation {X, Y, Z} using the merge and split approach

More precisely, the function clusterTIColoc processes the table instance using a merge and split approach. The principle of this method is to select the feature f having the highest number of clusters, and to split the instances of C w.r.t. the clusters of f. However, two instances in two different partitions w.r.t. the clusters of f can have in common an object of another feature of C. In those cases, we have conflicting clusters, i.e. objects belonging to several partitions. For example, in figure 6-a, if we partition colocation instances w.r.t. the clusters of Z, the second and third instances of {X, Y, Z} would be in different clusters, whereas they share the object Y2. To avoid this problem, our solution is to iteratively merge the clusters of the feature having the highest number of clusters, and finally to split the table instance w.r.t. these clusters when nothing can be merged anymore (stability condition). This method is illustrated in figure 6. Given two features f and g of a colocation C, we have two situations:

• suppose that two instances are in different partitions w.r.t. the clusters of f, but have in common an object of g. We merge the two clusters of f leading to such a partitioning (as a consequence, these clusters are no longer conflicting). For example, in figure 6-c, the second and third instances belong to different partitions w.r.t. the clusters of X, but they have in common the object Y2. Consequently, the two conflicting clusters of X are merged (figure 6-d).
• suppose that two instances are in different partitions w.r.t. the clusters of f, but have objects belonging to the same cluster of g. We split the corresponding cluster of g. For example (figure 6-e), the fourth and fifth instances belong to different partitions w.r.t. the clusters of Y, but they include the objects X4 and X5, which are in the same cluster of X. Consequently, we split the corresponding cluster of X w.r.t. the clusters of Y (figure 6-f).

Note that the interpretation may be difficult if many spatial patterns are generated. The zoom functionality of the GIS partially solves this problem, but in some cases it may not be enough. To deal with this problem, the user can choose to extract a condensed representation of the interesting colocations, i.e. a subset of colocations representing the solutions. Thus, our system also proposes the extraction and visualization of maximal interesting colocations w.r.t. set inclusion (also called the positive border in [18]), instead of all interesting colocations (see [18, 9] for more details).

4.3 Advantages of our proposition

This visualization approach has three main advantages w.r.t. existing solutions. Firstly, we get a spatial visualization of colocations totally integrated in the GIS, and thus adapted to experts' needs and practices. The original data is not affected by our approach; only an additional layer is added. Moreover, it can take advantage of the GIS functionalities. For example, the user can zoom on the map in order to have either a general view of all the colocations (figure 9, in the middle), or a detailed view of one or several colocations (figure 9, on the right).

Secondly, this representation gives additional information on the colocations. It makes it possible to visualize where and how an interesting colocation is spatially located. For example, figure 5 (right side) shows that colocation {A, B, C} is generally located in the north-west of the map, {A, B, E, F} in the south-west and {A, B, D} in the south-east. Thus, our approach has the advantage of providing experts with a global picture of the spatial distribution of the colocations. Using a classical visualization approach, it would have been more difficult to obtain such information. Moreover, our approach also makes it possible to visualize with precision how the features of a colocation are positioned w.r.t. each other. For example, figure 3 shows that the instances of colocation {A, B, D} are generally closer than the ones of colocation {A, B, C}. In the same way, the spatial representation of {A, B, D} shows that objects having feature B are generally below the ones having features A and D, and objects having feature D are generally located on the left of the ones having feature A. Furthermore, note that experts can easily visualize the importance of a colocation and its themes thanks to the color system.

Finally, this visualization approach does not require an additional post-processing step, since it is done during the mining algorithm using the table instances computed for colocation mining.

5 Application

The proposals discussed in this paper have been integrated in a prototype coupled with a GIS (figure 7). This prototype is based on a data mining tool called iZi [8]. This tool is used to solve interesting pattern mining problems as defined in the formal framework of [18], by providing generic algorithm implementations. This tool has been extended to process spatial clustering-based colocation patterns and to store them in a PostGIS geographical database. Quantum GIS (a free desktop application framework) is used as an interface to visualize data and colocations stored in the GIS.

Figure 7: Architecture of the prototype

We used our prototype to study soil erosion on a mountainous area of 9 km² in New Caledonia. In this area, natural erosion takes place as well as erosion related to mining activities. When studying soil erosion, three important thematic layers were considered: soil erosion (6 features), nature of the ground (13 features), and vegetation (13 features). This dataset is composed of more than 9000 objects. The studied objects resulted from vector data of a geographical database. The spatial relationship studied was a neighbor relationship based on a distance threshold between the centroids of the areas.

Figure 8: Performances of the different approaches (two plots, for a minimum distance of 200m and of 300m; x-axis: minimum participation index, from 0.5 to 0; y-axis: total time in seconds, from 100 to 10000; curves: V0, Centroids, Clustering)

Figure 8 shows the performance of colocation mining (V0), colocation mining with the centroid visualization (Centroids) and colocation mining with clustering-based visualization (Clustering). As shown by this figure, performance is acceptable for experts (same order of magnitude) w.r.t.
the value-added information provided, especially if we take into consideration that such data is rarely updated. Actually, most of the additional processing time is due to the non-optimized implementation of our prototype. Indeed, in this first work, we focus more on results, to demonstrate the interest of this approach, than on performance. For example, the top plot (at a minimum participation index of 0.5) shows the cost of the Weka invocation for the first clustering on features. Indeed, part of the runtime is due to external calls to Weka using intermediate files. In the bottom plot, the difference between V0 and Centroids shows that most of the processing time is not due to the clustering steps, but to the data access in the GIS. Actually, SQL queries and database parameters are not optimized in this version of our prototype.

Table 1: Number of colocations and spatial clustering-based patterns

                                                       Participation index threshold
Distance  Indicator                                    0.5          0.3          0.1
200m      nb colocations                               21           68           266
          avg nb instances for a colocation            16 478       11 974       8 365
          total nb instances for all colocations       346 046      814 263      2 225 118
          nb spatial representations                   31           112          510
300m      nb colocations                               55           163          711
          avg nb instances for a colocation            50 803       78 347       87 100
          total nb instances for all colocations       2 794 205    12 770 670   61 928 727
          nb spatial representations                   84           258          1349

Table 1 shows the number of colocations for different distance and participation index thresholds, and the corresponding number of spatial clustering-based colocation patterns. On average, the number of spatial representations is no more than twice the number of colocations. The average number of instances for a colocation represents the number of patterns that would have been displayed on the map using a classical visualization approach such as in [2], i.e. selection of a colocation in a report and display of the corresponding instances on the map.
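The statement that there are at most about two spatial representations per colocation can be checked directly against Table 1. The short sketch below only redoes that arithmetic; the dictionary layout and the name `TABLE_1` are ad hoc choices, and the numbers are copied from the table.

```python
# (distance, participation index threshold) -> (nb colocations, nb spatial representations)
# Values copied from Table 1.
TABLE_1 = {
    ("200m", 0.5): (21, 31),
    ("200m", 0.3): (68, 112),
    ("200m", 0.1): (266, 510),
    ("300m", 0.5): (55, 84),
    ("300m", 0.3): (163, 258),
    ("300m", 0.1): (711, 1349),
}

# Number of spatial representations per colocation, for each setting.
ratios = {key: reps / colocs for key, (colocs, reps) in TABLE_1.items()}

for (distance, threshold), ratio in ratios.items():
    print(f"{distance}, threshold {threshold}: {ratio:.2f} representations per colocation")

print("largest ratio:", round(max(ratios.values()), 2))
```

In every setting of the table the ratio stays below 2, and it grows as the participation index threshold decreases.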
The total number of instances for all colocations represents the number of patterns on the map if we display all the instances of all interesting colocations at the same time. These two indicators illustrate the interest of our approach, since the number of patterns displayed using our solution is much lower than the two others.

The visualization of the spatial clustering-based representations for one of these experiments is presented in figure 9. We can see the spatial objects (left screenshot), their corresponding spatial clustering-based colocations (screenshot in the center), and a zoom on a specific area (right screenshot). This figure illustrates the advantage of our approach, which provides experts with a global picture of where and how the colocations are generally located. It also shows how experts can use the zoom functionality of the GIS to have a finer view of a specific area.

Figure 9: Visualization of colocations on soil erosion data (threshold: 0.1, distance: 300m)

These results were analyzed and validated by a geologist, a specialist of soil erosion in New Caledonia. They point out known correlations about soil erosion in this area. The most significant colocations are the associations between sensitive trails, mining zones, river erosion and sparse vegetation, and between mines, hillslope erosion, woody-herbaceous scrub and sensitive trails or river erosion. They highlight the environmental damage near the areas where humans have used the soils. Another example is that colocations show that plant systems can also be related to the degradation of the environment. The interest of this approach for the experts is to have a formal and intuitive way to study such phenomena, to automate the analysis and to quantify the importance of the correlations thanks to the participation index.

6 Conclusion

In this paper, we propose a clustering-based method for the visualization of colocation patterns. The visualization method extends the colocation concept with spatial information and is deeply integrated in the colocation mining algorithm. Moreover, the cartographic representation of these patterns better fits experts' practice. The whole process has been successfully integrated in a prototype based on PostGIS. To our knowledge, existing visualization approaches do not have these advantages. Finally, we validated our method through experiments on a real-world geological dataset. The analysis of experimental results by domain experts has confirmed the added value of the method.

Acknowledgments. The authors wish to thank Isabelle Rouet, geologist and expert in soil erosion, for providing the data and validating the results.

References

[1] R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In J. B. Bocca, M. Jarke, and C. Zaniolo, editors, VLDB, pages 487–499. Morgan Kaufmann, 1994.
[2] G. L. Andrienko and N. V. Andrienko. Knowledge-based visualization to support spatial data mining. In IDA, pages 149–160, 1999.
[3] A. Appice, M. Ceci, A. Lanza, F. A. Lisi, and D. Malerba. Discovery of spatial association rules in geo-referenced census data: A relational mining approach. Intell. Data Anal., 7(6):541–566, 2003.
[4] V. Bogorny, J. F. Valiati, S. da Silva Camargo, P. M. Engel, B. Kuijpers, and L. O. Alvares. Mining maximal generalized frequent geographic patterns with knowledge constraints. In ICDM, pages 813–817. IEEE Computer Society, 2006.
[5] C. Brunk, J. Kelly, and R. Kohavi. Mineset: An integrated system for data mining. In KDD, pages 135–138, 1997.
[6] M. Ceci, A. Appice, and D. Malerba. Discovering emerging patterns in spatial databases: A multi-relational approach. In PKDD'07, volume 4702 of LNCS, pages 390–397. Springer, 2007.
[7] M. Celik, J. M. Kang, and S. Shekhar. Zonal co-location pattern discovery with dynamic parameters. In IEEE ICDM'07, pages 433–438. IEEE Computer Society, 2007.
[8] F. Flouvat, F. De Marchi, and J.-M. Petit. The izi project: easy prototyping of interesting pattern mining algorithms. In Advanced Techniques for Data Mining and Knowledge Discovery, LNCS, pages 1–15. Springer-Verlag, 2009.
[9] F. Flouvat, N. Selmaoui-Folcher, D. Gay, I. Rouet, and C. Grison. Constrained colocation mining: application to soil erosion characterization. In S. Y. Shin and S. Ossowski, editors, SAC. ACM, 2010.
[10] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. The WEKA Data Mining Software: An Update, volume 11. 2009.
[11] J. Han and M. Kamber. Data Mining, Second Edition: Concepts and Techniques. Morgan Kaufmann, January 2006.
[12] Y. Huang, S. Shekhar, and H. Xiong. Discovering colocation patterns from spatial data sets: A general approach. IEEE Trans. Knowl. Data Eng., 16(12):1472–1485, 2004.
[13] D. A. Keim. Information visualization and visual data mining. IEEE Trans. Vis. Comput. Graph., 8(1):1–8, 2002.
[14] K. Koperski and J. Han. Discovery of spatial association rules in geographic information databases. In M. J. Egenhofer and J. R. Herring, editors, SSD, volume 951 of Lecture Notes in Computer Science, pages 47–66. Springer, 1995.
[15] C. K.-S. Leung, P. Irani, and C. L. Carmichael. Wifisviz: Effective visualization of frequent itemsets. In ICDM, pages 875–880. IEEE Computer Society, 2008.
[16] F. A. Lisi and D. Malerba. Inducing multilevel association rules from multiple relations. Machine Learning, 55(2):175–210, 2004.
[17] D. Malerba. A relational perspective on spatial data mining. International Journal of Data Mining, Modelling and Management, 1(1):103–118, 2008.
[18] H. Mannila and H. Toivonen. Levelwise search and borders of theories in knowledge discovery. Data Min. Knowl. Discov., 1(3):241–258, 1997.
[19] D. Pelleg and A. W. Moore. X-means: Extending k-means with efficient estimation of the number of clusters. In P. Langley, editor, ICML, pages 727–734. Morgan Kaufmann, 2000.
[20] S. Shekhar and Y. Huang. Discovering spatial co-location patterns: A summary of results. In C. S. Jensen, M. Schneider, B. Seeger, and V. J. Tsotras, editors, SSTD, volume 2121 of Lecture Notes in Computer Science, pages 236–256. Springer, 2001.
[21] J. S. Yoo and S. Shekhar. A joinless approach for mining spatial colocation patterns. IEEE Trans. Knowl. Data Eng., 18(10):1323–1337, 2006.