EasySDM: A Spatial Data Mining Platform (User Manual)

Authors: Amine Abdaoui and Mohamed Ala Al Chikha, students at the National Computing Engineering School, Algiers. June 2013.

1. Overview

EasySDM is an open source spatial data mining platform developed at the National Computing Engineering School by Abdaoui and Alchikha, students in their final graduation year. This first release integrates three main tasks of spatial data mining (clustering, classification and association rules) and includes nine algorithms. Seven algorithms apply spatial preprocessing before running classical data mining techniques, and the two remaining algorithms deal directly with spatial data.

The main contributions of this work are:
- Integrating in the same platform algorithms based on the two main spatial data mining approaches.
- Ensuring an internal cartographic visualization of geographic results.
- Allowing external visualization and manipulation of geographic results via any Geographic Information System.
- Offering the extraction of two spatial relationships (distance and topology).

2. Installation

Before installing EasySDM, the user needs to install PostgreSQL 9.2 and PostGIS 2.0. The installation of EasySDM itself is simple: it only asks for the installation folder and some usual options, as shown in figure 1. Note that EasySDM must be run as an administrator. Therefore, the user needs to grant administrator rights from the properties of the application in the bin directory.

Figure 1: The installer of EasySDM.

3. Launch EasySDM

Once EasySDM is correctly installed, the user can launch it from the new icon on the desktop (if it exists) or from the installation folder by clicking on “EasySDM.exe”. The first time, the user needs to specify the connection settings to the PostGIS database:

Figure 2: Settings for the PostGIS database connection.

The user can either use an existing database or create a new database.
These settings are saved and used every time the user launches the application.

4. Menu overview

The menu of the main interface is composed of five parts:

Figure 3: The main interface.

- The geographic preprocessing, where the spatial data are prepared for the classical data mining algorithms.
- The clustering algorithms.
- The classification algorithms.
- One association rules algorithm.
- And finally, the settings of EasySDM.

5. Geographic preprocessing

The geographic preprocessing prepares spatial data for the classical data mining methods by adding new attributes that describe the relations between the different geographic objects. This preparation can be performed on spatial data contained in a Shape file or in a database table (PostGIS 2.0).

Figure 4: The geographic preprocessing.

Depending on the case, the user selects either the Shape file or the database table, and then chooses the type of spatial relations to use. Two types of relations are implemented: topological and metric relations.

Figure 5: The geographic preprocessing performed on a Shape file, and on a PostGIS database table.

The results of this preprocessing are saved as a new ARFF file.

6. Clustering

The clustering algorithms implemented in EasySDM are basically classical data mining algorithms, which need the geographically pre-processed spatial data and the associated Shape file. EasySDM offers four categories of clustering: partitioning, density-based, hierarchical and regionalization.

Figure 6: Clustering menu.

The results of every algorithm execution are shown geographically on the built-in map, and textually in the dialog area.

6.1. Kmeans interface

The user can run the partitioning clustering algorithm Kmeans via this interface. It takes as parameters the number of clusters to generate, the distance function (Euclidean or Manhattan) and whether to replace the missing values or not. The default distance function is the Euclidean distance.

Figure 7: Kmeans interface.

6.2. EM interface

The user can run the partitioning clustering algorithm EM via this interface. It takes as parameters the number of clusters to generate and whether to replace the missing values or not.

Figure 8: EM interface.

6.3. Farthest First interface

The user can run the partitioning clustering algorithm Farthest First via this interface. It takes as parameters the number of clusters to generate and whether to replace the missing values or not.

Figure 9: Farthest First interface.

6.4. DBscan interface

The user can run the density clustering algorithm DBscan via this interface. It takes as parameters the minimum number of points within every generated cluster, the minimum distance between two points of the same cluster, the distance function (Euclidean or Manhattan) and whether to replace the missing values or not. The default distance function is the Euclidean distance.

Figure 10: DBscan interface.

6.5. Cobweb interface

The user can run the hierarchical clustering algorithm Cobweb via this interface. It takes as parameters the minimum deviation value within every generated cluster, the minimum value of the CU (category utility) function and whether to replace the missing values or not.

Figure 11: Cobweb interface.

6.6. Regionalization interface

The user can run the regionalization algorithm via this interface. It takes as parameters the minimum number of clusters to generate and whether to replace the missing values or not.
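The distance-function option offered by the Kmeans and DBscan interfaces can be illustrated with a short Python sketch. This is not EasySDM code: the function names (euclidean, manhattan, assign_to_nearest) and the sample coordinates are assumptions made for illustration only. It shows the two distances and how the choice can change which cluster center a point is attached to.

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Point, b: Point) -> float:
    """Sum of absolute coordinate differences between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def assign_to_nearest(
    points: Sequence[Point],
    centers: Sequence[Point],
    distance: Callable[[Point, Point], float] = euclidean,
) -> List[int]:
    """Return, for each point, the index of its nearest center.

    Hypothetical helper: it mirrors only the assignment step of a
    partitioning algorithm, not a full clustering run.
    """
    return [
        min(range(len(centers)), key=lambda k: distance(p, centers[k]))
        for p in points
    ]


# Illustrative data (made up): one point and two candidate centers.
points = [(0.0, 0.0)]
centers = [(3.0, 3.0), (5.0, 0.0)]

by_euclidean = assign_to_nearest(points, centers, euclidean)
by_manhattan = assign_to_nearest(points, centers, manhattan)
```

With these sample values the point is attached to the first center under the Euclidean distance and to the second under the Manhattan distance, which is why the choice of distance function can change the resulting clusters.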
Figure 12: Regionalization interface.

7. Association rules

EasySDM offers one association rules algorithm named Apriori Spatial, which can be considered a variant of the classical Apriori. This algorithm takes as parameters the minimum support and the minimum confidence. Apriori Spatial is a dynamic spatial data mining algorithm that can be launched directly on a Shape file.

Figure 13: Apriori Spatial interface.

8. Classification

The classification algorithms implemented in EasySDM are classical data mining algorithms, which need the geographically pre-processed spatial data and the associated Shape file. EasySDM offers two categories of classification: decision trees and Bayes algorithms.

Figure 14: Classification menu.

The results of every algorithm execution are shown geographically on the built-in map, and textually in the dialog area.

8.1. J48 interface

The user can run the decision tree classification algorithm J48 via this interface, in two steps. In the first step, the user builds the decision tree from already clustered data. In the second step, the user uses the decision tree to classify a new instance.

Figure 15: J48 interface.

8.2. Naïve Bayes

The user can run the Bayesian classification algorithm Naïve Bayes via this interface, in two steps. In the first step, the user builds the Bayesian model from already clustered data. In the second step, the user uses the model to classify a new instance.

Figure 16: Naïve Bayes interface.

9. Settings

Finally, EasySDM allows the user to change the database connection settings and the interfaced Geographic Information System using the settings menu.

Figure 17: The settings menu.
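As a reminder of what the minimum support and minimum confidence parameters of section 7 measure, here is a minimal Python sketch of the classical definitions. It is not the Apriori Spatial implementation: the function names and the spatial predicates in the sample rows (such as "close_to(river)") are hypothetical and only illustrate how a rule would be scored against the two thresholds.

```python
from typing import List, Set


def support(itemset: Set[str], rows: List[Set[str]]) -> float:
    """Fraction of rows that contain every item of the itemset."""
    if not rows:
        return 0.0
    hits = sum(1 for row in rows if itemset <= row)
    return hits / len(rows)


def confidence(antecedent: Set[str], consequent: Set[str],
               rows: List[Set[str]]) -> float:
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(antecedent, rows)
    if base == 0.0:
        return 0.0  # rule never applies, so report zero confidence
    return support(antecedent | consequent, rows) / base


# Made-up rows: each set describes one geographic object by predicates.
rows = [
    {"close_to(river)", "touches(road)", "landuse=farm"},
    {"close_to(river)", "landuse=farm"},
    {"touches(road)", "landuse=urban"},
    {"close_to(river)", "touches(road)", "landuse=farm"},
]

# Score the hypothetical rule: close_to(river) => landuse=farm
rule_support = support({"close_to(river)", "landuse=farm"}, rows)
rule_confidence = confidence({"close_to(river)"}, {"landuse=farm"}, rows)

# A rule is kept only if both values reach the user-chosen thresholds.
min_support, min_confidence = 0.5, 0.8
keep_rule = rule_support >= min_support and rule_confidence >= min_confidence
```

In this sample the rule holds for three of the four objects and for every object close to the river, so it passes both example thresholds.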