EasySDM: A Spatial Data
Mining Platform
(User Manual)
Authors:
Amine Abdaoui and Mohamed Ala Al Chikha,
Students at the National Computing Engineering School.
Algiers.
June 2013.
1. Overview
EasySDM is an open-source spatial data mining platform developed at the
National Computing Engineering School by Abdaoui and Al Chikha, students in
their final year.
This first release covers three main tasks of spatial data mining
(clustering, classification and association rules) with nine algorithms. Seven
algorithms rely on spatial preprocessing before applying classical data
mining techniques, and the two remaining algorithms deal directly with spatial
data.
The main contributions of this work are:
- Integrating, in the same platform, algorithms based on the two main spatial
data mining approaches.
- Providing an internal cartographic visualization of geographic results.
- Allowing external visualization and manipulation of geographic results via
any Geographic Information System.
- Offering the extraction of two kinds of spatial relationships (distance and
topology).
2. Installation
Before installing EasySDM, the user needs to install PostgreSQL 9.2 and
PostGIS 2.0. The installation of EasySDM itself is simple: the installer only
asks for the installation folder and a few usual options, as shown in Figure 1.
Note that EasySDM must be run as an administrator. The user therefore
needs to grant administrator rights through the properties of the application
in the bin directory.
Figure 1: The installer of EasySDM.
3. Launch EasySDM
Once EasySDM is correctly installed, the user can launch it from the new icon
on the desktop (if it exists) or from the installation folder by clicking on
“EasySDM.exe”. The first time, the user needs to specify the connection
settings for the PostGIS database:
Figure 2: Settings for the PostGIS database connection.
The user can either use an existing database or create a new one. These
settings are saved and reused every time the user launches the
application.
4. Menu overview
The main interface’s menu is composed of five parts:
- The geographic preprocessing, where the spatial data are prepared for the
classical data mining algorithms.
- The clustering algorithms.
- The classification algorithms.
- One association rules algorithm.
- And finally, the settings of EasySDM.
Figure 3: The main interface.
5. Geographic preprocessing
The geographic preprocessing prepares spatial data for the classical data
mining methods by adding new attributes that describe the relations between
the different geographic objects.
This preparation can be performed on spatial data contained in a Shape file
or in a database table (PostGIS 2.0).
Figure 4: The geographic preprocessing.
Depending on the case, the user selects either the Shape file or the
database table, and then chooses the type of spatial relation to use. Two types
of relation are implemented: topological and metric.
Figure 5: The geographic preprocessing performed on a Shape file, and on a PostGIS database
table.
The results of this preprocessing are saved as a new ARFF file.
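As a rough illustration of the metric case, the sketch below computes, for
each geographic object, the Euclidean distance to its nearest neighbor and
writes the result in ARFF format. It assumes point geometries given as plain
coordinate pairs. The function names, attribute names and sample data are
hypothetical and do not reflect the actual implementation or output of EasySDM.

```python
import math

def nearest_distance(points):
    """For each point, return the Euclidean distance to its closest other point."""
    result = []
    for i, (xi, yi) in enumerate(points):
        best = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        result.append(best)
    return result

def to_arff(relation, names, distances):
    """Build an ARFF document with one string id and one numeric distance attribute."""
    lines = [f"@relation {relation}", ""]
    lines.append("@attribute name string")
    lines.append("@attribute nearest_distance numeric")
    lines += ["", "@data"]
    for name, d in zip(names, distances):
        lines.append(f"{name},{d:.2f}")
    return "\n".join(lines)

# Hypothetical sample: three point objects with made-up coordinates.
names = ["a", "b", "c"]
points = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]
distances = nearest_distance(points)
arff_text = to_arff("sites", names, distances)
print(arff_text)
```

The new attribute (here nearest_distance) is what allows a classical,
non-spatial algorithm to take the spatial context of each object into account.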
6. Clustering
The clustering algorithms implemented in EasySDM are basically classical
data mining algorithms, which need the geographically pre-processed spatial
data and the associated Shape file.
EasySDM offers four categories of clustering: partitioning, density-based,
hierarchical and regionalization clustering.
Figure 6: Clustering menu.
The results of every algorithm execution are shown geographically on the
built-in map, and textually in the dialog area.
6.1. Kmeans interface
The user can run the partitioning clustering algorithm Kmeans via this
interface. It takes as parameters the number of clusters to generate, the
distance function (Euclidean or Manhattan) and whether to replace missing
values.
The default distance function is the Euclidean distance.
Figure 7: Kmeans interface.
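To make the role of these parameters concrete, here is a minimal k-means
sketch with a selectable distance function. It is only an illustration, not
the code used by EasySDM: the initial centroids are simply the first k points,
and the update step uses the mean for both distances (a simplification, since
a median is often preferred with the Manhattan distance). All names and the
sample data are hypothetical.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def kmeans(points, k, distance=euclidean, max_iter=100):
    """Lloyd-style k-means; centroids start at the first k points (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical sample: two well-separated groups of points.
points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
labels, centroids = kmeans(points, 2)                       # Euclidean (default)
labels_m, _ = kmeans(points, 2, distance=manhattan)         # Manhattan
print(labels, labels_m)
```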
6.2. EM interface
The user can run the partitioning clustering algorithm EM via this
interface. It takes as parameters the number of clusters to generate and
whether to replace missing values.
Figure 8: EM interface.
6.3. Farthest First interface
The user can run the partitioning clustering algorithm Farthest First via
this interface. It takes as parameters the number of clusters to generate and
whether to replace missing values.
Figure 9: Farthest First interface.
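The idea behind Farthest First can be sketched as follows: starting from one
center, each new center is the point farthest from the centers already chosen,
and every point is then assigned to its nearest center. This is a minimal
sketch under assumptions (the first point is taken as the initial center and
the Euclidean distance is used); the names and sample data are hypothetical
and the code is not taken from EasySDM.

```python
import math

def farthest_first(points, k):
    """Pick k centers by farthest-first traversal, then label each point."""
    centers = [points[0]]  # assumption: start from the first point
    while len(centers) < k:
        # Next center: the point whose nearest chosen center is farthest away.
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(nxt)
    # Assign every point to the index of its nearest center.
    labels = [
        min(range(k), key=lambda i: math.dist(p, centers[i]))
        for p in points
    ]
    return centers, labels

# Hypothetical sample data.
points = [(0, 0), (1, 0), (10, 0), (11, 0), (5, 8)]
centers, labels = farthest_first(points, 3)
print(centers, labels)
```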
6.4. DBscan interface
The user can run the density-based clustering algorithm DBscan via this
interface. It takes as parameters the minimum number of points within every
generated cluster, the neighborhood distance (the maximum distance at which
two points are considered part of the same cluster), the distance function
(Euclidean or Manhattan) and whether to replace missing values.
The default distance function is the Euclidean distance.
Figure 10: DBscan interface.
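The following minimal sketch shows how the two density parameters interact in
a DBSCAN-style algorithm. The parameter names eps (neighborhood distance) and
min_pts (minimum number of points) are assumptions chosen for the example, as
are the sample data. It is not the implementation used by EasySDM.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts, distance=math.dist):
    """Label each point with a cluster index, or NOISE if it is isolated."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points (including i itself) within eps of point i.
        return [j for j in range(len(points))
                if distance(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already visited
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:
                queue.extend(nb)  # core point: keep growing the cluster
    return labels

# Hypothetical sample: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

Unlike Kmeans, the number of clusters is not given in advance; it follows from
the density parameters, and isolated points are reported as noise.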
6.5. Cobweb interface
The user can run the hierarchical clustering algorithm Cobweb via this
interface. It takes as parameters the minimum deviation value within every
generated cluster, the minimum value of the category utility (CU) function and
whether to replace missing values.
Figure 11: Cobweb interface.
6.6. Regionalization interface
The user can run the regionalization algorithm via this interface. It takes
as parameters the minimum number of clusters to generate and whether to
replace missing values.
Figure 12: Regionalization interface.
7. Association rules
EasySDM offers one association rules algorithm, named Apriori Spatial, which
can be considered a variant of the classical Apriori. It takes as
parameters the minimum support and the minimum confidence.
Apriori Spatial is a dynamic spatial data mining algorithm that can be
launched directly on a Shape file.
Figure 13: Apriori Spatial interface.
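For readers unfamiliar with these two thresholds, the sketch below computes the
support and confidence of a rule over a small set of transactions and checks
them against minimum values. The transactions, the predicate names (such as
near_river) and the threshold values are hypothetical. How Apriori Spatial
derives its transactions from a Shape file is not shown here.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of the whole rule divided by the support of its antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def keep_rule(transactions, antecedent, consequent, min_support, min_confidence):
    """A rule is kept only if it reaches both thresholds."""
    both = set(antecedent) | set(consequent)
    return (support(transactions, both) >= min_support
            and confidence(transactions, antecedent, consequent) >= min_confidence)

# Hypothetical transactions: one set of spatial predicates per geographic object.
transactions = [
    {"near_river", "has_school"},
    {"near_river", "has_school", "near_road"},
    {"near_river"},
    {"near_road"},
]

rule_support = support(transactions, {"near_river", "has_school"})
rule_confidence = confidence(transactions, {"near_river"}, {"has_school"})
kept = keep_rule(transactions, {"near_river"}, {"has_school"}, 0.4, 0.6)
print(rule_support, rule_confidence, kept)
```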
8. Classification
The classification algorithms implemented in EasySDM are classical data
mining algorithms, which need the geographically pre-processed spatial data and
the associated Shape file.
EasySDM offers two categories of classification: decision trees and Bayesian
algorithms.
Figure 14: Classification menu.
The results of every algorithm execution are shown geographically on the
built-in map, and textually in the dialog area.
8.1. J48 interface
The user can run the decision tree classification algorithm J48 via this
interface, in two steps. In the first step, the user builds the
decision tree from already clustered data. In the second step, the user uses
the decision tree to classify a new instance.
Figure 15: J48 interface.
8.2. Naïve Bayes interface
The user can run the Bayesian classification algorithm Naïve Bayes via
this interface, in two steps. In the first step, the user builds
the Bayesian model from already clustered data. In the second step, the user
uses the model to classify a new instance.
Figure 16: Naïve Bayes interface.
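The two steps can be illustrated with a minimal Naïve Bayes sketch for
categorical attributes, where the class of each training row is its cluster
label. This is an illustration under assumptions (categorical values only,
add-one smoothing). The function names, attribute values and cluster labels
are hypothetical and do not come from EasySDM.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Step 1: count classes and attribute values per class."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    values = defaultdict(set)            # attribute index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, label)][v] += 1
            values[i].add(v)
    return class_counts, value_counts, values

def classify(model, row):
    """Step 2: return the class with the highest smoothed probability score."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    best, best_score = None, -1.0
    for label, n in class_counts.items():
        score = n / total  # class prior
        for i, v in enumerate(row):
            # Add-one smoothing avoids zero probabilities for unseen values.
            score *= (value_counts[(i, label)][v] + 1) / (n + len(values[i]))
        if score > best_score:
            best, best_score = label, score
    return best

# Hypothetical training data: (distance to river, land use) with cluster labels.
rows = [("near", "urban"), ("near", "urban"), ("far", "rural"), ("far", "urban")]
labels = ["cluster1", "cluster1", "cluster2", "cluster2"]

model = train(rows, labels)
predicted_a = classify(model, ("near", "urban"))
predicted_b = classify(model, ("far", "rural"))
print(predicted_a, predicted_b)
```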
9. Settings
Finally, EasySDM allows the user to change the database connection settings
and the Geographic Information System it interfaces with, using the settings
menu.
Figure 17: The settings menu.