A clustering-based visualization of spatial patterns
Nazha Selmaoui-Folcher, Frédéric Flouvat, Elise Desmier and Dominique Gay
University of New Caledonia, PPME-ERIM, F-98851, Noumea, New Caledonia
[email protected], [email protected],
[email protected], [email protected]
March 9, 2010
Abstract
Extraction of interesting colocations in geo-referenced data is one of the major tasks in spatial pattern mining. Considering a set of spatial Boolean features, the goal is to find relevant subsets of features associated with objects often located together. In this context, the main drawback is the interpretation of extracted patterns by domain experts. Indeed, the common textual representation of colocations loses important spatial information. To overcome this problem, we propose a new clustering-based visualization technique deeply integrated in the colocation mining algorithm. This new simple, concise and intuitive cartographic representation considers both spatial information and experts' practice. The whole process has been tested on a real-world geological data set, and the added value of the method has been confirmed by domain experts.

1 Introduction
Spatial data mining refers to the extraction of interesting, useful, unexpected and implicit knowledge from spatial data. It has wide applications in environmental management, public safety, transportation and tourism. One of the classical tasks in spatial pattern mining is the extraction of interesting colocations in geo-referenced data [14, 20, 3, 12, 16, 4, 21, 6, 7, 17]. To deal with this problem, two families of spatial pattern mining approaches may be identified: multi-relational approaches and colocation-based approaches. When spatial data is made of various tables describing objects and spatial relationships between objects, multi-relational data mining techniques may be applied to extract interesting spatial patterns such as association rules [16] or emerging patterns [6]; see also [3, 17]. On the other hand, [12] identified two approaches for colocation mining: transaction-based approaches and event-based approaches. Transaction-based approaches focus on transforming spatial data into transactional data on which classical itemset mining algorithms can be used [14, 4]. In [14], the authors presented an efficient method for mining association rules in geographic information databases. This method enumerates neighbors to "materialize" a set of transactions around instances of the reference spatial feature. The goal is to find colocations of features relevant to the reference feature. [4] extends this work by introducing knowledge constraints in a preprocessing step. The main limitation of these works is that spatial relationships and features are only partially considered. To cope with this limitation, event-based approaches focus on the events and their neighbor relationships [20, 12, 21]. Shekhar et al. defined the colocation concept based on Koperski's work. The goal is to find all subsets of spatial features likely to occur together. To filter interesting colocations, two interestingness measures have been proposed. Thanks to the anti-monotonic property of these predicates, a levelwise algorithm has been used to extract interesting colocations. Thus, this approach considers all the features together, and the original data is not transformed.
However, a major problem with these spatial pattern mining techniques is the interpretation of the results by domain experts. Extracted patterns are presented in a textual form, which is not a representation that can be easily understood and directly used by experts. Moreover, a textual representation only partially considers the spatial information of the underlying objects. Indeed, experts can only know which features are generally located together, but they have no information on where these colocations are generally located or on their configuration. In this context, we propose a new visualization of colocations based on clustering. This solution leads to a simple, concise and intuitive cartographic visualization of colocations, and takes into consideration the spatial nature of the underlying objects and the experts' practice. Finally, this proposition has been integrated in a prototype coupled with a Geographic Information System (GIS). Experiments have been done on a real geological dataset and validated by a domain expert.
Section 2 presents related works on the interpretation and the visualization of data mining results.
Section 3 presents the colocation mining problem.
Section 4 introduces our work to deliver actionable
knowledge to domain experts, i.e. a new spatial
representation of colocations. Section 5 presents
some experiments on a real geological dataset. Finally, section 6 concludes and gives some perspectives.
2 Visualization in data mining: related works
One of the major issues in data mining is the representation of the discovered knowledge so that it can be easily understood and directly used by experts [11]. Nevertheless, most data mining methods return results in a textual form based on an interestingness measure. To our knowledge, no solution has been proposed specifically for the visualization of colocation patterns. However, several visualization systems have been proposed for classical data mining tasks or for spatial data. In the rest of this section, we describe the main approaches.
For classical data mining, several systems have been developed to represent raw data or mining results [5, 13]. For example, MineSet [5] is an interactive system for data mining integrating data visualization. Different kinds of visualizers (statistics, scatter, map, tree) are available according to the type of result to visualize. Each mining algorithm (such as simple-Bayes model, decision tree, or association rules) is coupled with a visualization tool in order to help users in their interpretation of the learned models. Figure 1-a shows the visualization of association rules.
Recently, [15] addressed the visualization of frequent itemsets. The authors developed a system, WiFIsViz, for visualizing frequent itemsets based on orthogonal graphs (wiring-type diagrams). Frequent itemsets are shown in a two-dimensional space, where the x-axis shows items and the y-axis shows the frequencies (figure 1-c). An itemset X is represented by a horizontal line connecting nodes, where each node represents an item of X. Moreover, itemsets sharing the same prefix are merged, which improves the visualization. The visualizer provides different levels of detail to represent frequent itemsets. It also integrates features for constrained itemset mining.
For spatial data, a typical system is the one proposed in [2]. The authors were interested in representing spatial data and providing a visualization of classical data mining results on spatial data. Subgroups or clusters are naturally presented on maps using painting or icons on spatial objects. The same technique is applied to decision trees and classification rules by associating a visual feature to objects. For non-geographical information such as mined trees or rules, the system makes a dynamic link between the map and reports. For example, when a cursor is positioned on a tree node or a rule in a report, the corresponding instances are highlighted in the map (and vice versa). Figure 1-b illustrates an application of this system for the visualization of rules for South America.
Figure 1: a) A Rule visualizer view of supermarket items. b) Visualization of an association rule for South America. c) A snapshot of the proposed WiFIsViz.
As far as we know, none of the solutions proposed in the literature were designed to display spatial patterns in a simple, concise and intuitive way for experts. They do not take into consideration the spatial nature of the underlying objects, and only provide non-spatial knowledge.

3 The colocation mining framework
This section recalls the colocation framework proposed in [20, 12, 21]. Let F be a set of Boolean features, O be a set of spatial objects, and R be a neighbor relationship over O. An instance of a feature f ∈ F is an object of O having the feature f. We define the function Θ : O → F to formally define the association between objects and features. For example, in figure 2, F = {A, B, C, D, E}, O = {A1, C2, B3, ..., E12}, and A9 is an instance of feature A, i.e. Θ(A9) = A. Note that, in this paper, spatial objects are represented by points in a two-dimensional space.
Figure 2: Spatial objects and their features
A colocation C ⊆ F is a set of features whose instances form a clique using a neighbor relationship R. If the neighbor relationship R is the Euclidean distance with a threshold, two spatial objects are neighbors if their distance is lower than the given threshold.
A colocation instance I ⊆ O of a colocation C is a set of objects such that the objects are instances of all the features of C and form a clique w.r.t. R. As a consequence, a colocation instance of a colocation C satisfies the following properties:
• |{f ∈ C | o ∈ I and Θ(o) = f}| = |C|
• |I| = |C|
• ∀o, p ∈ I, R(o, p) = true
Figure 2 shows that the set of objects {A9, B4, D10} is a colocation instance of the colocation {A, B, D} w.r.t. a fixed Euclidean distance threshold (represented by dotted circles). Conversely, {A1, B4, C7}, {A1, B4, D10} and {A9, B4} are not colocation instances of {A, B, D}. However, not every colocation is interesting. There is only one set of three neighbor objects having the features A, B and D. Thus, we need other concepts to determine the interestingness of a colocation. In this paper, to simplify, we use the term "instance" to refer to a "colocation instance", and represent the colocation {A, B, D} by ABD in the figures.
A table instance of a colocation C, denoted TI_C, is the set of all its colocation instances. The table instance of {A, B, C} is TI_{A,B,C} = {{A1, B8, C7}, {A5, B6, C2}} and the table instance of {B, D} is TI_{B,D} = {{B4, D10}} (see figure 2). More formally, we have
$$TI_C = \{ I \subseteq O \mid I \text{ is an instance of } C \text{ w.r.t. } R \}$$
The participation ratio pr(C, f) for a feature f of a colocation C is the ratio of the number of objects of feature f included in the instances of C to the total number of objects of feature f.
$$pr(C, f) = \frac{|\{ o \in I \mid I \in TI_C \text{ and } \Theta(o) = f \}|}{|TI_{\{f\}}|}$$
In figure 2, pr({A, B, C}, A) = 2/3, pr({A, B, C}, B) = 1/2 and pr({A, B, C}, C) = 1.
Based on the definitions above, [12] has proposed the concept of participation index, denoted pi(C), to estimate the frequency of colocation C. More precisely, it represents the minimal probability to have an object in an instance of the colocation C w.r.t. all objects having this feature.
$$pi(C) = \min_{f \in C} \, pr(C, f)$$
Based on these definitions, the problem to solve is:
Colocation mining problem. Given F a set
of features, O a set of spatial objects, R a neighbor
relationship and α ∈ [0, 1] a threshold. The problem
is to find the set of colocations {C ⊆ F | pi(C) ≥ α}
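As an illustration of the definitions above, the following minimal Python sketch enumerates a table instance by brute force and derives the participation ratio and participation index. The object identifiers, coordinates and distance threshold are hypothetical; they are not those of figure 2, although the layout was chosen so that the ratios match the values 2/3, 1/2 and 1 quoted in the text. The brute-force enumeration is for readability only and is not the join strategy of [12].

```python
from itertools import combinations, product
from math import dist

# Hypothetical toy data (NOT the coordinates of figure 2):
# object id -> (feature, (x, y)).
objects = {
    "A1": ("A", (0.0, 0.0)),
    "B1": ("B", (1.0, 0.0)),
    "C1": ("C", (0.5, 0.8)),
    "A2": ("A", (10.0, 0.0)),
    "B2": ("B", (11.0, 0.0)),
    "C2": ("C", (10.5, 0.8)),
    "A3": ("A", (20.0, 20.0)),
    "B3": ("B", (30.0, 30.0)),
    "B4": ("B", (40.0, 40.0)),
}
THRESHOLD = 1.5  # assumed Euclidean distance threshold


def neighbors(o, p):
    """Neighbor relationship R: distance lower than the threshold."""
    return dist(objects[o][1], objects[p][1]) < THRESHOLD


def table_instance(colocation):
    """All cliques (w.r.t. R) made of one object per feature of the colocation."""
    by_feature = [
        [o for o, (f, _) in objects.items() if f == feature]
        for feature in sorted(colocation)
    ]
    return [
        frozenset(combo)
        for combo in product(*by_feature)
        if all(neighbors(o, p) for o, p in combinations(combo, 2))
    ]


def participation_ratio(colocation, feature):
    """Fraction of the objects of `feature` that occur in some instance."""
    participating = {
        o
        for instance in table_instance(colocation)
        for o in instance
        if objects[o][0] == feature
    }
    total = sum(1 for f, _ in objects.values() if f == feature)
    return len(participating) / total


def participation_index(colocation):
    """Minimum participation ratio over the features of the colocation."""
    return min(participation_ratio(colocation, f) for f in colocation)


abc = {"A", "B", "C"}
print(table_instance(abc))
print({f: participation_ratio(abc, f) for f in sorted(abc)})
print(participation_index(abc))
```

With a threshold α = 0.5, the colocation {A, B, C} of this toy data set would be kept, since its participation index equals the threshold.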
4 A spatial visualization of colocations integrated in a GIS
The visualization of data mining results is essential to obtain actionable domain knowledge. In domains manipulating geographical data, GIS are classical tools for storing and visualizing spatial data. A main characteristic of GIS is the cartographic visualization of the information in thematic layers. In this context, our objective is to find a spatial visualization of the colocation mining results integrated in the GIS. However, the potentially high number of colocation instances may lead to an unreadable map, and colocations in a textual form lose the spatial information of their objects.
To deal with these problems, we propose a new cartographic visualization of colocations in a GIS. The principle of our approach (figure 3) has two steps:
a) extract colocation patterns using a classical colocation mining algorithm
b) use the table instance of each colocation C to construct spatial representations of C
Figure 3: Principle of our approach
These spatial representations show where and how the colocation is generally located. Basically, a spatial representation of a colocation C is a set of points, each one representing a feature of C, linked together by lines. The lines between the points represent the neighbor relationship. In other words, a spatial representation of a colocation is a clique spatially positioned on a map. The position of each point of the clique depends on the position of the colocation instances. For example, in figure 3, the colocation {E, C} has two spatial representations, which shows that this colocation is generally located in the center and in the north-east of the area.
Note that our visualization approach also integrates thematic aspects by painting each feature in the color of the corresponding theme. In the same way, the intensity of the link color is proportional to the value of the participation index associated with the colocation.

4.1 A first cartographic representation of colocations
First, we consider a very simple approach to construct spatial representations of colocations. It consists in constructing, for each feature f of a colocation C, the centroid of the objects of feature f included in an instance of C (figure 4). In other words, for each feature f ∈ C, the visual representation of f is a spatial object o_f such that:
$$o_f = (x_f, y_f), \quad x_f = \frac{\sum_{o=(x,y) \in \Omega_{f,C}} x}{|\Omega_{f,C}|}, \quad y_f = \frac{\sum_{o=(x,y) \in \Omega_{f,C}} y}{|\Omega_{f,C}|}, \quad \text{and } \Theta(o_f) = f$$
where Ω_{f,C} = {o ∈ I | I ∈ TI_C and Θ(o) = f}.
As a consequence, each feature f of a colocation C is represented by a single spatial object (i.e. a point) in the map. This object corresponds to the "average" location of feature f in the instances of C. Thus, each colocation is represented by a single clique corresponding to the "average" location and configuration of its instances. Figure 4 illustrates the construction of the spatial representations based on the centroid approach (step b of our approach illustrated in figure 3).
However, this method leads to an interpretation problem when the spatial representation of the colocation is located in the middle of the studied area. Indeed, instances of such colocations can be either located in the middle of the area or uniformly distributed all over the area. Moreover, in practice, instances of a colocation are rarely grouped in a single location. Instead, there may be several locations where the colocation frequently appears. In such cases, this method will construct an "average" spatial representation which is not necessarily meaningful for the expert. Figures 3 and 4 illustrate this problem with colocation {E, C}. In figure 3 (top figure), instances of this colocation are frequently located in the central region and in the north-east region. In figure 4 (right figure), its spatial representation using the previous method is located between these two regions, which can be misinterpreted by experts. Nonetheless, note that this spatial representation may give interesting information to experts: such a relation generally does not occur in the south or in the east of the studied area.
A solution to deal with this problem is to use clustering in order to have several spatial representations of a colocation w.r.t. the locations of its objects, and thus to have a finer interpretation of the spatial distribution of colocations.

4.2 A clustering-based spatial representation of colocations
When instances of a colocation are not spatially grouped in a single location, there should be several spatial representations for such a colocation to represent these different spatial distributions. In this context, our proposition is to combine a clustering method with the colocation mining algorithm (figure 5). More precisely, instead of computing centroids based on the whole table instance of a colocation C, we partition this table instance in several clusters based on their spatial coordinates (figure 5, step 1, clustering). Then, each partition (representing a typical location of instances of C) is used to construct a spatial representation of C based on the centroid method described in the previous subsection (figure 5, step 2, centroids).
Algorithm 1 gives the details of our method. The main part of the algorithm corresponds to the levelwise colocation mining algorithm proposed in [12]; only lines 2-4 and 9-11 correspond to the construction of the spatial clustering-based colocation patterns.
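The difference between the single centroid representation of Section 4.1 and a per-cluster representation can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the table instance of {E, C} below is hypothetical, and `group_instances` is a deliberately naive stand-in for the clustering step (it is neither X-means nor the merge and split procedure of the paper). Only `centroid_representation` mirrors the centroid formula for o_f directly.

```python
from collections import defaultdict
from math import dist

# Hypothetical table instance of colocation {E, C}: each instance maps a
# feature to the (x, y) position of the participating object.
table_instance = [
    {"E": (1.0, 1.0), "C": (2.0, 1.0)},
    {"E": (1.0, 3.0), "C": (2.0, 3.0)},
    {"E": (9.0, 9.0), "C": (10.0, 9.0)},
    {"E": (9.0, 11.0), "C": (10.0, 11.0)},
]


def centroid_representation(instances):
    """One point per feature: mean position of the distinct objects of that
    feature found in the given instances (the set Omega_{f,C})."""
    points = defaultdict(set)
    for instance in instances:
        for feature, xy in instance.items():
            points[feature].add(xy)  # a shared object is counted once
    return {
        feature: (
            sum(x for x, _ in pts) / len(pts),
            sum(y for _, y in pts) / len(pts),
        )
        for feature, pts in points.items()
    }


def instance_center(instance):
    """Mean position of the objects of one instance."""
    xs = [x for x, _ in instance.values()]
    ys = [y for _, y in instance.values()]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def group_instances(instances, radius):
    """Naive stand-in for the clustering step (an assumption of this sketch):
    an instance joins the first group whose seed center is within `radius`."""
    groups = []  # list of (seed_center, members)
    for instance in instances:
        center = instance_center(instance)
        for seed, members in groups:
            if dist(seed, center) <= radius:
                members.append(instance)
                break
        else:
            groups.append((center, [instance]))
    return [members for _, members in groups]


# Single "average" clique: it falls between the two regions.
print(centroid_representation(table_instance))

# One clique per group of instances: one per region.
for group in group_instances(table_instance, radius=5.0):
    print(centroid_representation(group))
```

On this toy input the single representation lands midway between the two groups of instances, which is the misleading case discussed above, whereas the per-group representations stay inside each region.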
Algorithm 1 Spatial clustering-based colocation mining algorithm
Require: a set of spatial objects O, a set of features F, a Boolean spatial relationship R, the participation index threshold α
Ensure: the spatial representations of interesting colocations
1: Cand_1 = F; k = 1
2: for all f ∈ F do
3:     CF_f = clusterObjectsFeature(O, f)
4: end for
5: while Cand_k ≠ ∅ do
6:     for all C ∈ Cand_k do
7:         TI_C = generateTableInstance(O, C)
8:         if pi(C) ≥ α then
9:             for all cluster ∈ clusterTIColoc(TI_C, ∪_{f ∈ C} CF_f) do
10:                Spatial_Coloc_C = generateCentroidsColoc(cluster, C)
11:            end for
12:            Interest_Coloc_k = Interest_Coloc_k ∪ {C}
13:        end if
14:    end for
15:    Cand_{k+1} = {X ⊆ F | ∀Y ⊂ X, Y ∈ Interest_Coloc_k} \ ∪_{j ≤ k} Cand_j
16:    k = k + 1
17: end while
18: Return ∪_{C ∈ ∪_{0<i<k} Interest_Coloc_i} Spatial_Coloc_C
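The levelwise skeleton of Algorithm 1 (lines 5 to 17, without the clustering steps) can be sketched as follows. This is only an illustrative Python sketch: the participation index values in `PI` are hypothetical and stand in for the table instance computation of lines 7 and 8, and `generate_candidates` is an Apriori-style reading of line 15, in which a candidate of size k + 1 is kept only if all its subsets of size k are interesting.

```python
from itertools import combinations

FEATURES = {"A", "B", "C", "D"}

# Hypothetical participation index values (anti-monotone by construction).
# Colocations absent from the table are assumed to have an index of 0.
PI = {
    frozenset("AB"): 0.8,
    frozenset("AC"): 0.6,
    frozenset("BC"): 0.7,
    frozenset("AD"): 0.2,
    frozenset("BD"): 0.3,
    frozenset("CD"): 0.1,
    frozenset("ABC"): 0.5,
}


def pi(colocation):
    """Participation index lookup; a single feature always has index 1,
    since every object of f is an instance of the colocation {f}."""
    return 1.0 if len(colocation) == 1 else PI.get(colocation, 0.0)


def generate_candidates(interesting_k, features):
    """Candidates of size k+1 whose subsets of size k are all interesting."""
    interesting_k = set(interesting_k)
    if not interesting_k:
        return set()
    k = len(next(iter(interesting_k)))
    candidates = set()
    for colocation in interesting_k:
        for feature in features - colocation:
            candidate = colocation | {feature}
            if all(
                frozenset(subset) in interesting_k
                for subset in combinations(candidate, k)
            ):
                candidates.add(candidate)
    return candidates


def levelwise(features, pi, alpha):
    """Alternate evaluation and candidate generation until no candidate is left."""
    candidates = {frozenset([f]) for f in features}
    interesting = []
    while candidates:
        kept = {c for c in candidates if pi(c) >= alpha}
        interesting.extend(kept)
        candidates = generate_candidates(kept, features)
    return interesting


print(sorted("".join(sorted(c)) for c in levelwise(FEATURES, pi, alpha=0.4)))
```

With α = 0.4, the pairs containing D are discarded at level 2, so no triple containing D is ever generated or evaluated; this pruning is what the anti-monotonic property of the participation index allows.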
The levelwise strategy proposed in [20] for colocation mining is based on the classical Apriori algorithm [1]. Note that a generalization of the Apriori algorithm is also described in [18]. The principle of this strategy is to iteratively generate a set of candidate colocations of size k + 1 (i.e. colocations having k + 1 features), denoted Cand_{k+1}, from the set of interesting colocations of size k, denoted Interest_Coloc_k, and to test their participation index. Thus, this approach alternates candidate generation and evaluation phases. The candidate generation is done in line 15 based on interesting colocations of size k. For each candidate colocation generated, the evaluation phase is done in line 8, using the table instance computed in line 7.
Figure 4: Simple representation using centroids of table instance objects
Figure 5: Clustering-based representation using table instance objects
For the construction of the spatial representations, the simplest solution would have been to execute a clustering algorithm on the table instance of each colocation. However, this solution would have been time consuming, considering that there may be thousands of colocations. Therefore, we developed a two-step clustering approach integrated in the mining algorithm. The two steps are:
• a clustering of the objects of each feature (lines 2 to 4), run once at the beginning of the algorithm. Let CF_f be the set of clusters obtained with the objects of feature f.
• a clustering of each colocation table instance based on the clusters of each feature, using a merge and split approach (line 9).
First, for each feature f, we partition the objects having feature f based on their coordinates (lines 2 to 4), using the X-means clustering algorithm [19] implemented in Weka [10]. Then, we use these clusters of objects as a basis for the clustering of each table instance of an interesting colocation C (function clusterTIColoc, line 9). Finally, for each cluster of instances generated, the function generateCentroidsColoc (line 10) constructs the corresponding spatial representation of C based on the centroids of the objects of each feature. This approach is illustrated for one interesting colocation in the example of figure 6.
Figure 6: Example of construction of the visual representations of colocation {X, Y, Z} using the merge and split approach
More precisely, the function clusterTIColoc processes the table instance using a merge and split approach. The principle of this method is to select the feature f having the highest number of clusters, and to split the instances of C w.r.t. the clusters of f. However, two instances in two different partitions w.r.t. the clusters of f can have in common an object of another feature of C. In those cases, we have conflicting clusters, i.e. objects belonging to several partitions. For example, in figure 6-a, if we partition colocation instances w.r.t. the clusters of Z, the second and third instances of {X, Y, Z} would be in different clusters, whereas they share the object Y2.
To avoid this problem, our solution is to iteratively merge the clusters of the feature having the highest number of clusters, and finally split the table instance w.r.t. these clusters when nothing can be merged anymore (stability condition). This method is illustrated in figure 6. Given two features f and g of a colocation C, we have two situations:
• Suppose that two instances are in different partitions w.r.t. the clusters of f, but have in common an object of g. We merge the two clusters of f leading to such a partitioning (as a consequence, these clusters are no longer conflicting). For example, in figure 6-c, the second and third instances belong to different partitions w.r.t. the clusters of X, but they have in common the object Y2. Consequently, the two conflicting clusters of X are merged (figure 6-d).
• Suppose that two instances are in different partitions w.r.t. the clusters of f, but have objects belonging to the same cluster of g. We split the corresponding cluster of g. For example (figure 6-e), the fourth and fifth instances belong to different partitions w.r.t. the clusters of Y, but they include the objects X4 and X5, which are in the same cluster of X. Consequently, we split the corresponding cluster of X w.r.t. the clusters of Y (figure 6-f).

4.3 Advantages of our proposition
This visualization approach has three main advantages w.r.t. existing solutions. First, we get a spatial visualization of colocations totally integrated in the GIS, and thus adapted to experts' needs and practices. The original data is not affected by our approach; only an additional layer is added. Moreover, it can take advantage of the GIS functionalities. For example, the user can zoom on the map in order to have either a general view of all the colocations (figure 9, in the middle), or a detailed view of one or several colocations (figure 9, on the right).
Second, this representation gives additional information on the colocations. It shows where and how an interesting colocation is spatially located. For example, figure 5 (right side) shows that colocation {A, B, C} is generally located in the north-west of the map, {A, B, E, F} in the south-west and {A, B, D} in the south-east. Thus, our approach has the advantage of providing experts with a global picture of the spatial distribution of the colocations. Using a classical visualization approach, it would have been more difficult to obtain such information. Moreover, our approach also shows precisely how the features of a colocation are positioned w.r.t. each other. For example, figure 3 shows that the instances of colocation {A, B, D} are generally closer than the ones of colocation {A, B, C}. In the same way, the spatial representation of {A, B, D} shows that objects having feature B are generally below the ones having features A and D, and objects having feature D are generally located on the left of the ones having feature A. Furthermore, note that experts can easily visualize the importance of a colocation and its themes thanks to the color system.
Finally, this visualization approach does not require an additional post-processing step, since it is done during the mining algorithm using the table instances computed for colocation mining.
Note that the interpretation may be difficult if a lot of spatial patterns are generated. The zoom functionality of the GIS partially solves this problem, but in some cases it may not be enough. To deal with this problem, the user can choose to extract a condensed representation of the interesting colocations, i.e. a subset of colocations representing the solutions. Thus, our system also proposes the extraction and visualization of maximal interesting colocations w.r.t. set inclusion (also called the positive border in [18]), instead of all interesting colocations (see [18, 9] for more details).

5 Application
The proposals discussed in this paper have been integrated in a prototype coupled with a GIS (figure 7). This prototype is based on a data mining tool called iZi [8]. This tool is used to solve interesting pattern mining problems as defined in the formal framework of [18], by providing generic algorithm implementations. This tool has been extended to process spatial clustering-based colocation patterns and to store them in a PostGIS geographical database. Quantum GIS (a free desktop application framework) is used as an interface to visualize data and colocations stored in the GIS.
Figure 7: Architecture of the prototype
We used our prototype to study soil erosion on a mountainous area of 9 km² in New Caledonia. In this area, natural erosion takes place as well as erosion related to mining activities. When studying soil erosion, three important thematic layers were considered: soil erosion (6 features), nature of the ground (13 features), and vegetation (13 features). This dataset is composed of more than 9000 objects. The studied objects resulted from vector data of a geographical database. The spatial relationship studied was a neighbor relationship based on a distance threshold between the centroids of the areas.
Figure 8 shows the performances of colocation mining (V0), colocation mining with the centroid visualization (Centroids) and colocation mining with clustering-based visualization (Clustering). As shown by this figure, performances are acceptable for experts (same order of magnitude) w.r.t. the value-added information provided, especially if we take into consideration that such data is rarely updated. Most of the additional processing time is due to the non-optimized implementation of our prototype. Indeed, in this first work, we focus more on results, to demonstrate the interest of this approach, than on performance. For example, the top plot (minimum participation index equal to 0.5) shows the cost of the Weka invocation for the first clustering on features. Indeed, part of the runtime is due to external calls to Weka using intermediate files. In the bottom plot, the difference between V0 and Centroids shows that most of the processing time is not due to the clustering steps, but to the data access in the GIS. SQL queries and database parameters are not optimized in this version of our prototype.
Figure 8: Performances of the different approaches (two plots, for minimum distances of 200m and 300m, of total time in seconds versus minimum participation index, for V0, Centroids and Clustering)

Table 1: Number of colocations and spatial clustering-based patterns
                                                     Participation index threshold
Distance  Measure                                    0.5          0.3          0.1
200m      nb colocations                             21           68           266
200m      avg nb instances for a colocation          16 478       11 974       8 365
200m      total nb instances for all colocations     346 046      814 263      2 225 118
200m      nb spatial representations                 31           112          510
300m      nb colocations                             55           163          711
300m      avg nb instances for a colocation          50 803       78 347       87 100
300m      total nb instances for all colocations     2 794 205    12 770 670   61 928 727
300m      nb spatial representations                 84           258          1349
Table 1 shows the number of colocations for different distance and participation index thresholds, and the corresponding number of spatial clustering-based colocation patterns. On average, the number of spatial representations is no more than twice the number of colocations. The average number of instances for a colocation represents the number of patterns that would have been displayed in the map using a classical visualization approach such as in [2], i.e. selection of a colocation in a report and display of the corresponding instances on the map. The total number of instances for all colocations represents the number of patterns on the map if we display all the instances of all interesting colocations at the same time. These two indicators illustrate the interest of our approach, since
the number of patterns displayed using our solution is much lower than the two others.
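The claim that the number of spatial representations stays below twice the number of colocations can be checked directly from the counts reported in Table 1. The short Python sketch below only restates those counts (the dictionary layout and variable names are ours) and computes the ratio for each setting.

```python
# Counts copied from Table 1:
# (distance, participation index threshold) ->
#     (nb colocations, nb spatial representations)
table1 = {
    ("200m", 0.5): (21, 31),
    ("200m", 0.3): (68, 112),
    ("200m", 0.1): (266, 510),
    ("300m", 0.5): (55, 84),
    ("300m", 0.3): (163, 258),
    ("300m", 0.1): (711, 1349),
}

# Average number of spatial representations per colocation, per setting.
ratios = {
    setting: representations / colocations
    for setting, (colocations, representations) in table1.items()
}

for setting, ratio in ratios.items():
    print(setting, round(ratio, 2))

worst_setting = max(ratios, key=ratios.get)
print("largest ratio:", worst_setting, round(ratios[worst_setting], 2))
```

The ratio ranges from about 1.48 to about 1.92 (reached for a distance of 200m and a threshold of 0.1), so each colocation is displayed with fewer than two cliques on average, to be compared with the thousands of instances per colocation reported in the same table.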
The visualization of the spatial clustering-based representations for one of these experiments is presented in figure 9. We can see the spatial objects (left screenshot), their corresponding spatial clustering-based colocations (screenshot in the center), and a zoom on a specific area (right screenshot). This figure illustrates the advantage of our approach by providing experts with a global picture of where and how the colocations are generally located. It also shows how experts can use the zoom functionality of the GIS to have a finer view of a specific area.
Figure 9: Visualization of colocations on soil erosion data (threshold: 0.1, distance: 300m)
These results were analyzed and validated by a geologist, specialist of soil erosion in New Caledonia. They point out known correlations about soil erosion in this area. The most significant colocations are the associations between sensitive trails, mining zones, river erosion and sparse vegetation, and between mines, hillslope erosion, woody-herbaceous scrub and sensitive trails or river erosion. They highlight the environmental damage near the areas where humans have used the soils. Another example is that colocations show that plant systems can also be related to the environmental degradation. The interest of this approach for the experts is to have a formal and intuitive approach to study such phenomena, to automate the analysis and to quantify the importance of the correlations thanks to the participation index.

6 Conclusion
In this paper, we propose a clustering-based method for the visualization of colocation patterns. The visualization method extends the colocation concept with spatial information and is deeply integrated in the colocation mining algorithm. Moreover, the cartographic representation of these patterns better fits experts' practice. The whole process has been successfully integrated in a prototype based on PostGIS. To our knowledge, existing visualization approaches do not have these advantages. Finally, we validated our method through experiments on a real-world geological dataset. The analysis of experimental results by domain experts has confirmed the added value of the method.
Acknowledgments. The authors wish to thank Isabelle Rouet, geologist and expert in soil erosion, for providing the data and validating the results.

References
[1] R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In J. B. Bocca, M. Jarke, and C. Zaniolo, editors, VLDB, pages 487–499. Morgan Kaufmann, 1994.
[2] G. L. Andrienko and N. V. Andrienko. Knowledge-based visualization to support spatial data mining. In IDA, pages 149–160, 1999.
[3] A. Appice, M. Ceci, A. Lanza, F. A. Lisi, and D. Malerba. Discovery of spatial association rules in geo-referenced census data: A relational mining approach. Intell. Data Anal., 7(6):541–566, 2003.
[4] V. Bogorny, J. F. Valiati, S. da Silva Camargo, P. M. Engel, B. Kuijpers, and L. O. Alvares. Mining maximal generalized frequent geographic patterns with knowledge constraints. In ICDM, pages 813–817. IEEE Computer Society, 2006.
[5] C. Brunk, J. Kelly, and R. Kohavi. Mineset: An integrated system for data mining. In KDD, pages 135–138, 1997.
[6] M. Ceci, A. Appice, and D. Malerba. Discovering emerging patterns in spatial databases: A multi-relational approach. In PKDD'07, volume 4702 of LNCS, pages 390–397. Springer, 2007.
[7] M. Celik, J. M. Kang, and S. Shekhar. Zonal co-location pattern discovery with dynamic parameters. In IEEE ICDM'07, pages 433–438. IEEE Computer Society, 2007.
[8] F. Flouvat, F. De Marchi, and J.-M. Petit. The izi project: easy prototyping of interesting pattern mining algorithms. In Advanced Techniques for Data Mining and Knowledge Discovery, LNCS, pages 1–15. Springer-Verlag, 2009.
[9] F. Flouvat, N. Selmaoui-Folcher, D. Gay, I. Rouet, and C. Grison. Constrained colocation mining: application to soil erosion characterization. In S. Y. Shin and S. Ossowski, editors, SAC. ACM, 2010.
[10] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. The WEKA Data Mining Software: An Update, volume 11. 2009.
[11] J. Han and M. Kamber. Data Mining, Second Edition: Concepts and Techniques. Morgan Kaufmann, January 2006.
[12] Y. Huang, S. Shekhar, and H. Xiong. Discovering colocation patterns from spatial data sets: A general approach. IEEE Trans. Knowl. Data Eng., 16(12):1472–1485, 2004.
[13] D. A. Keim. Information visualization and visual data mining. IEEE Trans. Vis. Comput. Graph., 8(1):1–8, 2002.
[14] K. Koperski and J. Han. Discovery of spatial association rules in geographic information databases. In M. J. Egenhofer and J. R. Herring, editors, SSD, volume 951 of Lecture Notes in Computer Science, pages 47–66. Springer, 1995.
[15] C. K.-S. Leung, P. Irani, and C. L. Carmichael. Wifisviz: Effective visualization of frequent itemsets. In ICDM, pages 875–880. IEEE Computer Society, 2008.
[16] F. A. Lisi and D. Malerba. Inducing multi-level association rules from multiple relations. Machine Learning, 55(2):175–210, 2004.
[17] D. Malerba. A relational perspective on spatial data mining. International Journal of Data Mining, Modelling and Management, 1(1):103–118, 2008.
[18] H. Mannila and H. Toivonen. Levelwise search and borders of theories in knowledge discovery. Data Min. Knowl. Discov., 1(3):241–258, 1997.
[19] D. Pelleg and A. W. Moore. X-means: Extending k-means with efficient estimation of the number of clusters. In P. Langley, editor, ICML, pages 727–734. Morgan Kaufmann, 2000.
[20] S. Shekhar and Y. Huang. Discovering spatial co-location patterns: A summary of results. In C. S. Jensen, M. Schneider, B. Seeger, and V. J. Tsotras, editors, SSTD, volume 2121 of Lecture Notes in Computer Science, pages 236–256. Springer, 2001.
[21] J. S. Yoo and S. Shekhar. A joinless approach for mining spatial colocation patterns. IEEE Trans. Knowl. Data Eng., 18(10):1323–1337, 2006.