Using formal ontology for
integrated spatial data mining
Julie Sungsoon Hwang
Department of Geography
State University of New York at Buffalo
ICCSA04
Perugia, Italy
May 14, 2004
Research purposes

Clarify the role of formal ontology in KDD (knowledge discovery in databases)

Propose a conceptual framework for ontology-based spatial data mining

Case study: ontology-based spatial clustering algorithms
Problems in focus (cont.)

No single algorithm is best suited to all research purposes and application domains.

Applied without domain knowledge, the same algorithm can yield results inconsistent with the facts.

The same data may have to be analyzed in different ways depending on users' goals.
Problems in focus

[Figure: two strategies, developing new algorithms (Algorithms A, C, D) versus re-using existing algorithms (Algorithms B, D') so that they suit the domain and task]
How can algorithms be customized to varying domains and tasks?
Relation between data mining and
ontology construction
[Figure: data, information, knowledge, and ontology arranged by increasing level of abstraction; data mining (knowledge discovery) and ontology construction (knowledge acquisition) relate these levels]
Role of formal ontology in KDD

Guide algorithms so that they suit domain-specific and task-oriented concepts

[Figure: KDD process diagram]

Provide the context in which knowledge extracted from data is interpreted and evaluated
Using ontology for spatial data mining
[Figure: ontology (domain model, task model) linked to spatial data mining, which moves from low-level data to high-level knowledge]
Ontology formalizes how knowledge is conceptualized, thereby making implicit meaning explicit.
Data mining extracts high-level knowledge from low-level data, thereby enhancing the level of understanding.
Domain-specific spatial data mining

Let's compare two different domains: traffic accidents versus retailers

Domain of traffic accidents
  Is-a: Event
  Spatial constraints: in road network
Domain of retailers
  Is-a: Physical object
  Spatial constraints: outside of road network

Spatial data mining algorithms should take into account different conceptualizations (domain-specific properties).
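The traffic accident constraint ("in road network") can be made concrete through the choice of distance. The following is a minimal sketch that assumes a tiny hypothetical road graph with made-up coordinates. Straight-line (Euclidean) distance ignores the network, while shortest-path distance along road segments, computed here with Dijkstra's algorithm, respects it. The node names, coordinates, and functions are illustrative assumptions, not the author's implementation.

```python
import heapq
import math

# Hypothetical road network: node -> (x, y) coordinates
NODES = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (1.0, 2.0), "D": (0.0, 2.0)}
# Undirected road segments; there is no direct road between A and D
EDGES = [("A", "B"), ("B", "C"), ("C", "D")]


def euclidean(p, q):
    """Straight-line distance between two nodes, ignoring the roads."""
    return math.dist(NODES[p], NODES[q])


def build_graph(edges):
    """Adjacency list weighted by segment length."""
    graph = {node: [] for node in NODES}
    for u, v in edges:
        w = euclidean(u, v)
        graph[u].append((v, w))
        graph[v].append((u, w))
    return graph


def network_distance(graph, source, target):
    """Shortest-path length along road segments (Dijkstra's algorithm)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nbr, w in graph[node]:
            nd = d + w
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return math.inf  # target not reachable along the network


graph = build_graph(EDGES)
print(euclidean("A", "D"))                # 2.0
print(network_distance(graph, "A", "D"))  # 4.0 via A-B-C-D
```

Two accidents that look close on the map can therefore be far apart for a network-constrained algorithm. For retailers, whose domain places them outside the road network, Euclidean distance may be the more natural default.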
Task-oriented spatial data mining

Let's compare two different tasks: detecting hotspots of traffic accidents versus partitioning market areas based on retailer locations

Detect hotspots of traffic accidents
  Number of clusters k: depends on the spatial distribution
  Level of detail: varies with scale (depends on the users' area of interest)
Partition market areas for a retailer
  Number of clusters k: given (resource constraint)
  Level of detail: does not vary with scale

Spatial data mining algorithms should take into account different tasks and users' needs.
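The contrast above can be sketched as a rule for choosing k. This is a hypothetical illustration, not the OBSC algorithm. For market partitioning, k is supplied as a resource constraint. For hotspot detection, k is derived from the data by single-linkage grouping, using a linking distance (`scale_m`, an assumed parameter) that changes with the scale of interest. `Task`, `count_groups`, and `choose_k` are illustrative names.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Task:
    goal: str                # hypothetical labels: "hotspots" or "partition"
    k: Optional[int] = None  # supplied up front for market partitioning
    scale_m: float = 1000.0  # hypothetical linking distance tied to map scale


def count_groups(points: List[Tuple[float, float]], threshold: float) -> int:
    """Single-linkage grouping: points within `threshold` share a group."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


def choose_k(task: Task, points: List[Tuple[float, float]]) -> int:
    if task.goal == "partition":
        if task.k is None:
            raise ValueError("market partitioning needs k as a resource constraint")
        return task.k  # k is given and does not depend on the data
    if task.goal == "hotspots":
        return count_groups(points, task.scale_m)  # k emerges from the data
    raise ValueError(f"unknown goal: {task.goal}")


pts = [(0, 0), (100, 0), (5000, 0), (5100, 0), (20000, 0)]
print(choose_k(Task("hotspots", scale_m=1000), pts))   # 3
print(choose_k(Task("hotspots", scale_m=10000), pts))  # 2
print(choose_k(Task("partition", k=4), pts))           # 4
```

For the same points, the hotspot task yields a different k at a different scale, while the partitioning task returns the k it was given.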
Ontology as an active component of information systems
[Figure (from Guarino, 1998): hierarchy of top-level ontology (e.g. space, time, matter, object, event), domain ontology (e.g. medicine), task ontology (e.g. diagnosing), and application ontology, linked by dependence relations]
Conceptual framework for ontology-based spatial data mining (OBSDM)
Component of OBSDM
OBSDM:: Input:: Metadata

The tag structure of XML metadata can be used to inform the domain ontology of the semantics of the data
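The following is a minimal sketch of reading theme terms from XML metadata with Python's standard `xml.etree.ElementTree`. The record and its element names (`theme`, `themekey`) are assumptions loosely modeled on the keyword sections of geospatial metadata, not taken from the source.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata record; element names are illustrative only
METADATA = """
<metadata>
  <idinfo>
    <keywords>
      <theme>
        <themekey>Traffic Accident</themekey>
      </theme>
      <place>
        <placekey>Erie County</placekey>
      </place>
    </keywords>
  </idinfo>
</metadata>
"""


def theme_terms(xml_text):
    """Return the text of every themekey element nested under a theme element."""
    root = ET.fromstring(xml_text.strip())
    return [el.text.strip() for el in root.findall(".//theme/themekey") if el.text]


print(theme_terms(METADATA))  # ['Traffic Accident']
```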
Component of OBSDM
OBSDM:: OBSDMM:: Domain Ont.

Terms within the "theme" tag of the metadata are used as tokens to locate the appropriate domain ontology

The domain ontology specifies definitions, classes, and properties
  Class example: Accident is a Subclass-Of TemporalThing
  Property example: Road has Geographic-Region as a Value-Type
  Classes inherit properties from the top-level ontology
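The lookup-and-inheritance idea can be sketched as follows. Everything here is hypothetical: the registry `DOMAIN_ONTOLOGIES`, the property names, and the treatment of `TemporalThing` as the top-level parent are assumptions chosen to mirror the examples above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class OntClass:
    name: str
    parent: Optional["OntClass"] = None
    properties: Dict[str, str] = field(default_factory=dict)  # property -> value type

    def all_properties(self) -> Dict[str, str]:
        """Own properties plus those inherited from ancestor classes."""
        inherited = self.parent.all_properties() if self.parent is not None else {}
        return {**inherited, **self.properties}


# Hypothetical fragment; TemporalThing stands in for a top-level class
TEMPORAL_THING = OntClass("TemporalThing", properties={"at-time": "Time"})
ACCIDENT = OntClass("Accident", parent=TEMPORAL_THING,
                    properties={"related-to": "Vehicle"})
ROAD = OntClass("Road", properties={"region": "Geographic-Region"})

# Hypothetical registry keyed by normalized terms from the metadata "theme" tag
DOMAIN_ONTOLOGIES = {"traffic accident": {"Accident": ACCIDENT, "Road": ROAD}}


def locate_domain_ontology(theme_term: str) -> Optional[Dict[str, OntClass]]:
    """Use a theme term as a token to find the matching domain ontology."""
    return DOMAIN_ONTOLOGIES.get(theme_term.strip().lower())


onto = locate_domain_ontology("Traffic Accident")
print(onto["Accident"].all_properties())
# {'at-time': 'Time', 'related-to': 'Vehicle'}
```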
Domain ontology := Traffic accident


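Theory TRAFFIC-ACCIDENT-DOMAIN
As a spatial thing,
$$\text{Point}(x) \land \text{On}(x, y) \land \text{Roadway}(y)$$
$$\text{Line}(y) \land \text{In}(y, z) \land \text{Geographic-Region}(z)$$
As a temporal thing,
$$\text{Point}(x) \land \text{At}(x, y) \land \text{Time}(y)$$
$$\text{Event}(x) \Leftrightarrow \text{Occurrence}(x) \land \text{Notification}(x) \land \text{Response}(x) \land \text{Arrival}(x)$$
$$\text{Before}(\text{Occurrence}(x), \text{Notification}(x))$$
As an intangible thing,
$$\text{Accident}(x) \land \text{RelatedTo}(x, y) \land \text{Vehicle}(y)$$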
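One way to use such a theory operationally is to check data records against its axioms. The following is a minimal sketch that assumes a hypothetical record layout whose fields stand in for the predicates of TRAFFIC-ACCIDENT-DOMAIN. It checks only a few of the axioms and is not part of the original framework.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class AccidentRecord:
    # Hypothetical fields standing in for predicates of the theory
    roadway: Optional[str]               # On(x, y) with Roadway(y)
    vehicles: int                        # RelatedTo(x, y) with Vehicle(y)
    occurrence: datetime                 # Occurrence(x)
    notification: datetime               # Notification(x)
    response: Optional[datetime] = None  # Response(x)
    arrival: Optional[datetime] = None   # Arrival(x)


def violations(rec: AccidentRecord) -> List[str]:
    """List the checked axioms a record fails; an empty list means consistent."""
    problems = []
    if rec.roadway is None:
        problems.append("not on a roadway")
    if rec.vehicles < 1:
        problems.append("not related to any vehicle")
    if not rec.occurrence < rec.notification:
        problems.append("Before(Occurrence, Notification) fails")
    if rec.response is None or rec.arrival is None:
        problems.append("event phases incomplete")
    return problems


ok = AccidentRecord("Main St", 2,
                    datetime(2004, 5, 1, 8, 0), datetime(2004, 5, 1, 8, 5),
                    datetime(2004, 5, 1, 8, 6), datetime(2004, 5, 1, 8, 20))
print(violations(ok))  # []
```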
Component of OBSDM
OBSDM:: Input:: User Interface

Users can specify a goal, level of detail, and geographic area of interest through the user interface (UI)
Component of OBSDM
OBSDM:: OBSDMM:: Task Ont.

The inputs specified by users in the user interface are translated into the task ontology

The task ontology explicitly specifies the goal, methods, requirements, and constraints
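The following sketches the translation from user-interface inputs to a task ontology instance. The class names, the goal-to-sub-goal table, and the mapping of the place name to Geographic-Scale are illustrative assumptions. The sub-goal strings are taken from the SPATIAL-CLUSTERING-TASK theory.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class UserInput:
    goal: str                       # e.g. "identify hot spots"
    level_of_detail: Optional[str]  # e.g. "State", "County", or None
    place_name: Optional[str]       # geographic area of interest


@dataclass
class TaskOntology:
    goal: str
    sub_goal: str
    requirements: Dict[str, Optional[str]]
    constraints: List[str] = field(default_factory=list)


# Hypothetical mapping from UI goals to sub-goals of SPATIAL-CLUSTERING-TASK
GOAL_TO_SUB_GOAL = {
    "identify hot spots": "Find hot spots",
    "partition market areas": "Partition into k-clusters",
}


def to_task_ontology(ui: UserInput) -> TaskOntology:
    """Translate UI inputs into an explicit goal, requirements, and constraints."""
    if ui.goal not in GOAL_TO_SUB_GOAL:
        raise ValueError(f"no task ontology entry for goal {ui.goal!r}")
    return TaskOntology(
        goal=ui.goal,
        sub_goal=GOAL_TO_SUB_GOAL[ui.goal],
        # Assumption: the area of interest fixes the geographic scale
        requirements={"Geographic-Scale": ui.place_name,
                      "Detail-Level": ui.level_of_detail},
        constraints=["Spatial Objects", "Operational Constraints"],
    )


task = to_task_ontology(UserInput("identify hot spots", "State", "New York"))
print(task.sub_goal, task.requirements)
```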
Task ontology := Spatial clustering


Theory SPATIAL-CLUSTERING-TASK
Documentation:
  This theory defines a task ontology for the spatial clustering task. The spatial clustering task, which is a class of clustering task, is a problem of grouping similar spatial objects into classes.
Super classes: Clustering
Subclasses:
Sub goal:
  "Find hot spots"
  "Group similar patterns"
  "Partition into k-clusters"
Requirement:
  Assignment-Object
  Geographic-Scale
  Detail-Level
Constraint:
  Spatial Objects
  Operational Constraints
Source: Spatial Objects
Target: Clusters
Component of OBSDM
OBSDM:: OBSDMM:: Alg. Builder
OBSDM:: Output:: GVis tool

The algorithm builder assembles the requirements for building the algorithm best suited to the domain of the data and the users' input (task).

Data content is filtered through the domain ontology, and the users' requirements are filtered through the task ontology.

The geographic visualization tool displays the results (discovered patterns).
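The algorithm builder's role can be sketched as merging the two filtered inputs into one parameter set. The domain ontology supplies the distance model, and the task ontology supplies the scale. `ClusteringParams`, `build_params`, the lookup tables, and the numeric values are hypothetical placeholders, not parameters from the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClusteringParams:
    distance_model: str        # "network" or "euclidean"
    linking_distance_m: float  # scale-dependent threshold


# Hypothetical domain facts: the spatial constraint each theme implies
DOMAIN_CONSTRAINTS = {"Traffic Accident": "Roadway", "Retailer": None}
# Hypothetical scale table: coarser level of detail -> longer linking distance
DETAIL_TO_DISTANCE_M = {"County": 500.0, "State": 5000.0}
DEFAULT_DISTANCE_M = 1000.0  # hypothetical fallback when no detail is given


def build_params(theme: str, level_of_detail: Optional[str]) -> ClusteringParams:
    """Domain ontology picks the distance model; task ontology picks the scale."""
    constraint = DOMAIN_CONSTRAINTS.get(theme)
    model = "network" if constraint == "Roadway" else "euclidean"
    linking = DETAIL_TO_DISTANCE_M.get(level_of_detail, DEFAULT_DISTANCE_M)
    return ClusteringParams(model, linking)


print(build_params("Traffic Accident", "State"))
```

Keeping the two lookups separate mirrors the framework's split: changing the theme alters only the distance model, and changing the level of detail alters only the scale.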
Case study:
ontology-based spatial clustering of traffic accidents
Setting
  Metadata: Theme := Traffic Accident
  User interface: Goal := "identify hot spots"; LevelOfDetail := State; PlaceName := New York
  OBSC (ontology-based spatial clustering) method: Algorithm := SMTIN; Constraint := Named-Roadway
  Input: 353 features in Erie
  Output: 18 clusters in Erie County
Case study:
Effect of scale (Task ontology)
Control algorithm
  TASK: LevelOfDetail := Null; PlaceName := Null
  DOMAIN: Constraint := Roadway
OBSC algorithm
  TASK: LevelOfDetail := County; PlaceName := New York
  DOMAIN: Constraint := Roadway
[Figure annotation: specifying an area of interest doesn't mask details]

OBSC clusters reflect the spatial distribution specific to the scale of the users' interest.
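The scale effect can be illustrated by deriving a scale-dependent parameter from the area of interest. The following is a minimal sketch that assumes a hypothetical gazetteer entry with made-up projected coordinates (metres). It also assumes the linking distance is a fixed fraction of the area's larger side. Neither the gazetteer nor the rule comes from the study.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)

# Hypothetical gazetteer with made-up projected coordinates in metres
GAZETTEER: Dict[str, BBox] = {"Study County": (0.0, 0.0, 40_000.0, 30_000.0)}


def clip(points: List[Point], bbox: BBox) -> List[Point]:
    """Keep only points inside the user's area of interest."""
    min_x, min_y, max_x, max_y = bbox
    return [(x, y) for x, y in points
            if min_x <= x <= max_x and min_y <= y <= max_y]


def scale_aware_distance(bbox: BBox, fraction: float = 0.01) -> float:
    """Assumed rule: linking distance is a fixed fraction of the area's larger side."""
    min_x, min_y, max_x, max_y = bbox
    return fraction * max(max_x - min_x, max_y - min_y)


area = GAZETTEER["Study County"]
print(clip([(1_000.0, 1_000.0), (50_000.0, 5_000.0)], area))  # [(1000.0, 1000.0)]
print(scale_aware_distance(area))                              # 400.0
```

A smaller area of interest yields a shorter linking distance, so local detail is not masked by a threshold tuned to a larger extent.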
Case study:
Effect of constraint (Domain ontology)
Control algorithm
  TASK: LevelOfDetail := State; PlaceName := New York
  DOMAIN: Constraint := Null
OBSC algorithm
  TASK: LevelOfDetail := State; PlaceName := New York
  DOMAIN: Constraint := Roadway
[Figure annotation: clusters separated by a body of water]

OBSC clusters identify physical barriers owing to a concept implicit in the domain.
Case study:
Benefit of using ontology in spatial clustering

Incorporating ontologies into spatial clustering algorithms enhances the quality of the clustering results

Task ontology makes clusters usable
  Responsive to users' views
Domain ontology makes clusters natural
  Dictated by concepts implicit in the domain
Conclusion (cont.)

Presents how ontologies are incorporated into spatial data mining algorithms

Semantic linkage between ontologies and algorithms through parameterization
  Scale as a task-oriented property
  Constraint as a domain-specific property

Conclusion

Ontology is examined as a means to customize algorithms to varying domains and tasks

  Ontology enables algorithms to reflect concepts implicit in the domain and to adapt to users' views
  Ontology provides a semantically plausible way to reuse existing algorithms
  Ontology provides a systematic way of organizing the various factors that govern the mechanisms underlying the data mining process