Universidade Federal de Santa Catarina, Florianopolis, Brazil
Informatics and Statistics Department
A conceptual Data Model for
Trajectory Data Mining
* Prof. Vania Bogorny (INE/UFSC - Brazil)
Prof. Carlos Alberto Heuser (II/UFRGS - Brazil)
Prof. Luis Otavio Alvares (II/UFRGS-Brazil)
5/22/2017
GIScience 2010 – A conceptual data model for trajectory data mining
Vania Bogorny, Universidade Federal de Santa Catarina, Brazil, www.inf.ufsc.br/~vania
1
Outline
• Motivation
• Objective
• Basic concepts
• Proposed Model
• Evaluation
• Conclusion
Introduction and Motivation
• On the one side (database technology):
– Since its origin, database design has aimed at modeling data for operational purposes only
– Database designers do not think about data mining during conceptual database design
Introduction and Motivation
• On the other side (artificial intelligence):
– Data mining (DM), or knowledge discovery in databases (KDD), has become very popular in recent years in many fields and application domains
– Dozens of new data mining algorithms have been proposed in the last decade,
• but very little has been done to automate data preprocessing, which is the most time-consuming step
Introduction and Motivation
DATABASE Modeling: Normalization
DATA MINING: Denormalization (one single file)
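To make the normalization/denormalization contrast concrete, here is a minimal Python sketch (not from the paper): two hypothetical normalized tables, stops and places, are joined into the one flat table a typical mining tool expects. The table contents and the `denormalize` helper are illustrative assumptions.

```python
# Hypothetical normalized tables: each stop references a place by id,
# and each place carries its name and feature type.
PLACES = {
    3: {"name": "IbisHotel", "type": "Hotel"},
    7: {"name": "LouvreMuseum", "type": "Museum"},
}

# (trajectory id, place id, start time, end time)
STOPS = [
    (1, 3, "10:00", "11:00"),
    (1, 7, "13:00", "16:00"),
    (2, 7, "09:30", "12:00"),
]


def denormalize(stops, places):
    """Join every stop with its place so that each row is self-contained."""
    rows = []
    for tid, place_id, start, end in stops:
        place = places[place_id]
        rows.append({
            "tid": tid,
            "place": place["name"],
            "type": place["type"],
            "startT": start,
            "endT": end,
        })
    return rows


if __name__ == "__main__":
    for row in denormalize(STOPS, PLACES):
        print(row)
```

The flat rows repeat place names and types, which is exactly the redundancy normalization removes and mining tools need back.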
Introduction and Motivation
• Another problem for data mining:
– data have to be preprocessed and transformed
into different granularities
– Examples:
• Louvre Museum → Museum → TouristicPlace (an instance, its type, and a more general type)
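A concept hierarchy like the one above can be sketched as a child-to-parent map. The sketch is hypothetical: the `PARENT` entries and the `ancestors`/`generalize` helpers are illustrative, not the paper's data structure.

```python
# Hypothetical concept hierarchy: each concept maps to its direct parent.
PARENT = {
    "LouvreMuseum": "Museum",
    "Museum": "TouristicPlace",
    "IbisHotel": "Hotel",
    "Hotel": "Accommodation",
}


def ancestors(concept):
    """Return the concept followed by all its more general ancestors."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def generalize(concept, level):
    """Move `level` steps up the hierarchy, stopping at the root."""
    chain = ancestors(concept)
    return chain[min(level, len(chain) - 1)]
```

Level 0 keeps the instance, level 1 gives its type, and larger levels saturate at the most general concept.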
Introduction and Motivation
• These problems increase when dealing
with trajectories of moving objects,
which is the focus of this paper
Objective
We propose a conceptual framework for
trajectory database modeling that supports
data mining
Basic Concepts
Trajectory Data
• Trajectories are a new kind of spatiotemporal data
• Trajectories have attracted intensive
research in both databases and data
mining communities
Trajectory Raw Data
• Trajectory Data are:
– Spatio-temporal data
– Represented by a set of points located in space and time
– Form: (tid, x, y, t), where tid is the trajectory identifier and (x, y) is the spatial location at time t
Tid   position (x,y)           time (t)
1     48.890018  2.246100      08:25
1     48.890018  2.246100      08:26
...   ...                      ...
1     48.890020  2.246102      08:40
1     48.888880  2.248208      08:41
1     48.885732  2.255031      08:42
...   ...                      ...
1     48.858434  2.336105      09:04
1     48.853611  2.349190      09:05
...   ...                      ...
2     ...                      ...
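A minimal sketch of turning raw (tid, x, y, t) rows into per-trajectory, time-ordered point lists. It assumes times are zero-padded "HH:MM" strings within one day; the sample rows are taken from the table, and the helper names are hypothetical.

```python
from collections import defaultdict

# Sample rows in the (tid, x, y, t) form, taken from the table and
# deliberately shuffled; times are "HH:MM" strings within a single day.
RAW = [
    (1, 48.888880, 2.248208, "08:41"),
    (1, 48.890018, 2.246100, "08:25"),
    (1, 48.885732, 2.255031, "08:42"),
    (1, 48.890018, 2.246100, "08:26"),
]


def to_trajectories(points):
    """Group raw points by trajectory id and sort each group by time."""
    groups = defaultdict(list)
    for tid, x, y, t in points:
        groups[tid].append((t, x, y))
    # Zero-padded "HH:MM" strings sort in chronological order.
    return {tid: sorted(pts) for tid, pts in groups.items()}


def minutes(hhmm):
    """Convert an "HH:MM" string to minutes since midnight."""
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)


def duration(trajectory):
    """Elapsed minutes between the first and last point of a trajectory."""
    return minutes(trajectory[-1][0]) - minutes(trajectory[0][0])
```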
The Model of Stops and Moves (Spaccapietra 2008)
STOPS
– Important parts of trajectories
– Where the moving object has stayed for a minimal
amount of time
– Stops are application dependent
• Tourism application
– Hotels, touristic places, airports, …
• Traffic management application
– Traffic lights, roundabouts, big events, …
MOVES
– The parts of the trajectory that are not stops
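The stop definition above can be illustrated with a simplified, intersection-style detector: a stop is a run of consecutive points inside a candidate place that lasts at least that place's minimal duration, and everything else belongs to moves. This is a hypothetical sketch assuming rectangular places and times in minutes; `Candidate` and `find_stops` are illustrative names, and this is not the published IB-SMOT algorithm.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical candidate stop: a rectangular place plus a minimal stay."""
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    min_duration: int  # minutes

    def contains(self, x, y):
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax


def find_stops(points, candidates):
    """points: time-ordered (t_minutes, x, y). Returns (name, start, end) stops.

    A stop is a maximal run of consecutive points inside the same candidate
    whose duration reaches that candidate's minimal duration.
    """
    stops = []
    i = 0
    while i < len(points):
        _, x, y = points[i]
        cand = next((c for c in candidates if c.contains(x, y)), None)
        if cand is None:
            i += 1
            continue
        # Extend the run while the next point stays inside the same place.
        j = i
        while j + 1 < len(points) and cand.contains(points[j + 1][1], points[j + 1][2]):
            j += 1
        start, end = points[i][0], points[j][0]
        if end - start >= cand.min_duration:
            stops.append((cand.name, start, end))
        i = j + 1
    return stops
```

A short stay (below the minimal duration) is not a stop, so those points become part of a move.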
Semantic Trajectories
• A semantic trajectory is a set of stops and moves
– Stops are characterized by a place, a start time, and an end time
– Moves are characterized by two consecutive stops
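Under this definition, moves can be derived by pairing consecutive stops. In this sketch (class and function names are hypothetical) a move is assumed to start when the previous stop ends and to end when the next stop starts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stop:
    place: str
    start: str  # "HH:MM"
    end: str


@dataclass(frozen=True)
class Move:
    origin: Stop
    destination: Stop

    @property
    def start(self):
        # Assumption: a move begins when the preceding stop ends ...
        return self.origin.end

    @property
    def end(self):
        # ... and ends when the following stop begins.
        return self.destination.start


def moves_between(stops):
    """Pair each stop with the next one to obtain the moves."""
    return [Move(a, b) for a, b in zip(stops, stops[1:])]
```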
STOPS at Multiple Granularities
Stop at Ibis Hotel from 6:04PM to 7:42PM, September 16, 2010
Space: IbisHotel or Hotel or Accommodation
Time: Afternoon or Thursday or 6:00PM-8:00PM or RUSH-HOUR
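The time side of this generalization can be sketched with a few functions that map one timestamp to coarser labels: weekday name, weekday/weekend, a fixed-width hour bin, and a named period. The function names and the period table passed by the caller are assumptions; no particular period boundaries are implied here.

```python
from datetime import datetime

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def weekday_name(ts):
    """Day of the week, independent of the current locale."""
    return DAY_NAMES[ts.weekday()]


def day_type(ts):
    return "weekend" if ts.weekday() >= 5 else "weekday"


def hour_bin(ts, width=2):
    """Fixed-width interval containing ts, e.g. 18:00-20:00 for width=2."""
    start = (ts.hour // width) * width
    return f"{start:02d}:00-{start + width:02d}:00"


def named_period(ts, periods):
    """periods: list of (start_hour, end_hour, label); None if ts fits none."""
    for start, end, label in periods:
        if start <= ts.hour < end:
            return label
    return None


if __name__ == "__main__":
    ts = datetime(2010, 9, 16, 18, 4)  # start of the Ibis Hotel stop
    print(weekday_name(ts), day_type(ts), hour_bin(ts))
```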
ITEMS: the building blocks for semantic pattern discovery
• An item is generated from either a stop or a move
• An item combines complex information (space + time) and can be defined in many formats/types and at different granularities
Building an ITEM for Data Mining
• Formats/types for an item:
• NameOnly: the name of the stop/move
– STOPS: name of the spatial feature instance
• IbisHotel
– MOVES: names of the two stops that define the move
• ZurichAirport-IbisHotel
• NameStart: name of the stop/move + start time
– IbisHotel[morning] (stop)
– LouvreMuseum[weekend] (stop)
– IbisHotel-ZurichAirport[10:00AM-11:00AM] (move)
Building an ITEM for Data Mining
• NameEnd: name of a stop/move + end time
– IbisHotel[morning] (stop)
– IbisHotel-ZurichAirport[10:00AM-11:00AM] (move)
• NameStartEnd: name of a stop/move + start time + end time
– IbisHotel[08:00AM-11:00AM][1:00PM-6:00PM] (stop)
– LouvreMuseum[morning][afternoon] (stop)
– ZurichAirport-IbisHotel[10:00AM-11:00PM][10:00AM-6:00PM] (move)
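The four item formats can be sketched as one formatting function. `build_item` is a hypothetical stand-in for the model's buildItem() method; it assumes the stop/move name and the already-generalized time labels are given.

```python
def build_item(name, fmt, start=None, end=None):
    """Render a stop or move as an item string in one of four formats.

    name  -- stop name (e.g. "IbisHotel") or move name ("ZurichAirport-IbisHotel")
    start -- generalized start time label, e.g. "morning"
    end   -- generalized end time label
    """
    if fmt == "NameOnly":
        return name
    if fmt == "NameStart":
        return f"{name}[{start}]"
    if fmt == "NameEnd":
        return f"{name}[{end}]"
    if fmt == "NameStartEnd":
        return f"{name}[{start}][{end}]"
    raise ValueError(f"unknown item format: {fmt}")
```

Keeping the format as a parameter lets the same stops and moves be re-itemized at a different granularity without touching the stored data.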
Semantic Trajectory Patterns
Frequent Patterns, Sequential Patterns, and Association Rules
Trajectory Frequent Pattern
• Is a set of items that occur together a minimal number of times (support s, expressed as a fraction of the trajectories)
• Examples:
{LouvreMuseum [08:00-10:00]} (s=0.1)
{Airport [morning], Hotel [morning]} (s=0.2)
{Airport-Hotel, Hotel-Museum} (s=0.15)
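Support of a frequent pattern can be illustrated with a brute-force sketch that counts, for each candidate itemset, the fraction of trajectories containing all its items. Real miners use Apriori or FP-growth instead of enumerating every combination; the function names and toy transactions are hypothetical.

```python
from itertools import combinations


def support(itemset, trajectories):
    """Fraction of trajectories whose items include every item of itemset."""
    itemset = set(itemset)
    hits = sum(1 for items in trajectories if itemset <= set(items))
    return hits / len(trajectories)


def frequent_itemsets(trajectories, minsup, max_size=2):
    """Enumerate itemsets up to max_size whose support is at least minsup."""
    universe = sorted({item for items in trajectories for item in items})
    result = {}
    for k in range(1, max_size + 1):
        for combo in combinations(universe, k):
            s = support(combo, trajectories)
            if s >= minsup:
                result[frozenset(combo)] = s
    return result
```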
Trajectory Sequential Pattern
• Is an ordered list of items that occur, in that order, a minimal number of times (support s)
• Examples:
<Airport[morning], Hotel[morning], Museum[afternoon]> (s=0.15)
<Airport-Hotel, Hotel-Museum> (s=0.1)
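For sequential patterns, order matters: a trajectory supports a pattern only if the pattern's items appear in it in the same order, possibly with other items in between. A minimal sketch with hypothetical function names:

```python
def is_subsequence(pattern, sequence):
    """True if pattern's items appear in sequence in the same order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the shared iterator, so order is enforced.
    return all(item in it for item in pattern)


def sequence_support(pattern, trajectories):
    """Fraction of time-ordered trajectories containing pattern as a subsequence."""
    hits = sum(1 for seq in trajectories if is_subsequence(pattern, seq))
    return hits / len(trajectories)
```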
Trajectory Association Rule
• Is a rule whose items occur together a minimal number of times (support s) and that holds with a minimal confidence (c)
• Example
– Airport[morning], Hotel[morning] → Museum[afternoon] (s=0.1) (c=0.5)
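Support and confidence of a rule A → B can be computed as support(A ∪ B) and support(A ∪ B) / support(A). The sketch below uses a hypothetical set of 10 trajectories, constructed so that the rule above gets s=0.1 and c=0.5; the function names are illustrative.

```python
def support(items, trajectories):
    """Fraction of trajectories containing every item in items."""
    items = set(items)
    return sum(1 for t in trajectories if items <= set(t)) / len(trajectories)


def rule_metrics(antecedent, consequent, trajectories):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    s_rule = support(set(antecedent) | set(consequent), trajectories)
    s_ante = support(antecedent, trajectories)
    confidence = s_rule / s_ante if s_ante else 0.0
    return s_rule, confidence


# Hypothetical data: 2 of 10 trajectories contain the antecedent,
# and 1 of those 2 also contains the consequent.
TRAJECTORIES = (
    [["Airport[morning]", "Hotel[morning]", "Museum[afternoon]"],
     ["Airport[morning]", "Hotel[morning]"]]
    + [["Other"]] * 8
)

if __name__ == "__main__":
    print(rule_metrics(["Airport[morning]", "Hotel[morning]"],
                       ["Museum[afternoon]"], TRAJECTORIES))
```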
The Proposed Model
The Proposed Model
• We extend the model of stops and moves proposed by Spaccapietra with new attributes and methods
• We add new classes and relationships, with attributes and methods for automatic data preprocessing and multiple-level mining
The Conceptual Data Model of Stops and Moves
(Spaccapietra 2008)
Proposed OO Model
(Figure: class diagram in three parts: Compute and Store the patterns; Data Pre-processing; Spaccapietra's Model)
Proposed OO Model
• Stops and Moves are extended with new attributes (specific time, e.g. 07:10-08:05) and methods to instantiate stops and moves
• Concept hierarchy for the spatial feature type (e.g. AccommodationPlace → Hotel)
Proposed Model
OO Model
• Generic class to represent the 3 kinds of patterns
• Attributes: startT, endT (generic time, e.g. Morning)
Methods:
– getGenericSpatialFeature(): retrieves the hierarchy level
– timeG(): generalizes time
– spaceG(): generalizes space based on the hierarchy
– buildItem(): creates a generalized ITEM
• Frequent Patterns
Attributes: support, setOfItems
Methods: countSupport(), frequentPattern()
• Sequential Patterns
Attributes: support, listOfItems
Methods: countSupport(), sequentialPattern()
• Association Patterns
Attributes: support, confidence, antecedent (set of items), and consequent (set of items)
Methods: countSupport(), associatePattern(), and computeConfidence()
Example of an
Instantiated Model
Schema of Stops and Moves
• STOP (Tid integer, Sid integer, SFTname string,
SFid integer, startT timestamp, endT timestamp)
Ex.: stop (1,1,Hotel, 3, 10AM, 11AM)
• MOVE (Tid integer, Mid integer, SFT1name string,
SF1id integer, SFT2name string, SF2id integer,
startT timestamp, endT timestamp,
the_move geometry)
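The two relations can be tried out in SQLite from Python. This is a sketch under assumptions: SQLite has no native timestamp or geometry types, so times are stored as text and `the_move` as WKT text; a spatial DBMS would use proper types. `create_schema` is a hypothetical helper.

```python
import sqlite3

# Assumption: times are TEXT and the move geometry is WKT TEXT, because
# SQLite lacks native timestamp and geometry column types.
DDL = """
CREATE TABLE stop (
    Tid INTEGER, Sid INTEGER, SFTname TEXT, SFid INTEGER,
    startT TEXT, endT TEXT
);
CREATE TABLE move (
    Tid INTEGER, Mid INTEGER,
    SFT1name TEXT, SF1id INTEGER, SFT2name TEXT, SF2id INTEGER,
    startT TEXT, endT TEXT,
    the_move TEXT  -- WKT LINESTRING
);
"""


def create_schema():
    """Create both tables in a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


if __name__ == "__main__":
    conn = create_schema()
    # The slide's example row: stop (1, 1, Hotel, 3, 10AM, 11AM)
    conn.execute("INSERT INTO stop VALUES (1, 1, 'Hotel', 3, '10AM', '11AM')")
    print(conn.execute("SELECT SFTname, startT, endT FROM stop").fetchall())
```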
Schema of the Patterns
Nested relations:
FrequentPattern / SequentialPattern (Pid integer, pattern itemSetType, support real)
itemSetType (SFT1name string, SF1id integer, SFT2name string,
SF2id integer, startT string, endT string)
AssociatePattern (Pid integer, antecedent itemSetType,
consequent itemSetType, support real,
confidence real)
Instantiating and Querying Patterns
To instantiate the patterns we can use the ST-DMQL
proposed in (Bogorny 2009)
Instantiating Stops and Moves
SELECT generateS (method, candidateStops, buffer)
FROM trajectory
method: IB-SMOT, CB-SMOT, DB-SMOT, ...
SELECT generateM (method, candidateStops, buffer)
FROM trajectory
Instantiating Sequential Patterns
Q1 (tourism application): Which sequences of moves occur most frequently in the morning and in the evening?
Method in the
ST-DMQL
SELECT sequentialPattern (itemType = NameEnd,
timeG = [8:00-12:00 AS morning,
18:00-23:00 AS evening],
spaceG = instance, minsup=0.03)
FROM move
Ans:
{IbisHotel-NotreDame[morning], EiffelTower-IbisHotel[evening]} (s=0.04)
Example of Pattern Queries
Q: How many moves of sequential patterns cross the Pont Neuf bridge?
SELECT count(m.*)
FROM sequentialPattern s, bridge b, move m
WHERE s.pattern.SFT1name=m.SFT1name AND
s.pattern.SF1id=m.SF1id AND
s.pattern.SFT2name=m.SFT2name AND
s.pattern.SF2id=m.SF2id AND
b.name='Pont Neuf' AND
intersects (m.the_move, b.the_geom)
Conclusions
• Data pre-processing is the most time-consuming step for DM and KDD
• Thinking about data mining during the conceptual design of a database can significantly reduce this step
Conclusions
• The proposed model:
– Reduces the pre-processing tasks
– Supports mining at multiple granularity levels
– Automatically prepares the data for data mining
– Stores the patterns for future queries
(Figure: multiple-granularity data, patterns, and queries)
Geometric Patterns vs. Semantic Patterns (Bogorny 2009)
(Figure: trajectories T1 to T4 among places marked H (Hotel), R (Restaurant), TP (Touristic Place), and CC; panels: Geometric Pattern; Semantic Trajectory Pattern)
(a) Hotel to Restaurant, passing by CC
(b) go to Cinema, passing by CC
Thank You!
More examples for generating stops
SELECT generateS (CB-SMOT,
[Hotel,60,TouristicPlace,15,ShoppingCenter,30], 5)
FROM trajectory t, district d
WHERE d.name='Bela Vista' and
intersects (t.movingpoint.geometry, d.geometry)
Querying Rules
Suppose that the user is interested in association patterns that have weekend as the time dimension in the antecedent of the rule
SELECT *
FROM associatePattern
WHERE antecedent.startT='weekend' OR
antecedent.endT='weekend'
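The nested pattern relations can be mirrored in Python to show what this query selects. This sketch assumes the query means "some antecedent item has weekend as startT or endT"; the class and function names and the sample values are hypothetical, while the field names follow the schema of the patterns.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Item:
    """One tuple of the nested itemSetType (field names follow the schema).

    For a stop only SFT1name/SF1id are set; a move also has SFT2name/SF2id.
    """
    SFT1name: str
    SF1id: int
    SFT2name: Optional[str]
    SF2id: Optional[int]
    startT: str  # generalized time label, e.g. "weekend"
    endT: str


@dataclass
class AssociatePattern:
    Pid: int
    antecedent: List[Item]
    consequent: List[Item]
    support: float
    confidence: float


def weekend_in_antecedent(patterns):
    """Keep rules where some antecedent item has 'weekend' as startT or endT."""
    return [
        p
        for p in patterns
        if any(i.startT == "weekend" or i.endT == "weekend" for i in p.antecedent)
    ]
```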
Basic Concepts: Support
Basic Concepts: Semantic Trajectory Patterns
Example
Work [morning], ShoppingCenter [afternoon], Gym [afternoon] (s=0.08%)
Basic Concepts: Semantic Trajectory Patterns
Example
Home [night], Work [afternoon] → Gym [afternoon] (s=0.10%) (c=0.50)
Basic Concepts: Semantic Trajectory Patterns
Example
ReligiousPlace [weekend], Restaurant [weekend] (s=0.07)
Related Works
• Mining trajectory samples, extracting geometric patterns:
Laube 2002, 2005; Giannotti 2007; Lee 2007; Cao 2006, 2007; Li 2010
• Mining semantic trajectories or trajectory preprocessing for mining:
Alvares 2007; Zhou 2007; Palma 2008; Bogorny 2009; Manso 2010
• Attempts to reduce the gap between databases and data mining:
Data mining query languages, but not for trajectories (Wang 2003, Malerba 2004, Han 1995)
Example of Frequent Pattern Instantiation
Q2: Which types of places are most frequently visited by tourists on weekdays and on weekends?
Method in the ST-DMQL
SELECT frequentPattern (itemType =NameStart,
timeG = WEEKEND-WEEKDAY,
spaceG = [type, GenericHotel = 1],
minsup = 0.15)
FROM stop
Ans:
{4StarsHotel[weekend], Museum[weekend], Restaurant[weekend] } (s=0.16)