Universidade Federal de Santa Catarina, Florianopolis, Brazil
Informatics and Statistics Department

A Conceptual Data Model for Trajectory Data Mining

Prof. Vania Bogorny (INE/UFSC - Brazil) [email protected]
Prof. Carlos Alberto Heuser (II/UFRGS - Brazil)
Prof. Luis Otavio Alvares (II/UFRGS - Brazil)

5/22/2017 GIScience 2010 – A conceptual data model for trajectory data mining
Vania Bogorny, Universidade Federal de Santa Catarina, Brazil, www.inf.ufsc.br/~vania

Outline
• Motivation
• Objective
• Basic concepts
• Proposed Model
• Evaluation
• Conclusion

Introduction and Motivation
• On the one side (database technology)
  – Since its origin, database design has aimed at modeling data for operational purposes only
  – Database designers do not think about data mining during conceptual database design
• On the other side (artificial intelligence)
  – Data mining (DM), or knowledge discovery in databases (KDD), has become very popular in recent years in many fields and application domains
  – Dozens of new data mining algorithms have been proposed in the last decade,
    • but very little has been done for automatic data preprocessing, which is the most time-consuming step

[Figure: database modelling (normalization) versus data mining (denormalization into one single file)]

• Another problem for data mining:
  – data have to be preprocessed and transformed into different granularities
  – Example: Louvre Museum → Museum → TouristicPlace (from instance + type to type)
• These problems increase when dealing with trajectories of moving objects, which is the focus of this paper

Objective
• We propose a conceptual framework for trajectory database modeling that supports data mining

Basic Concepts

Trajectory Data
• Trajectories are a new kind of spatio-temporal data
• Trajectories have attracted intensive research in both the database and data mining communities

Trajectory Raw Data
• Trajectory data are:
  – Spatio-temporal data
  – Represented by a set of points located in space and time
  – Form: (tid, x, y, t), where tid is the trajectory identifier and (x, y) is the spatial location at time t

  Tid   position (x, y)          time (t)
  1     48.890018  2.246100      08:25
  1     48.890018  2.246100      08:26
  ...   ...                      ...
  1     48.890020  2.246102      08:40
  1     48.888880  2.248208      08:41
  1     48.885732  2.255031      08:42
  ...   ...                      ...
  1     48.858434  2.336105      09:04
  1     48.853611  2.349190      09:05
  ...   ...                      ...
  2     ...                      ...

The Model of Stops and Moves (Spaccapietra 2008)
STOPS
– Important parts of trajectories
– Where the moving object has stayed for a minimal amount of time
– Stops are application dependent
  • Tourism application: hotels, touristic places, airport, …
  • Traffic management application: traffic lights, roundabouts, big events, …
MOVES
– Are the parts that are not stops

Semantic Trajectories
• A semantic trajectory is a set of stops and moves
  – Stops are characterized by a place, a start time and an end time
  – Moves are characterized by two consecutive stops

STOPS at Multiple Granularities
Stop at Ibis Hotel from 6:04PM to 7:42PM, September 16, 2010
• space: IbisHotel or Hotel or Accommodation
• time: Afternoon or Thursday or 6:00PM – 8:00PM or RUSH-HOUR
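
The Ibis Hotel stop above can be described at several granularities. A minimal Python sketch of that idea follows. The helper names (`space_levels`, `time_levels`) and the dictionary layout are hypothetical and do not come from the slides, and the hour boundaries for afternoon and rush hour are assumptions chosen only so that a 6:04PM start falls into both, as on the slide.

```python
from datetime import datetime

# Hypothetical concept hierarchy (child -> parent). Only the three names
# IbisHotel, Hotel and Accommodation come from the slides.
HIERARCHY = {"IbisHotel": "Hotel", "Hotel": "Accommodation"}

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def space_levels(place):
    """Return the place followed by all of its ancestors in the hierarchy."""
    levels = [place]
    while levels[-1] in HIERARCHY:
        levels.append(HIERARCHY[levels[-1]])
    return levels


def time_levels(start):
    """Return coarser descriptions of a start timestamp.

    The hour ranges are illustrative assumptions, not values from the slides.
    """
    levels = [DAYS[start.weekday()]]   # day of the week
    if 12 <= start.hour < 19:          # assumed afternoon range
        levels.append("Afternoon")
    if 17 <= start.hour < 20:          # assumed rush-hour range
        levels.append("RUSH-HOUR")
    return levels


# The stop from the slide: Ibis Hotel, starting 6:04PM on September 16, 2010.
stop_start = datetime(2010, 9, 16, 18, 4)
print(space_levels("IbisHotel"))
print(time_levels(stop_start))
```

Any combination of one space level and one time level could then label the same stop, which is what allows patterns to be mined at different granularities.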
ITEMS - the building blocks for semantic pattern discovery
• An item is generated either from a stop or from a move
• An item is a set of complex information (space + time) that can be defined in many formats/types and at different granularities

Building an ITEM for Data Mining
• Formats/types for an item:
• NameOnly: the name of the stop/move
  – STOPS: name of the spatial feature instance
    • IbisHotel
  – MOVES: names of the two stops that define the move
    • ZurichAirport – IbisHotel
• NameStart: the name of the stop/move + start time
  – IbisHotel [morning]  (stop)
  – LouvreMuseum [weekend]  (stop)
  – IbisHotel-ZurichAirport [10:00AM-11:00AM]  (move)
• NameEnd: the name of the stop/move + end time
  – IbisHotel [morning]  (stop)
  – IbisHotel-ZurichAirport [10:00AM-11:00AM]  (move)
• NameStartEnd: the name of the stop/move + start time + end time
  – IbisHotel [08:00AM-11:00AM] [1:00PM-6:00PM]  (stop)
  – LouvreMuseum [morning] [afternoon]  (stop)
  – ZurichAirport – IbisHotel [10:00AM-11:00PM] [10:00AM-6:00PM]  (move)

Semantic Trajectory Patterns
• Frequent Patterns
• Sequential Patterns
• Association Rules

Trajectory Frequent Pattern
• Is a set of items that occurs a minimal number of times (support s)
• Examples:
  {LouvreMuseum [08:00-10:00]} (s=0.1)
  {Airport [morning], Hotel [morning]} (s=0.2)
  {Airport-Hotel, Hotel-Museum} (s=0.15)

Trajectory Sequential Pattern
• Is an ordered list of items that occurs a minimal number of times (support s)
• Examples:
  <Airport[morning], Hotel[morning], Museum[afternoon]> (s=0.1)
  <Airport-Hotel, Hotel-Museum> (s=0.15)

Trajectory Association Rule
• Is a rule whose items occur a minimal number of times (support s) and with a minimal confidence (c)
• Example:
  – Airport[morning], Hotel[morning] → Museum[afternoon] (s=0.1) (c=0.5)

The Proposed Model
• We extend the model of stops and moves proposed by Spaccapietra with new attributes and methods
• We add new classes and relationships, with attributes and methods for automatic data preprocessing and multiple-level mining

The Conceptual Data Model of Stops and Moves (Spaccapietra 2008)

Proposed OO Model
[Figure: the proposed OO model in three parts: Spaccapietra's model, data pre-processing, and computing and storing the patterns]
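
The support and confidence measures used in the pattern definitions above can be made concrete with a small Python sketch. It assumes that each trajectory has already been turned into a set of items and that support is the fraction of trajectories containing every item of a pattern, which is the usual definition in frequent pattern mining; the slides do not spell it out. The five trajectories are invented for illustration.

```python
def support(itemset, trajectories):
    """Fraction of trajectories that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in trajectories if itemset <= items)
    return hits / len(trajectories)


def confidence(antecedent, consequent, trajectories):
    """support(antecedent plus consequent) divided by support(antecedent).

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    both = support(set(antecedent) | set(consequent), trajectories)
    return both / support(antecedent, trajectories)


# Invented example data: one set of items per trajectory.
trajectories = [
    {"Airport[morning]", "Hotel[morning]", "Museum[afternoon]"},
    {"Airport[morning]", "Hotel[morning]"},
    {"Hotel[morning]", "Museum[afternoon]"},
    {"Airport[morning]", "Hotel[morning]", "Museum[afternoon]"},
    {"Restaurant[evening]"},
]

antecedent = {"Airport[morning]", "Hotel[morning]"}
consequent = {"Museum[afternoon]"}

# The rule holds in 2 of the 5 trajectories; the antecedent occurs in 3.
print(support(antecedent | consequent, trajectories))
print(confidence(antecedent, consequent, trajectories))
```

A rule is reported only when both values reach the user-given thresholds (minimal support and minimal confidence).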
Proposed OO Model
• Stops and Moves are extended with new attributes (specific time, e.g. 07:10 – 08:05) and methods to instantiate stops and moves
• Concept hierarchy for the spatial feature type (e.g. AccommodationPlace → Hotel)

Proposed OO Model
• Pattern: generic class to represent the 3 kinds of patterns
• Sequential patterns
  – Attributes: support, listOfItems
  – Methods: countSupport(), sequentialPattern()
• Association rules
  – Attributes: support, confidence, antecedent (set of items) and consequent (set of items)
  – Methods: countSupport(), associatePattern(), and computeConfidence()
• Frequent patterns
  – Attributes: support, setOfItems
  – Methods: countSupport(), frequentPattern()
• Item
  – Attributes: startT, endT (generic time, e.g. Morning)
  – Methods:
    • getGenericSpatialFeature() – retrieves the hierarchy level
    • timeG() – generalizes time
    • spaceG() – generalizes space based on the hierarchy
    • buildItem() – creates a generalized ITEM

Example of an Instantiated Model

Schema of Stops and Moves
• STOP (Tid integer, Sid integer, SFTname string, SFid integer, startT timestamp, endT timestamp)
  Ex.: stop (1, 1, Hotel, 3, 10AM, 11AM)
• MOVE (Tid integer, Mid integer, SFT1name string, SF1id integer, SFT2name string, SF2id integer, startT timestamp, endT timestamp, the_move geometry)

Schema of the Patterns (nested relation)
• FrequentPattern / SequentialPattern (Pid integer, pattern itemSetType, support real)
• itemSetType (SFT1name string, SF1id integer, SFT2name string, SF2id integer, startT string, endT string)
• AssociatePattern (Pid integer, antecedent itemSetType, consequent itemSetType, support real, confidence real)

Instantiating and Querying Patterns
• To instantiate the patterns we can use the ST-DMQL proposed in (Bogorny 2009)

Instantiating Stops and Moves
  SELECT generateS (method, candidateStops, buffer) FROM trajectory
  SELECT generateM (method, candidateStops, buffer) FROM trajectory
• method: IB-SMOT, CB-SMOT, DB-SMOT, ...

Instantiating Sequential Patterns
Q1 (tourism application): Which are the sequences of moves that occur most frequently in the morning and in the evening?
Method in the ST-DMQL:
  SELECT sequentialPattern (itemType = NameEnd,
         timeG = [8:00-12:00 AS morning, 18:00-23:00 AS evening],
         spaceG = instance, minsup = 0.03)
  FROM move
Ans: {IbisHotel - NotreDame [morning], EiffelTower – IbisHotel [evening]} (s=0.04)

Example of Pattern Queries
Q: How many moves of sequential patterns cross the Pont Neuf bridge?
  SELECT count(m.*)
  FROM sequentialPattern s, bridge b, move m
  WHERE s.pattern.SFT1name = m.SFT1name AND s.pattern.SF1id = m.SF1id
    AND s.pattern.SFT2name = m.SFT2name AND s.pattern.SF2id = m.SF2id
    AND b.name = 'Pont Neuf'
    AND intersects (m.the_move, b.the_geom)

Conclusions
• Data pre-processing is the most time-consuming step for DM and KDD
• Thinking about data mining during the conceptual design of a database can significantly reduce this step
• The proposed model:
  – Reduces the pre-processing tasks
  – Supports mining at multiple granularity levels
  – Automatically prepares the data for data mining
  – Stores the patterns for future queries
[Figure: data, multiple granularities, patterns, queries]

Geometric Patterns X Semantic Patterns (Bogorny 2009)
[Figure: trajectories T1 to T4 shown as a geometric pattern and as a semantic trajectory pattern. Legend: H = Hotel, R = Restaurant, TP = Touristic Place. Semantic trajectory patterns: (a) Hotel to Restaurant, passing by CC; (b) go to Cinema, passing by CC]

Thank You!
More examples for generating stops
  SELECT generateS (CB-SMOT, [Hotel,60,TouristicPlace,15,ShoppingCenter,30], 5)
  FROM trajectory t, district d
  WHERE d.name = 'Bela Vista' AND intersects (t.movingpoint.geometry, d.geometry)

Querying Rules
Suppose that the user is interested in association patterns that have weekend as the time dimension in the antecedent of the rule:
  SELECT *
  FROM associatePattern
  WHERE antecedent.startT = 'weekend' OR antecedent.endT = 'weekend'

Basic Concepts: Support

Basic Concepts: Semantic Trajectory Patterns
Example: Work [morning], ShoppingCenter [afternoon], Gym [afternoon] (s=0.08%)

Basic Concepts: Semantic Trajectory Patterns
Example: Home [night], Work [afternoon] → Gym [afternoon] (s=0.10%) (c=0.50)

Basic Concepts: Semantic Trajectory Patterns
Example: ReligiousPlace [weekend], Restaurant [weekend] (s=0.07)

Related Works
• Mining trajectory samples – extract geometric patterns
  – Laube 2002, 2005
  – Giannotti 2007
  – Lee 2007
  – Cao 2006, 2007
  – Li 2010
• Mining semantic trajectories or trajectory preprocessing for mining
  – Alvares 2007
  – Zhou 2007
  – Palma 2008
  – Bogorny 2009
  – Manso 2010
• Attempts to reduce the gap between databases and data mining
  – Data mining query languages, but not for trajectories (Wang 2003, Malerba 2004, Han 1995)

Example of Frequent Pattern Instantiation
Q2: Which are the types of places most frequently visited by tourists on weekdays and weekends?
Method in the ST-DMQL:
  SELECT frequentPattern (itemType = NameStart, timeG = WEEKEND-WEEKDAY,
         spaceG = [type, GenericHotel = 1], minsup = 0.15)
  FROM stop
Ans: {4StarsHotel[weekend], Museum[weekend], Restaurant[weekend]} (s=0.16)
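
The frequentPattern call above combines item building with support counting. A rough Python sketch of those two steps follows; it is not the ST-DMQL implementation. The helper names (`build_item`, `frequent_patterns`), the sample stops and the brute-force enumeration of candidate item sets are all assumptions made for illustration, and support is taken as the fraction of trajectories that contain every item of a set.

```python
from datetime import datetime
from itertools import combinations

# Invented stops: (trajectory id, spatial feature type, start time).
STOPS = [
    (1, "Hotel", datetime(2010, 9, 18, 9, 0)),    # a Saturday
    (1, "Museum", datetime(2010, 9, 18, 11, 0)),
    (2, "Hotel", datetime(2010, 9, 19, 10, 0)),   # a Sunday
    (2, "Museum", datetime(2010, 9, 19, 14, 0)),
    (3, "Hotel", datetime(2010, 9, 16, 8, 0)),    # a Thursday
]


def build_item(sft_name, start):
    """NameStart item: feature type plus start time generalized to weekend/weekday."""
    period = "weekend" if start.weekday() >= 5 else "weekday"
    return f"{sft_name}[{period}]"


def frequent_patterns(stops, minsup):
    """Return {itemset: support} for every item set with support >= minsup."""
    # Group the generalized items of each trajectory into one set.
    per_trajectory = {}
    for tid, sft_name, start in stops:
        per_trajectory.setdefault(tid, set()).add(build_item(sft_name, start))
    transactions = list(per_trajectory.values())

    # Brute force: test every combination of the distinct items.
    all_items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(all_items) + 1):
        for candidate in combinations(all_items, size):
            hits = sum(1 for t in transactions if set(candidate) <= t)
            if hits / len(transactions) >= minsup:
                result[candidate] = hits / len(transactions)
    return result


patterns = frequent_patterns(STOPS, minsup=0.5)
for itemset, s in patterns.items():
    print(set(itemset), round(s, 2))
```

Enumerating every combination is exponential in the number of distinct items, so this only works for toy inputs; a level-wise algorithm such as Apriori would prune candidates whose subsets are already infrequent.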