Download slides

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Functional Database Model wikipedia , lookup

Relational model wikipedia , lookup

Database model wikipedia , lookup

Transcript
Bela Stantic
School of Information and Communication Technology – Griffith University, Australia
Institute for Integrated and Intelligent Systems, Griffith University, Australia
Visiting Professor - Department of Mathematic and Informatics, Faculty of Science,
University of Novi Sad
Email: [email protected]
Institute for Integrated and Intelligent Systems - IIIS
An implicit approach to deal with
periodically repeated data
Dr Bela Stantic
School of Information and Communication Technology – Griffith University, Australia
Institute for Integrated and Intelligent Systems, Griffith University, Australia
Acknowledgement:
Universita’ del Piemonte Orientale,
Alessandria, Italy
Abdul Sattar - National Information and Communications Technology,
Brisbane , Australia
Paolo Terenziani -
Institute for Integrated and Intelligent Systems - IIIS
Publications

Bela Stantic, Paolo Terenziani, Guido Governatori, Alessio Bottrighi,
and Abdul Sattar, (2012), ‘ An implicit approach to deal with periodicallyrepeated medical data’, Journal in Artificial Intelligence in Medicine
Volume: 55, doi:10.1016/j.artmed.2012.03.002, Pages: 149 – 162.

Bela Stantic, Paolo Terenziani, Abdul Sattar, Alessio Bottrighi and
Guido Governatori, 2010, ‘Towards an implicit treatment of periodicallyrepeated medical data’, in 13th World Congress on Medical and Health
Informatics MedInfo, Pages: 1131-1135.

Paolo Terenziani, Bela Stantic, Guido Governatori, Alessio Bottrighi and
Abdul Sattar, (2012), ‘An implicit approach for periodic data in relational
databases’, in Journal of Intelligent Information Systems: integrating
artificial intelligence and database technologies, (under review)
Institute for Integrated and Intelligent Systems - IIIS
Overview




Introduction - Periodic data
Related work
Motivation
Implicit vs Explicit approach





TSQL2 data model
Periodic and quasi-periodic granularities
A relational representation of quasi-periodic data
Temporal algebra
Temporal extensions:
 Temporal union
 Temporal difference
 Temporal projection
 Temporal Cartesian Project
 Algorithm for range queries
Experiment
Results
Conclusion



Institute for Integrated and Intelligent Systems - IIIS
Introduction
 Periodic events seem to be an intrinsic part of our life, and easier
way of perceive reality.
 Days repeat at regular periodic patterns, as well as seasons.
 Many human activities are scheduled at periodic time (office
activities, scheduling airplanes, lessons, medical activities, etc )
 It is widely agreed that adopting a “standard” and fixed menu of
granularities (e.g., minutes, hours, days, weeks, years in the
Gregorian calendar system) is not enough in order to provide the
required expressiveness and flexibility.
 Additionally, there are different calendars which depend on the
cultural, legal and even purpose of the system.
 These calendar systems use user-defined periodic granularities
(e.g., the academic, legal, financial year, etc)
Institute for Integrated and Intelligent Systems - IIIS
Introduction
 The number of repetitions of periodic data may be very large, like in
production and in some cases repetitions may also be “openended”(e.g., therapies for chronical patients may be repeated all the
life long).
 In Computer Science literature, in particular, in the areas of
Databases, Logics, and Artificial Intelligence, there is a common
agreement that formalisms are needed in order to cope with userdefined periodic data in an implicit (also termed intensional) way
(i.e., without an explicit storing of all the repetitions).
Institute for Integrated and Intelligent Systems - IIIS
Related work



Periodic data play an important role in Databases, so a specific entry has been
devoted to such a topic in the new Encyclopedia of Database Systems
[Terenziani, 2009].
In the Encyclopedia three main classes of Database (implicit) approaches to
user-defined periodicities have been identified:
» Deductive rule-based approaches, using deductive rules (e.g., approaches
in classical temporal logics), [Chomicki et al, 1993]
» Constraint-based approaches, using mathematical formulae [Kabanza ,
1995],
» Algebraic (also termed symbolic) approaches, which provide a set of “highlevel” and “user-friendly” operators [Bettini et al, 2000, Ning et al, 2002,
Terenziani et al, 2003].
In most approaches in the literature (and, in particular, in symbolic approaches),
the focus is on the design of high-level formalisms to model user-defined
periodicities in a “commonsense” or at least “user-friendly” way.
Institute for Integrated and Intelligent Systems - IIIS
Related work




Despite there seems to be a general agreement within the Database literature
that general-purpose implicit approach is needed in order to cope with userdefined periodic data, and despite the fact that a lot of such approaches have
been devised in the last two decades (the survey by Tuzhlin and Clifford, focused
only on Database area, identified 34 different approaches).
However, none of them focus specifically on the definition of a comprehensive
relational approach coping with user-defined periodic data efficiently, considering:
» a relational implicit representation,
» additional temporal operators, e.g., range queries,
» indexing and efficient access.
Also they do not take into account issues such as the definition of relational
temporal algebraic operators, extending Codd’s operators to query periodic data,
All such issues are fundamental for the practical applicability of any Database
approach considering periodic data.
Institute for Integrated and Intelligent Systems - IIIS
Goals
 The new comprehensive approach has to be designed in such a
way that:
» data model has the expressiveness to capture all ”periodic
granularities”, as defined in the Database literature,
» data model has to be a consistent extension of the TSQL2,
» algebraic and temporal query operators has to be correct and
complete with respect to conventional explicit approaches
» Additionally, extended algebraic operators has to operate in
polynomial time, and are a consistent extension of standard
non-temporal relational algebraic ones.
Institute for Integrated and Intelligent Systems - IIIS
Implicit vs. Explicit approaches
 There is an obvious and trivial way to cope with periodic data, by
explicitly storing all of them.
 Usually called “explicit” (or “extensional”) approach, basically
reduces periodic data to standard temporal data.
 For instance, to store activity A scheduled each day from 8 to 12
and 14 to 18 in a whole year, an explicit approach simply stores
365 time periods in which the activity takes place.
Institute for Integrated and Intelligent Systems - IIIS
Explicit Representation of Periodic data
Institute for Integrated and Intelligent Systems - IIIS
Implicit vs. Explicit approaches
 There is an obvious and trivial way to cope with periodic data, by
explicitly storing all of them.
 Usually called “explicit” (or “extensional”) approach, basically
reduces periodic data to standard temporal data.
 For instance, to store activity A scheduled each day from 8 to 12
and 14 to 18 in a whole year, an explicit approach simply stores
365 time periods in which the activity takes place.
 The obvious advantage of such an approach is its simplicity as
periodic temporal data are simply coped with as standard temporal
data, so that any temporal Database approach in the literature can
be used.
 It also makes all of the database implementation simpler, from
indexing to query processing.
Institute for Integrated and Intelligent Systems - IIIS
Implicit vs. explicit approaches
 However, there are at least three main disadvantages to the
“explicit” approach:
» It is not “commonsense” and “human-oriented”, humans
usually tend to abstract, so that they usually prefer to manage
periodic data in an implicit way, for example every day this year
from 8 to 12 and 14 to 18, or each Monday from 15:00 to 17:00,
etc.
» It is very expensive in term of space complexity, due to the
high storage requirements. In many practical application areas,
the number of repetitions (at periodic time) of activities is very
high, (Recording the activities on an schedule in a production
chain).
» It is not feasible in the case of “open ended” data (i.e., of
data whose valid time end is open and for which there is no
known future end).
Institute for Integrated and Intelligent Systems - IIIS
TSQL2 data model
 Let us restrict our discussion to valid time only and do not consider
transaction time.
 In many approaches (and, in particular, in TSQL2), valid time is
associated to relational tuples, in the form of a pair of timestamps (the
starting point of the valid time, and its ending point).
 Given any schema R=(A1, …, An) (where A1 , …, An are standard nontemporal attributes), a valid time relation r in TSQL2 is a relation defined
over the schema:
 where VTS and VTE are timestamps representing the starting and the
ending time of the valid time period respectively.
Institute for Integrated and Intelligent Systems - IIIS
Representing periodic granularities
 We define quasi-periodic granularity G by a quadruple:
 where :
» P is an integer representing the duration of the periodic pattern;
» IP is the set of the convex periods in the first “periodic pattern”
(after the granule “0” of the bottom granularity);
» IE is the set of the convex periods constituting the aperiodic part;
» FT is the “frame time” i.e., the period containing all repetitions
of the periodic pattern.
Institute for Integrated and Intelligent Systems - IIIS
Representing periodic granularities
 Let us exemplify our representation through an example:
 As a working example, let us suppose that, in the year 2012,
starting from Monday January 9th and ending on Sunday
December 23rd, the “working shift” for a company is from 08:00 to
12:00, and from 14:00 to 18:00 each day from Monday to Friday
and from 8 to 12 on Saturday (let us call such a granularity “working
shift - WS”).
 In addition, let us suppose that the “WS” also includes Saturday
evening (from 14 to 18) in two specific days (say on January 13 and
20 (let us call “WS+” the granularity WS with such an addition).
Institute for Integrated and Intelligent Systems - IIIS
Representing quaisi-periodic granularities
 WS and WS+ are represented in our formalism as follows, considering
hours as the bottom granularity (notice that our approach is independent
of the choice of the bottom granularity).
WS = ⟨168,{[08,12], [14,18], [32, 36], ..., [128, 132]},
{ }, [0, 8567]⟩
Institute for Integrated and Intelligent Systems - IIIS
Representing quaisi-periodic granularities
 WS and WS+ are represented in our formalism as follows, considering
hours as the bottom granularity (notice that our approach is independent
of the choice of the bottom granularity).
WS = ⟨168,{[08,12], [14,18], [32, 36], ..., [128, 132]},
{ }, [0, 8567]⟩
WS+ = ⟨168 ,{[08,12], [14,18], [32,36], ..., [128, 132]},
{[460, 464], [620, 624]}; [0, 8567]⟩
 168 hours (one week) is the duration of the periodic pattern - Per
Institute for Integrated and Intelligent Systems - IIIS
Representing quaisi-periodic granularities
 WS and WS+ are represented in our formalism as follows, considering
hours as the bottom granularity (notice that our approach is independent
of the choice of the bottom granularity).
WS = ⟨168, {[08,12], [14,18], [32, 36], ..., [128, 132]},
{ }, [0, 8567]⟩
WS+ = ⟨168 ,{[08,12], [14,18], [32,36], ..., [128, 132]},
{[460, 464], [620, 624]}; [0, 8567]⟩
 168 hours (one week) is the duration of the periodic pattern - Per
 {[08,12], [14,18], [32, 36], ..., [128, 132]}, is the first instance of the
periodic pattern- IP
Institute for Integrated and Intelligent Systems - IIIS
Representing quaisi-periodic granularities
 WS and WS+ are represented in our formalism as follows, considering
hours as the bottom granularity (notice that our approach is independent
of the choice of the bottom granularity).
WS = ⟨168, {[08,12], [14,18], [32, 36], ..., [128, 132]},
{ }, [0, 8567]⟩
WS+ = ⟨168 ,{[08,12], [14,18], [32,36], ..., [128, 132]},
{[460, 464], [620, 624]}, [0, 8567]⟩
 168 hours (one week) is the duration of the periodic pattern - Per,
 {[08,12], [14,18], [32, 36], ..., [128, 132]}, is the first instance of the
periodic pattern - IP
 {[460, 464], [620, 624]} are the two aperiodic part repetitions - IE
Institute for Integrated and Intelligent Systems - IIIS
Representing quaisi-periodic granularities
 WS and WS+ are represented in our formalism as follows, considering
hours as the bottom granularity (notice that our approach is independent
of the choice of the bottom granularity).
WS = ⟨168, {[08,12], [14,18], [32, 36], ..., [128, 132]},
{ }, [0, 8567] ⟩
WS+ = ⟨168 ,{[08,12], [14,18], [32,36], ..., [128, 132]},
{[460, 464], [620, 624]}, [0, 8567]⟩
 168 hours (one week) is the duration of the periodic pattern - Per
 {[08,12], [14,18], [32, 36], ..., [128, 132]}, is the first instance of the
periodic pattern - IP,
 {[460, 464], [620, 624]} are the two additional repetitions for WS+ - IE
 [0, 8567] is the frame of time containing all the periodical repetitions on
the periodic pattern - FT.
Institute for Integrated and Intelligent Systems - IIIS
A relational representation of quasi-periodic data
 We define the Periodic relation as:
» Given any schema R=(A1, …, An) where A1, …, An are standard
non-temporal attributes, a periodic relation r is a relation defined
over the schema:
RP = (A1, …, An | VTS , VTE , Per, IP)
VTS
VTE
Per
IP
- timestamp representing the starting point of the “frame time”
- timestamp representing the ending point of the “frame time”
- interval, representing the duration of the repetition pattern
- denoting a set of the periodic patterns (PerID)
Institute for Integrated and Intelligent Systems - IIIS
Periodicity Relation
 The periodic_pattern relation is defined over the schema:
(PerID, Start, End )
in which:
» PerID - attribute containing identifiers denoting periodic patterns
» Start and End are temporal attributes (timestamps) denoting the
starting and the ending points of the periods in the periodic pattern.
Institute for Integrated and Intelligent Systems - IIIS
Relations

Periodic relations - Activity(Implicit model):

Periodic_Pattern relation (Implicit model):

Periodic (Explicit model):
Institute for Integrated and Intelligent Systems - IIIS
Representations
Institute for Integrated and Intelligent Systems - IIIS
Temporal algebra
 Codd designated as complete any query language that was as
expressive as his set of five relational algebraic operators, relational
union (∪), relational difference (−), selection (σ), projection (π), and
Cartesian product (×).
 We proposed an extension of Codd’s algebraic operators in order to
query the new data model which manages implicitly periodic data.
 Such a temporal algebra can provide the ground for a proper
extension of SQL, or of the temporal language TSQL2 to cope with
implicit periodic data.
Institute for Integrated and Intelligent Systems - IIIS
Temporal union
 As it can be seen temporal relational projection simply operates on the
non-temporal part of the input tuples, retaining only the values of the
input attributes. The temporal component of the tuples is left
unchanged.
Institute for Integrated and Intelligent Systems - IIIS
Temporal projection
 For example, the query asks for patients (and the time of their
treatments):
 And it should result in:
Institute for Integrated and Intelligent Systems - IIIS
Non-temporal selection
 For instance, the query below asks for patients (and the time of their
treatments) and requires that the value of Act Type attribute is A:
 Result is:
Institute for Integrated and Intelligent Systems - IIIS
Temporal Cartesian Project
Institute for Integrated and Intelligent Systems - IIIS
Temporal Cartesian Project
 lcm, min, and max denote the least common multiple, minimum and
maximum functions;
 generate id() is a function that generated a new unique identifier;
 pattern(id, Periodicity) denotes the set of periods corresponding to the
identifier id in the table Periodicity.
 The function generate_inters_pattern takes in input two identifiers id1
and id2 of periodic patterns in the Periodicity table, the identifier idnew
of a new pattern (the intersection pattern) to be introduced in the table,
the time tstart at which the new pattern starts, its duration Per, and the
Periodicity table.
Institute for Integrated and Intelligent Systems - IIIS
Temporal Cartesian Project
 It operates in three steps:
» (1) It retrieves from the Periodicity table the patterns corresponding to id1
and id2 and makes such patterns explicit on the whole period [tstart, tstart
+Per], let < p11,…, p1k > and < p21,…, p2h > be the lists of temporally
ordered periods obtained by making explicit the periodic pattern
corresponding to id1 and id2 respectively (the function make explicit
accomplishes such tasks).
» (2) It evaluates the intersection between < p11,…, p1k > and < p21,…,
p2h>, let <p1,…, pj > be the resulting ordered lists of periods
» (3) If < p1,…, pj > is empty, the empty set is given as result, otherwise the
table Periodicity is updated with the insertion of the new pattern < p1,…, pj >
for the new identifier idnew.
 The definition of temporal Cartesian product can be extended to
temporal definitions of theta join, natural join, outer joins, and outer
Cartesian products, in a way similar as presented in Gao et al (2005).
Institute for Integrated and Intelligent Systems - IIIS
Temporal difference
Institute for Integrated and Intelligent Systems - IIIS
Temporal difference
 The function generate_difference_pattern (and generate_union_pattern) is analogous
to generate_intersection_pattern, and generates in a frame time whose duration is the
duration of the (new) repetition pattern the new pattern obtained by making the difference of
the two input patterns (each ones being a set of periods).
 Generate_union_pattern is a slight generalization of such a process, in which union must
be performed on an arbitrary number of such patterns.
Institute for Integrated and Intelligent Systems - IIIS
Algorithm for range queries
Institute for Integrated and Intelligent Systems - IIIS
Algorithm for range queries
Institute for Integrated and Intelligent Systems - IIIS
Representing quaisi-periodic granularities
 WS and WS+ are represented in our formalism as follows, considering
hours as the bottom granularity (notice that our approach is independent
of the choice of the bottom granularity).
WS = ⟨168, {[08,12], [14,18], [32, 36], ..., [128, 132]},
{ }, [0, 8567]⟩
WS+ = ⟨168 ,{[08,12], [14,18], [32,36], ..., [128, 132]},
{[460, 464], [620, 624]}, [0, 8567]⟩
 168 hours (one week) is the duration of the periodic pattern - Per,
 {[08,12], [14,18], [32, 36], ..., [128, 132]}, is the first instance of the
periodic pattern - IP
 {[460, 464], [620, 624]} are the two aperiodic part repetitions - IE
Institute for Integrated and Intelligent Systems - IIIS
Algorithm for range queries
Institute for Integrated and Intelligent Systems - IIIS
Check Periodic Intersection
Institute for Integrated and Intelligent Systems - IIIS
Algorithm for range queries
 The “circular module” operation is
basically a standard module
operator which operates on the
endpoints of all the input periods,
with the caution that, since the
pattern is periodic, it must be
interpreted in a “circular” fashion.
 G_Intersects generate the
Intersects predicate, in case of set
of periods.
Institute for Integrated and Intelligent Systems - IIIS
Experiment – Environment
 Our implementation has been carried out on a 8 x UltraSparc III @
900Hz with 8GB of RAM memory, running Oracle 11 RDBMS, with a
database block size of 8K and size of SGA of 1000MB
 We used Oracle built-in methods for statistics collection, analytic SQL
functions and the PL/SQL procedural runtime environment. For range
queries we utilised the RI-tree indexing method.
 Data set is generated for the medical informatics domain because many
activities are routinely executed at periodic time by nurses on
hospitalized patients.
 There are also cases of open-ended repeated activities (e.g., dialysis on
diabetic patients must usually be performed twice or three times each
week, for the life of the patient).
Institute for Integrated and Intelligent Systems - IIIS
Sample therapy for multiple mieloma





The therapy for multiple mieloma is made by six cycles of 5-day
treatment, each one followed by a delay of 23 days (for a total time of 24
weeks).
Within each cycle of 5 days, 2 inner cycles can be distinguished:
» the melphalan treatment, to be provided twice a day, for each of the
5 days and
» the prednisone treatment, to be provided once a day for each of the
5 days.
These two treatments must be performed in parallel.
It is important to record such data in the hospital database, in order to
schedule hospital personnel activities, and to manage resource
allocation.
It is obvious that managing this activities in explicit way is complex and
also ity can cause errors
Institute for Integrated and Intelligent Systems - IIIS
Experiment – Data set






Most clinical data are naturally
temporal.
Number of Patients 16,824
Average number of periodic
activities per patient 8.30
Average number of periods in a
periodic pattern = 4.86
Average duration of period of
periodic patterns = 87.56
Average duration of the frame
time 1169
 As Explicit approach cannot
cope with open-ended
intervals we could not
include them in experiment
Institute for Integrated and Intelligent Systems - IIIS
Space Requirements
Institute for Integrated and Intelligent Systems - IIIS
Temporal Cartesian Product - Results
Institute for Integrated and Intelligent Systems - IIIS
Temporal Join - Results
 For Temporal Join experiment we ran a query against temporal tables
Activity and Working shifts to check which employees on duty can
perform specific actions (ActID), (Working shifts JOIN Emp Capabilities)
JOIN Activity.
Institute for Integrated and Intelligent Systems - IIIS
Range queries - Results
Institute for Integrated and Intelligent Systems - IIIS
Explicit/implicit ratio
Institute for Integrated and Intelligent Systems - IIIS
Conclusion
 We proposed a new approach to cope with periodic data in relational
Databases, specifically:
» we have proposed an “implicit” relational data model for user-defined
periodic data, which is based on the “consensus” definition of
granularity and is a “consistent extension” of TSQL2’s,
» we have extended Codd’s algebraic operators of Cartesian product,
Union, Projection, non-temporal selection, and Difference,
» we have shown that such operators are complete and correct,
» finally, in an extensive experimentation of our model we showed that
our “implicit” approach outperforms the performance of traditional
“explicit” approach.
Institute for Integrated and Intelligent Systems - IIIS
Thank you for your attention!
Institute for Integrated and Intelligent Systems - IIIS