Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Ho Chi Minh City University of Technology Faculty of Computer Science and Engineering Temporal Databases: Principles and Practices Suphamit Chittayasothorn and Vo Thi Ngoc Chau Part 2: Temporal Data-related Issues Dr. Vo Thi Ngoc Chau ([email protected]) February 13, 2014 Content A Quick Review Implementations Support for Temporal Temporal Data Aggregation Temporal Data Mining Conclusion 2 A Quick Review - Temporal Database Concepts Temporal aspect Valid Time Transaction Time User-defined Time Definition Time when the fact is true in reality Time when the fact is current in the database Time uninterpreted and parallel to data domains like NUMBER Provided by User System User Supported by System System User Special language for semantics Yes Yes No Temporal Database - a database that supports some aspect of time, not counting user-defined time (e.g, date_of_birth). Temporal Language – Extensions to QUEL, SQL, OQL 3 A Quick Review - The Handling of Temporal Data Object/tuple timestamping vs. Attribute timestamping r1 1000 800 800 1200 ‘ON1’ ‘ON1’ ‘ON2’ ‘ON1’ [2000, 2003) [2003, 2004) [2004, 2006) [2006, 9999) Valid Time 4 A Quick Review - The Handling of Temporal Data Object/tuple timestamping Area Owner r1 1000 [2000, 2003) r1 800 [2003, 2004) r1 800 [2004, 2006) r1 1200 [2006, 9999) r1 ‘ON1’ [2000, 2003) r1 ‘ON1’ [2003, 2004) r1 ‘ON2’ [2004, 2006) r1 ‘ON1’ [2006, 9999) Shape r1 [2000, 2003) r1 [2003, 2004) r1 [2004, 2006) r1 [2006, 9999) DATA REDUNDANCY + DATA SPLITTING + TEMPORAL DATA ANOMALIES 5 A Quick Review - The Handling of Temporal Data Attribute timestamping Area 1000 r1 800 [2000, 2003) [2003, 2004) [2006, 9999) 900 [2000, 2003) [2003, 2004) r2 1100 ‘ON1’ [2004, 2006) 1200 700 Owner ‘ON2’ ‘ON3’ [2004, 2006) [2006, 9999) ‘ON4’ Shape [2000, 2003) [2000, 2003) [2003, 2004) [2003, 2004) [2006, 9999) [2004, 2006) [2004, 2006) [2006, 9999) [2000, 2003) [2000, 2003) [2003, 2004) [2003, 2004) [2006, 9999) [2004, 2006) [2004, 2006) [2006, 9999) 6 Implementations for Temporal Support Temporal database system implementation Implementation from scratch Implementation on some existing DBMS: layered or integrated A layered approach An integrated approach Temporal Statement S Temporal Support Error Result Temporal Statement S Error Result Scanner Existing DBMS Metadata Management Parser Temporal Support Output Processor Code Generator Non-Temporal Statement S’ Existing DBMS 7 Implementations for Temporal Support An example of temporal database system implementation Our proposed temporal compatible object relational database system on Oracle 10g using the integrated approach Valid time Attribute timestamping A temporal transparency environment More at: Vo Thi Ngoc Chau, Suphamit Chittayasothorn, “A Temporal Compatible Object Relational Database System”, in Proceedings of SOUTHEASTCON, 2007. Vo Thi Ngoc Chau, Suphamit Chittayasothorn, “A Temporal Object Relational SQL Language with Attribute Timestamping in a Temporal Transparency Environment”, Data & Knowledge Engineering (2008), doi: 10.1016/j.datak.2008.06.008. 8 Our Proposed Temporal Compatible Object Relational Database System Application Programs SQL with Temporal Types and Methods Temporal Application Programs Temporal SQL wrapped in Translation Stored-Procedures Non-temporal Application Programs SQL wrapped in Translation Stored-Procedures End-users SQL with Temporal Types and Methods Existing Application Programming Interface Temporal End-users Temporal SQL wrapped in Translation Stored-Procedures Non-temporal End-users SQL wrapped in Translation Stored-Procedures Existing End-User Interface Temporal Extension Non-Temporal ORDBMS Temporal Database 9 Our Proposed Temporal Compatible Object Relational Database System Application Programs SQL with Temporal Types and Methods Temporal Application Programs Temporal SQL wrapped in Translation Stored-Procedures Non-temporal Application Programs SQL wrapped in Translation Stored-Procedures End-users SQL with Temporal Types and Methods Existing Application Programming Interface Temporal End-users Temporal SQL wrapped in Translation Stored-Procedures Non-temporal End-users SQL wrapped in Translation Stored-Procedures Existing End-User Interface Temporal Extension Non-Temporal ORDBMS Temporal Database 10 Our Proposed Temporal Compatible Object Relational Database System Application Programs SQL with Temporal Types and Methods Temporal Application Programs Temporal SQL wrapped in Translation Stored-Procedures Non-temporal Application Programs SQL wrapped in Translation Stored-Procedures End-users SQL with Temporal Types and Methods Existing Application Programming Interface Temporal End-users Non-temporal End-users Temporal SQL SQL wrapped in wrapped in Translation Translation Stored-Procedures Stored-Procedures Existing End-User Interface Temporal Extension Non-Temporal ORDBMS Temporal Database 11 Our Proposed Temporal Compatible Object Relational Database System Upward Compatibility/Temporal Upward Compatibility Application Programs SQL with Temporal Types and Methods Temporal Application Programs Temporal SQL wrapped in Translation Stored-Procedures Non-temporal Application Programs SQL wrapped in Translation Stored-Procedures End-users SQL with Temporal Types and Methods Existing Application Programming Interface Temporal End-users Temporal SQL wrapped in Translation Stored-Procedures Non-temporal End-users SQL wrapped in Translation Stored-Procedures Existing End-User Interface Temporal Extension Non-Temporal ORDBMS Temporal Database 12 Our Proposed Temporal Compatible Object Relational Database System Sequenced Semantics Application Programs SQL with Temporal Types and Methods Temporal Application Programs Temporal SQL wrapped in Translation Stored-Procedures Non-temporal Application Programs SQL wrapped in Translation Stored-Procedures End-users SQL with Temporal Types and Methods Existing Application Programming Interface Temporal End-users Temporal SQL wrapped in Translation Stored-Procedures Non-temporal End-users SQL wrapped in Translation Stored-Procedures Existing End-User Interface Temporal Extension Non-Temporal ORDBMS Temporal Database 13 Our Proposed Temporal Compatible Object Relational Database System Non-sequenced Semantics Application Programs SQL with Temporal Types and Methods Temporal Application Programs Temporal SQL wrapped in Translation Stored-Procedures Non-temporal Application Programs SQL wrapped in Translation Stored-Procedures End-users SQL with Temporal Types and Methods Existing Application Programming Interface Temporal End-users Temporal SQL wrapped in Translation Stored-Procedures Non-temporal End-users SQL wrapped in Translation Stored-Procedures Existing End-User Interface Temporal Extension Non-Temporal ORDBMS Temporal Database 14 Our Proposed Temporal Compatible Object Relational Database System Interactive SQL Mode Application Programs SQL with Temporal Types and Methods Temporal Application Programs Temporal SQL wrapped in Translation Stored-Procedures Non-temporal Application Programs SQL wrapped in Translation Stored-Procedures End-users SQL with Temporal Types and Methods Existing Application Programming Interface Temporal End-users Temporal SQL wrapped in Translation Stored-Procedures Non-temporal End-users SQL wrapped in Translation Stored-Procedures Existing End-User Interface Temporal Extension Non-Temporal ORDBMS Temporal Database 15 Our Proposed Temporal Compatible Object Relational Database System Embedded SQL Mode Application Programs SQL with Temporal Types and Methods Temporal Application Programs Temporal SQL wrapped in Translation Stored-Procedures Non-temporal Application Programs SQL wrapped in Translation Stored-Procedures End-users SQL with Temporal Types and Methods Existing Application Programming Interface Temporal End-users Temporal SQL wrapped in Translation Stored-Procedures Non-temporal End-users SQL wrapped in Translation Stored-Procedures Existing End-User Interface Temporal Extension Non-Temporal ORDBMS Temporal Database 16 Implementations for Temporal Support Built-in temporal support in the SQL standards and DBMSs Part 7 SQL/Temporal was withdrawn. More at: ISO/IEC JTC 1/SC 32 N 0436, Rationale for the Withdrawal of Projects, 2000. Built-in date-time related data types and methods in many existing DBMSs MySQL, Oracle, MS SQL Server, IBM DB2, Informix, … 17 Implementations for Temporal Support Built-in temporal support in the SQL standards and DBMSs Valid time support with tuple timestamping in Oracle 10g/11g Workspace Manager Transaction time support in Oracle 10g/11g Flashback Query More at: Oracle Valid Time Support, Application developer’s guide – workspace manager, 10g release 1(10.1), No. B1082401. More at: Oracle Flashback Query, Application developer’s guide – fundamentals, 10g release 1(10.1), No. B10795-01. Transaction time support in MS SQL Server Immortal DB More at: D. Lomet, R. Barga, M. Mokbel, G. Shegalov, R. Wang, Y. Zhu, Transaction time support inside a database engine, in: Proc. ICDE Conference, (IEEE, 2006) 35-46. 18 Implementations for Temporal Support Built-in temporal support in Oracle Data types and methods related to date and time SQL standards Oracle 10g/11g Oracle valid time support Concepts The WM_PERIOD Data Type Data Definitions with Valid Time Support Data Querying with Valid Time Support Data Modifications with Valid Time Support 19 Data types and methods related to date and time in SQL standards Datetime data types DATE, TIME WITHOUT TIME ZONE, TIMESTAMP WITHOUT TIME ZONE, TIME WITH TIME ZONE, TIMESTAMP WITH TIME ZONE Fields: YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, TIMEZONE_HOUR, TIMEZONE_MINUTE Interval data type INTERVAL Fields: YEAR, MONTH, HOUR, MINUTE, SECOND 20 Data types and methods related to date and time in SQL standards Operations involving datetimes and intervals Arithmetic operators: -, +, *, / <overlaps predicate>: the operator OVERLAPS to determine if two chronological periods overlap in time <current date value function>: CURRENT_DATE <current time value function>: CURRENT_TIME <extract expression>: functions on a datetime or interval to return an exact numeric value representing the value of one component (field) of the datetime or interval … 21 Data types and methods related to date and time in Oracle 10g/11g Datetime data types DATE, TIMESTAMP, TIMESTAMP WITH TIME ZONE, TIMESTAMP WITH LOCAL TIME ZONE Fields: YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, TIMEZONE_HOUR, TIMEZONE_MINUTE, TIMEZONE_REGION, TIMEZONE_ABBR Interval data types INTERVAL YEAR TO MONTH, INTERVAL DAY TO SECOND 22 Data types and methods related to date and time in Oracle 10g/11g Datetime arithmetic operations: +, -, *, / Datetime functions: ADD_MONTHS, CURRENT_DATE, CURRENT_TIMESTAMP, DBTIMEZONE, EXTRACT(datetime), FROM_TZ, LAST_DAY, LOCALTIMESTAMP, MONTHS_BETWEEN, NEW_TIME, NEXT_DAY, NUMTODSINTERVAL, NUMTOYMINTERVAL, ROUND(date), SESSIONTIMEZONE, SYS_EXTRACT_UTC, SYSDATE, SYSTIMESTAMP, TO_CHAR(datetime), TO_TIMESTAMP, TO_TIMESTAMP_TZ, TO_DSINTERVAL, TO_YMINTERVAL, TRUNC(date), TZ_OFFSET 23 Oracle valid time support So-called Workspace Concepts The WM_PERIOD Data Type Data Definitions with Valid Time Support Data Querying with Valid Time Support Data Modifications with Valid Time Support 24 So-called Workspace A workspace is a virtual environment that one or more users can share to make changes to the data in the database. A workspace logically groups collections of new row versions from one or more version-enabled tables, and isolates these versions until they are explicitly merged with production data or discarded, thus providing maximum concurrency. Users in a workspace always see a transactionally consistent view of the entire database; that is, they see changes made in their current workspace plus the rest of the data in the database as it existed either when the workspace was created or when the workspace was most recently refreshed with changes from the parent workspace. The history option enables you to timestamp changes made to all rows in a version-enabled table and to save a copy of either all changes or only the most recent changes to each row. If you keep all changes (specifying the “without overwrite” history option) when versionenabling a table, you keep a persistent history of all changes made to all row versions, and enable users to go to any point in time to view the database as it existed from the perspective of that workspace. 25 Concepts The valid time support for version-enabled tables Valid time: the validity of the data Each record is valid only within the time range associated with the record. Each row contains an added column to hold the valid time period associated with the row. A valid time range is specified and queries, insert, update, and delete operations reflect and accommodate the valid time range. 26 Concepts Version-enabled tables All rows in a version-enabled table can support multiple versions of the data. The history option enables timestamp changes to be made to all rows in a version-enabled table and to save a copy of either all changes or only the most recent changes to each row. If all changes (specifying the “without overwrite” history option) are kept when a table is versionenabled, a persistent history of all changes made to all row versions is kept and users are enabled to go to any point in time to view the database as it existed from the perspective of that workspace. 27 Concepts Version-enabled tables Referential integrity and uniqueness constraints DDL operations related to version-enabled tables Enclosed by the BeginDDL procedure and the CommitDDL or RollbackDDL procedure DDL statements on <table_name>_LTS instead of <table_name> Querying is carried out as usual. Modification operations (insert, delete, and update) on version-enabled tables Carried out as usual 28 Examples 29 Examples 30 Examples 31 The WM_PERIOD Data Type Used to specify a valid time range for the session or workspace, and for a row in a version-enabled table Each instance of the WM_PERIOD data type is a closed-open period of time [validFrom, validTill) where validTill is excluded. 32 The WM_PERIOD Data Type Operators on instances of the WM_PERIOD data type The relationship checking operators (input: period1, period2; output: 1=TRUE, 0 = FALSE) WM_OVERLAPS, WM_CONTAINS, WM_MEETS, WM_EQUALS, WM_LESSTHAN, WM_GREATERTHAN The set operators (input: period1, period2; output: period3) WM_INTERSECTION, WM_LDIFF, WM_RDIFF 33 Data Definitions with Valid Time Support Enable valid time support when versionenabling a table Add valid time support to an existing version-enabled table Constraints on version-enabled tables that have valid time support 34 Data Definitions with Valid Time Support Enable valid time support when version-enabling a table 35 Data Definitions with Valid Time Support Add valid time support to an existing version-enabled table 36 Data Definitions with Valid Time Support Constraints on version-enabled tables that have valid time support Referential integrity constraints If both with valid time support, an insert or update operation on the referring table will fail if the valid time associated with the new value at the referring column is not within the valid time associated with the value at the referred column of the referred table. If either or both with no valid time support, valid time periods are ignored in enforcing the constraint. Unique constraints Given an insert or update operation on a version-enabled table with valid time support and a unique constraint on one or many columns If the existing and inserted rows have the same value combination at these constrained columns, their WM_VALID values do not overlap. 37 Data Definitions with Valid Time Support Constraints on version-enabled tables that have valid time support Referential integrity constraints For example, given two DEPARTMENTS and EMPLOYEES tables, the DEPARTMENTS.MANAGER_ID column is a foreign key referencing the EMPLOYEES.EMPLOYEE_ID column. Consider an insert or update operation with a new DEPARTMENTS.MANAGER_ID value. The operation will fail if the DEPARTMENTS.WM_VALID value is not within the range of the EMPLOYEES.WM_VALID value for the employee who is being made the department manager. 38 Data Definitions with Valid Time Support Constraints on version-enabled tables that have valid time support Unique constraints For example, given an EMPLOYEES table with a unique constraint on an EMPLOYEE_ID column. Consider an insert or update operation with a new EMPLOYEE_ID value. If the new EMPLOYEE_ID value is the same as an existing EMPLOYEE_ID value, the operation will fail if the WM_VALID values of the existing and inserted rows overlap. 39 Data Querying with Valid Time Support Invoke the SetValidTime or SetValidTimeFilterOn procedure with some valid time range Do not invoke or invoke the SetValidTime procedure with no parameters Rows with the valid time that overlap the specified valid time range are taken into account in the evaluation of a query. Rows that are valid at the current time are considered in the evaluation of a query. Operators are explicitly specified on the WM_VALID column of version-enabled tables with the valid time support the non-sequenced manner 40 Data Querying with Valid Time Support 41 Data Querying with Valid Time Support 42 Data Querying with Valid Time Support 43 Data Querying with Valid Time Support 44 Data Modifications with Valid Time Support INSERT statements Specify a valid time period for a new row the value at the WM_VALID column If NULL for the WM_VALID column, the session valid time period or [NOW, NULL) is used no concept of “temporal upward compatibility”. Primary key and referential integrity constraints are checked. 45 Data Modifications with Valid Time Support INSERT statements 46 Data Modifications with Valid Time Support DELETE statements A sequenced delete operation deletes the portion of a row that falls within the session valid time range. No concept of a non-sequenced delete operation If a valid time was not set, all rows satisfying the WHERE condition if any are considered for the delete operation incorrect description If a valid time was not set, all rows that are valid at the current time are considered correct from a test 47 Data Modifications with Valid Time Support DELETE statements – before deletion 48 Data Modifications with Valid Time Support DELETE statements – a sequenced deletion 49 Data Modifications with Valid Time Support DELETE statements – after a sequenced deletion 50 Data Modifications with Valid Time Support DELETE statements – test a non-sequenced deletion 51 Data Modifications with Valid Time Support UPDATE statements A sequenced update operation: no change is specified to the WM_VALID column in the UPDATE statement The WM_VALID.ValidTill value for an updated row is changed to the ValidFrom timestamp of the current session valid time range, and a new row is created in which the WM_VALID period reflects the current session valid time range. only for the overlapping case; this is not generally applied to other cases (disjoint, within, equal, contains) A non-sequenced update operation: a change is explicitly specified to the WM_VALID column in the UPDATE statement and the rows that are valid at the current time are deleted. 52 Data Modifications with Valid Time Support UPDATE statements 53 Data Modifications with Valid Time Support UPDATE statements 54 Data Modifications with Valid Time Support UPDATE statements 55 Data Modifications with Valid Time Support UPDATE statements 56 Data Modifications with Valid Time Support UPDATE statements 57 New Oracle Database 12c temporal support Examples from “Oracle® Database Development Guide, 12c Release 1 (12.1), E17620-11” empno last_name start_time end_time 58 New Oracle Database 12c temporal support empno last_name start_time end_time 100 Ames 01-Jan-10 30-Jun-11 101 Burton 01-Jan-11 30-Jun-11 102 Chen 01-Jan-12 59 New Oracle Database 12c temporal support empno last_name start_time end_time 100 Ames 01-Jan-10 30-Jun-11 101 Burton 01-Jan-11 30-Jun-11 102 Chen 01-Jan-12 60 empno last_name start_time end_time 100 Ames 01-Jan-10 30-Jun-11 101 Burton 01-Jan-11 30-Jun-11 102 Chen 01-Jan-12 61 Summary on Oracle’s Support Tuple timestamping Each row is associated with a period of time. Constraints: Referential integrity constraints, Uniqueness constraints No sequenced queries Sequenced deletions Operators on the WM_VALID column are explicitly invoked by users. Entire rows with the WM_VALID value that overlaps the session valid time range are deleted. A sequenced deletion should affect only the one in common, the others should be kept as-is. More new Oracle Database 12c temporal support at: www.oracle.com 62 Temporal Data Aggregation Given a valid-time table - Employee Name REFC Salary ValidTime Gender D_birth vtValue Ed E2 vtStart vtEnd 2/1/1982 1/1/1988 M Di John 1/1/1988 1/1/1982 1/1/1962 12/31/9999 E5 M 1/1/1978 12/31/9999 Jack 5/15/1950 12/31/9999 White 12/1/1960 10/1/1960 1/1/1978 Johnson 12/31/9999 vtEnd vtStart vtEnd 20 2/1/1982 6/1/1982 30 6/1/1982 2/1/1985 D1 2/1/1982 2/1/1987 2/1/1985 2/1/1987 4/1/1987 12/31/9999 D2 4/1/1987 12/31/9999 30 1/1/1982 8/1/1984 40 8/1/1984 9/1/1986 D1 1/1/1982 12/31/9999 50 9/1/1986 12/31/9999 45 1/1/1984 1/1/1989 55 1/1/1989 12/31/9999 D4 1/1/1984 12/31/9999 40 1/1/1980 1/1/1984 D4 1/1/1980 1/1/1984 50 1/1/1984 12/31/9999 D3 1/1/1984 12/31/9999 30 1/1/1980 1/1/1984 D3 1/1/1980 1/1/1984 40 1/1/1984 1/1/1989 D4 1/1/1984 12/31/9999 45 1/1/1989 12/31/9999 40 F M F ValidTime vtValue vtStart 7/1/1955 12/31/9999 E3 E4 ValidTime vtValue E1 Edward Dept 1/1/1962 5/15/1950 12/1/1960 63 Temporal Data Aggregation Given a valid-time table - Employee Name REFC ValidTime Gender D_birth vtValue Ed vtStart vtEnd 2/1/1982 1/1/1988 a temporal composition M 1/1/1988 12/31/9999 Di John 1/1/1982 1/1/1962 12/31/9999 E4 E5 M Johnson 1/1/1978 12/31/9999 Jack 5/15/1950 12/31/9999 White 12/1/1960 10/1/1960 1/1/1978 E3 12/31/9999 2/1/1982 6/1/1982 30 6/1/1982 2/1/1985 2/1/1985 2/1/1987 4/1/1987 12/31/9999 30 1/1/1982 8/1/1984 40 8/1/1984 9/1/1986 50 9/1/1986 12/31/9999 45 1/1/1984 1/1/1989 55 1/1/1989 12/31/9999 40 1/1/1980 50 40 F M F vtEnd 20 7/1/1955 ValidTime vtValue vtStart time-dependent component E2 ValidTime vtValue E1 Edward an attribute Dept history Salary vtStart vtEnd D1 2/1/1982 2/1/1987 D2 4/1/1987time 12/31/9999 component D1 1/1/1982 12/31/9999 D4 1/1/1984 12/31/9999 1/1/1984 D4 1/1/1980 1/1/1984 1/1/1984 12/31/9999 D3 1/1/1984 12/31/9999 30 1/1/1980 1/1/1984 D3 1/1/1980 1/1/1984 40 1/1/1984 1/1/1989 D4 1/1/1984 12/31/9999 45 1/1/1989 12/31/9999 1/1/1962 5/15/1950 12/1/1960 64 Temporal Data Aggregation Challenge yourselves with temporal data aggregation How to provide built-in temporal support for users? How to make use of available built-in support from DBMS? Let’s try with different existing DBMSs! 65 List the highest salary at each department at present. Name REFC Salary ValidTime Gender D_birth vtValue Ed E2 vtStart vtEnd 2/1/1982 1/1/1988 M Di John 1/1/1988 1/1/1982 1/1/1962 12/31/9999 E5 M 1/1/1978 12/31/9999 Jack 5/15/1950 12/31/9999 White 12/1/1960 vtEnd D1 2/1/1982 2/1/1987 vtEnd 20 2/1/1982 6/1/1982 30 6/1/1982 2/1/1985 2/1/1985 2/1/1987 4/1/1987 12/31/9999 D2 4/1/1987 12/31/9999 30 1/1/1982 8/1/1984 40 8/1/1984 9/1/1986 D1 1/1/1982 12/31/9999 50 9/1/1986 12/31/9999 45 1/1/1984 1/1/1989 55 1/1/1989 12/31/9999 D4 1/1/1984 12/31/9999 40 1/1/1980 1/1/1984 D4 1/1/1980 1/1/1984 50 1/1/1984 12/31/9999 D3 1/1/1984 12/31/9999 30 1/1/1980 1/1/1984 D3 1/1/1980 1/1/1984 SAL_MAX 40 1/1/1984 1/1/1989 D4 1/1/1984 12/31/9999 7/1/1955 10/1/1960 1/1/1978 Johnson vtStart 40 F M DEPT_IDF 12/31/9999 ValidTime vtValue vtStart 12/31/9999 E3 E4 ValidTime vtValue E1 Edward Dept 1/1/1962 5/15/1950 12/1/1960 D2 40 45 D1 50 D4 55 D3 50 1/1/1989 12/31/9999 66 List the number of employees whose names are always different from ‘John’ and their highest salaries along time Name REFC Salary ValidTime Gender D_birth vtValue Ed E2 vtStart vtEnd 2/1/1982 1/1/1988 M Di John 1/1/1988 1/1/1982 1/1/1962 12/31/9999 E5 10/1/1960 1/1/1978 M Johnson 1/1/1978 12/31/9999 Jack 5/15/1950 12/31/9999 White 12/1/1960 12/31/9999 EMP# 4 vtStart vtEnd vtStart vtEnd 20 2/1/1982 6/1/1982 30 6/1/1982 2/1/1985 D1 2/1/1982 2/1/1987 2/1/1985 2/1/1987 4/1/1987 12/31/9999 D2 4/1/1987 12/31/9999 30 1/1/1982 8/1/1984 40 8/1/1984 9/1/1986 D1 1/1/1982 12/31/9999 50 9/1/1986 12/31/9999 45 1/1/1984 1/1/1989 55 1/1/1989 12/31/9999 D4 1/1/1984 12/31/9999 40 1/1/1980 1/1/1984 D4 1/1/1980 1/1/1984 50 1/1/1984 12/31/9999 D3 1/1/1984 12/31/9999 30 1/1/1980 1/1/1984 D3 1/1/1980 1/1/1984 40 1/1/1984 1/1/1989 D4 1/1/1984 12/31/9999 45 ValidTime 1/1/1989 12/31/9999 40 F 1/1/1962 M ValidTime vtValue 7/1/1955 12/31/9999 E3 E4 ValidTime vtValue E1 Edward Dept 5/15/1950 F 12/1/1960 vtValue SAL_MAX vtStart vtEnd 40 1/1/1980 1/1/1984 50 1/1/1984 12/31/9999 67 For each gender, list the number of employees and the highest salary that employees have been paid along time. Name REFC Salary ValidTime Gender D_birth vtValue Ed E2 vtStart vtEnd 2/1/1982 1/1/1988 M Di John 1/1/1988 1/1/1982 12/31/9999 1/1/1962 F 1/1/1978 Jack 5/15/1950 10/1/1960 1/1/1978 M Johnson EMP# M White 12/1/1960 M 12/31/9999 F 3 20 2/1/1982 6/1/1982 30 6/1/1982 2/1/1985 2/1/1985 2/1/1987 4/1/1987 12/31/9999 30 1/1/1982 8/1/1984 40 8/1/1984 9/1/1986 50 9/1/1986 12/31/9999 45 1/1/1984 1/1/1989 55 1/1/1989 SAL_MAX 12/31/9999 40 5/15/1950 vtValue50 12/1/1960 50 55 F 2 vtStart vtEnd D1 2/1/1982 2/1/1987 D2 4/1/1987 12/31/9999 D1 1/1/1982 12/31/9999 D4 1/1/1984 12/31/9999 D4 1/1/1980 1/1/1984 vtEnd D3 1/1/1984 12/31/9999 D3 1/1/1980 1/1/1984 1/1/1989 D4 1/1/1984 12/31/9999 1/1/1962 12/31/9999 Gender 12/31/9999 vtEnd 40 40 E5 ValidTime vtValue vtStart 7/1/1955 12/31/9999 E3 E4 ValidTime vtValue E1 Edward Dept 30 40 45 1/1/1980 1/1/1984 ValidTime 1/1/1984 12/31/9999 1/1/1980 1/1/1984 1/1/1984 1/1/1989 1/1/1989 12/31/9999 vtStart 1/1/1980 1/1/1984 1/1/1989 1/1/1984 12/31/9999 30 1/1/1980 1/1/1984 40 1/1/1984 9/1/1986 50 9/1/1986 12/31/9999 68 List the highest salary at each department with less than 20 employees along time. Salary REFC … Dept ValidTime vtValue ValidTime vtEnd … 30 vtStart vtEnd vtStart vtEnd 1/1/1982 8/1/1984 1/1/1982 12/31/9999 40 8/1/1984 9/1/1986 2/1/1982 2/1/1987 6/1/1982 2/1/1985 D1 2/1/1985 2/1/1987 E2 E3 E4 E5 … … … … D2 vtEnd 30 2/1/1982 6/1/1982 40 ValidTime vtValue vtStart D1 E1 DEPT_ID vtValue vtStart 20 SAL_MAX 50 9/1/1986 12/31/9999 4/1/1987 12/31/9999 4/1/1987 12/31/9999 D2 4/1/1987 12/31/9999 40 4/1/1987 12/31/9999 30 1/1/1982 8/1/1984 D4 1/1/1980 1/1/1984 40 1/1/1980 1/1/1984 40 8/1/1984 9/1/1986 45 1/1/1984 1/1/1989 D4 1/1/1984 12/31/9999 55 1/1/1989 12/31/9999 30 1/1/1980 1/1/1984 50 9/1/1986 12/31/9999 45 1/1/1984 1/1/1989 D1 D4 1/1/1982 12/31/9999 55 1/1/1989 12/31/9999 40 1/1/1980 1/1/1984 50 1/1/1984 12/31/9999 D3 1/1/1984 12/31/9999 30 1/1/1980 1/1/1984 D3 1/1/1980 1/1/1984 40 1/1/1984 1/1/1989 D4 1/1/1984 12/31/9999 45 1/1/1989 12/31/9999 D4 D3 1/1/1980 1/1/1984 D3 1/1/1984 12/31/9999 50 1/1/1984 12/31/9999 1/1/1984 12/31/9999 1/1/1980 1/1/1984 Grouped by a temporal column 69 Temporal Data Mining What is Data Mining? Frequent Pattern Mining 70 What is Data Mining? Obama campaign’s secret strategy – 2012 Knowing your customers Predict final status of undergrad students Predict heart disease Car classification Market analysis … 71 Data Mining Information/ Knowledge What is Data Mining? 72 What is Data Mining? Data mining – searching for knowledge (interesting patterns) from large amounts of data. More at: Jiawei Han, Micheline Kamber, Jian Pei, “Data Mining: Concepts and Techniques”, Third Edition, Morgan Kaufmann Publishers, 2012. 73 What is Data Mining? Frequent pattern mining is a mining task Data mining as a step in the process of knowledge discovery 74 Frequent Pattern Mining Application domains Market basket analysis Predicting stock prices in the future Analyzing temporal data encountered in Electronic Health Record systems Predicting seismic trends 75 Frequent Pattern Mining Frequent patterns The patterns that occur frequently in data. Milk and bread are frequently bought together in grocery stores [milk, bread] is a frequent pattern. A sequence of buying a PC, followed by a camera and then a memory card frequently occurred It is a frequent sequential pattern. 76 Example of point-based data Example of interval-based data Example of time series data A time series - sequence of numbers collected at regular intervals over a period of Are they temporal data? time. 77 Frequent Pattern Mining Database Frequent pattern On point-based sequential databases On interval-based sequential databases On time series databases On databases of multiple time series 78 Frequent Pattern Mining More at: R. Agrawal, R. Srikant. Mining sequential patterns. In Proc. ICDE, 1995. I. Batal, D. Fradkin, J. Harrison, F. Mörchen, M. Hauskrecht. Mining recent temporal patterns for event detection in multivariate time series data. In Proc. KDD, pp.280-288, 2012. I. Batyrshin, L. Sheremetov, R. Herrera-Avelar. Perception based patterns in time series data mining. Studies in Computational Intelligence (SCI) 36 (2007) 85–118. M. Lin, S. Lee. Fast discovery of sequential patterns through memory indexing and database partitioning. Journal of Information Science and Engineering 21 (2005) 109–128. F. Mörchen, A. Ultsch. Efficient mining of understandable patterns from multivariate interval time series. Data Min Knowl Disc 15 (2007) 181-215. J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, M-C. Hsu. Prefixspan: Mining sequential patterns efficiently by prefix-projected pattern growth. In Proc. the 17th International Conference on Data Engineering (ICDE), 2001. 79 Frequent Pattern Mining More at: P. Papapetrou, G. Kollios, S. Sclaroff. Discovering frequent arrangements of temporal intervals. In Proc. ICDM, 2005. E. Winarko, J. F. Roddick. ARMADA – An algorithm for discovering richer relative temporal association rules from interval – based data. Data & Knowledge Engineering 63 (2007) 76-90. S-Y. Wu, Y-L. Chen. Mining nonambiguous temporal patterns for intervalbased events. IEEE Transactions on Knowledge and Data Engineering 19 (2007) 742–758. Q. Yang, X. Wu. 10 challenging problems in data mining research. International Journal of Information Technology & Decision Making 5 (4) (2006) 597–604. R. Sadasivam, K. Duraiswamy, Efficient approach to discover interval-based sequential patterns, Journal of Computer Science, 9 (2): 225-234, 2013. T. B. T. Phan, T. N. C Vo, T. A. Duong – An efficient interval-based approach to mining frequent patterns in a time series database. In Proc. MIWAI’13, 80 Conclusion Time is ubiquitous. Consider temporal data from semantics to management to implementation Check if you can find some application domain where time does not exist Let’s get started with some advanced DBMS where your temporal data can be supported the most And now, a very very very … large amount of temporal data gathered over the time What should we do next to make the most of such existing data? Let’s try something with temporal data analysis and mining 81 82