Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
® IBM Software Group An In-depth Look at Active Data Features - Constraints and Triggers – in DB2 LUW Petrus Chan, IBM Toronto Lab, [email protected] © 2009 IBM Corporation IBM Software Group | DB2 information management software Agenda Active Data Features Overview of SQL Processing in DB2 Constraints Triggers Semantic Query Optimization in DB2 2 IBM Software Group | DB2 information management software Active Data Features a mechanism whereby an SQL statement can invoke an action that is not explicitly specified by the SQL statement Can be used to enable business rules enforcement Event – Condition – Action when Event if Database event • Update, delete, insert Search condition on Condition • Transition values Database or External Action • then Action e.g. alert, reject, fix up, log 3 IBM Software Group | DB2 information management software Categories of Active Data Features Constraints Declarative rules that ensure the validity of data values Triggers Automatic procedural actions that are triggered by update events on a given table Materialized Query Table (MQT) maintenance Maintenance of materialized query results stored in a table, which can be used by DB2 Optimizer to significantly improve performance of complex queries. 4 IBM Software Group | DB2 information management software Access Plan: ----------Total Cost: 63.9241 Query Degree: 1 Overview of SQL processing in DB2 Rows RETURN ( 1) Cost I/O | 25 GRPBY ( 2) 63.4613 2 | 829.44 MSJOIN ( 3) 55.1569 2 /---+----\ 144 5.76 TBSCAN FILTER ( 4) ( 7) 23.8489 23.6013 1 1 | | 144 144 SORT TBSCAN ( 5) ( 8) 23.7975 23.6013 1 1 | | 144 144 TBSCAN SORT ( 6) ( 9) 19.4513 23.5498 1 1 | | 144 144 TABLE: PETRUS3 TBSCAN T2 ( 10) Q1 19.4513 1 | 144 TABLE: PETRUS3 T1 Q2 Query SELECT t1.c1, sum (t2.c2) FROM T1, T2 WHERE t1.c1 = t2.c1 GROUP BY t1.c1; 5 IBM Software Group | DB2 information management software Agenda Active Data Features Overview of SQL Processing in DB2 Constraints Triggers Semantic Query Optimization in DB2 6 IBM Software Group | DB2 information management software Constraints in DB2 Domain constraints – data type, nullability Key constraints – primary, unique Check constraints – domain restriction, primitive business rules Referential integrity constraints – foreign key Generated columns – derived attributes Functional dependencies – determined by Symmetric views – with check option 7 IBM Software Group | DB2 information management software Primary/Unique Key Constraints Purpose Enforcing the primary or candidate key semantics of an attribute Specification CREATE TABLE PERSON(SSN INT NOT NULL UNIQUE, …); CREATE TABLE PERSON(SSN INT NOT NULL, …, PRIMARY KEY(SSN)); ALTER TABLE PERSON ADD CONSTRAINT ssn_uk UNIQUE(SSN); Implementation Enforced by DB2 through the creation of a unique index on the columns of the key. Semantically enforced as an end of statement constraint, through a mechanism called deferred unique checking. UPDATE T SET C1 = C1+1 can be executed without raising an error. 8 IBM Software Group | DB2 information management software Check Constraints Purpose Can be used to enforce/restrict the domain of an attribute Can implement simple business rules based on values of other columns of the same data row, e.g., CHECK(ship_date >= order_date) Specification CREATE TABLE SALES_2004(ITEM_ID CHAR(16), SALE_DATE DATE CHECK(SALE_DATE BETWEEN 01/01/2004 AND 12/31/2004), …); ALTER TABLE EMPLOYEE ADD CONSTRAINT check_sex CHECK( SEX IN (‘M’, ‘F’)); Implementation Additional predicate corresponding to the check constraint is compiled into the query graph of any INSERT, UPDATE or DELETE statements modifying a given table Evaluated using the “after” image for each row, error is raised when check condition is not satisfied 9 IBM Software Group | DB2 information management software Check constraints - example db2 => ALTER TABLE EMPLOYEE ADD CONSTRAINT chk_sex CHECK(sex IN ('M','F')) db2 => INSERT INTO EMPLOYEE VALUES('000070', 'EVA', 'D', 'PULASKI', …) 2) FILTER: (Filter) ... Predicate Text: -------------(1 = CASE WHEN NOT(Q4.$C0 IN ('M', 'F')) THEN RAISE_ERROR(-545, ‘PETRUS.EMPLOYEE.CHK_SEX.') ELSE 0 END) Access Plan: ----------Total Cost: 13.3724 Query Degree: 1 Rows RETURN ( 1) Cost I/O | 0.04 FILTER ( 2) 13.3724 1 | 1 INSERT ( 3) 13.2904 1 /---+--\ 1 77 TBSCAN TABLE: ADMIN ( 4) EMPLOYEE 0.0048 0 | 1 TABFNC: SYSIBM GENROW 10 IBM Software Group | DB2 information management software Referential Integrity Purpose Defines an inclusion relationship between a foreign key (set of columns in the referencing/child table) and a primary key (set of columns in the referenced/parent table) Simply put, any non-NULL values of the foreign key must be present as a value of the corresponding primary/unique key. Specification (can be provided during CREATE or ALTER table) The definition of the referencing table (child table) contains a clause of the form: FOREIGN KEY fk_name(fk_col1, fk_col2, …) REFERENCES parent_table(pk_col1, pk_col2, …) <cascading options> where <cascading options> can be one or more of: – ON DELETE [ NO ACTION | RESTRICT | CASCADE | SET NULL ] – ON UPDATE [ NO ACTION | RESTRICT ] 11 IBM Software Group | DB2 information management software Referential Integrity (cont’d) Implementation UPDATE/INSERT of child table’s foreign key columns: Additional processing compiled into query graph to: – Look up primary key values in the parent table. – Raise error if not found. UPDATE/DELETE of parent. Additional processing compiled into query graph to: Depends on ON UPDATE and ON DELETE cascading options: – RESTRICT – raise error if old value of the parent columns still has corresponding children – SET NULL – set foreign key columns in child table to NULL – CASCADE – delete the children rows – NO ACTION (default) – similar to RESTRICT, but enforced after other constraints 12 IBM Software Group | DB2 information management software Referential integrity - example Access Plan: ----------Total Cost: Query Degree: Rows RETURN ( 1) Cost I/O | 0.1232 FILTER ( 2) 77.8767 5.08 /--------+--------\ db2 => ALTER TABLE EMPLOYEE ADD CONSTRAINT dept_fk FOREIGN KEY (workdept) REFERENCES DEPARTMENT(deptno) ON DELETE SET NULL ON UPDATE RESTRICT db2 => UPDATE EMPLOYEE SET workdept=‘871’ WHERE empno=‘000010’ 2) FILTER: (Filter) ... Predicate Text: -------------(1 = CASE WHEN (Q4.$C0 IS NOT NULL AND NOT EXISTS(SELECT $RID$ FROM ADMIN.DEPARTMENT AS Q5 WHERE ('871' = Q5.DEPTNO))) THEN RAISE_ERROR(-530, 'ADMIN.EMPLOYEE.DEPT_FK.') ELSE 2 END) 77.8767 1 3.08 UPDATE ( 3) 75.557 5.08 /---+--\ 3.08 TBSCAN 77 TABLE: ADMIN 1 IXSCAN ( 5) 1.97248 0 | 47 INDEX: ADMIN ( 4) 34.747 2 | 77 TABLE: ADMIN EMPLOYEE DEPT_PK EMPLOYEE 13 IBM Software Group | DB2 information management software Rows RETURN nd ( 1) Cost I/O Cascaded | Update 3.08 db2 => DELETE FROM DEPARTMENT WHERE deptno='E11' UPDATE ( 2) 90.8877 6.08 /---+--\ 3.08 77 3) NLJOIN: (Nested Loop Join) NLJOIN TABLE: ADMIN … ( 3) EMPLOYEE Predicate Text: 50.005 -------------3 Table Scan! (Q5.$C0 = Q6.WORKDEPT) /---------+--------\ 1 3.08 Old deptno value DELETE TBSCAN ( 4) ( 6) 15.2581 34.747 1 2 Note the table scan when /---+---\ | 1 47 77 executing the look up of the IXSCAN TABLE: ADMIN TABLE: ADMIN secondary table. ( 5) DEPARTMENT EMPLOYEE 1.97248 Create index on foreign key 0 | columns to avoid this 47 INDEX: ADMIN Referential integrity – 2 example DEPT_PK 14 IBM Software Group | DB2 information management software Generated columns Purpose Deriving values of a column based on other columns on the same row Specification (CREATE or ALTER TABLE) CREATE TABLE Persons(first_name VARCHAR(64), last_name VARCHAR(64), last_name_upper GENERATED ALWAYS AS UPPER(last_name)); Implementation The SQL compiler will substitute the column value or the DEFAULT keyword with the generated column expression whenever necessary 15 IBM Software Group | DB2 information management software Generated columns - Example Can be used to implement “index on expression”: create index idx1 on table_name (expr(col1)) Instead: create generated column col2 as expr(col1) create index on generated column c2 NOT ALLOWED A typical MDC example: CREATE TABLE ORDERS( O_ORDKEY INT NOT NULL, O_CUSTKEY INT NOT NULL, O_PRICE FLOAT NOT NULL, O_ORDERDATE DATE NOT NULL, OG_ORDERMONTH GENERATED ALWAYS AS (INTEGER(O_ORDERDATE)/100), PRIMARY KEY o_key (O_ORDERKEY)) ORGANIZE BY ( OG_ORDERMONTH ); The default value for the generated column will be replaced by the generated column expression. INSERT INTO ORDERS VALUES(151, 32, 16.97, CURRENT DATE, DEFAULT); ---> INSERT INTO ORDERS VALUES(151, 32, 16.97, CURRENT DATE, INTEGER(CURRENT DATE)/100); 16 IBM Software Group | DB2 information management software Functional dependencies Purpose To capture the functional dependency between columns of the same table, other than the implicit dependency on the key Can occur due to denormalization Typical example: attribute_code attribute_description Specification CREATE TABLE PRODUCT(prod_key CHAR(15) NOT NULL, smu_code VARCHAR(6) NOT NULL, smu_desc VARCHAR(35), ..., prod_desc VARCHAR(30), CONSTRAINT FD1 CHECK (smu_desc DETERMINED BY smu_code) NOT ENFORCED ENABLE QUERY OPTIMIZATION; ALTER TABLE product ADD CHECK ( prod_desc DETERMINED BY prod_key) NOT ENFORCED; Implementation Purely informational, currently cannot be created as enforced Currently exploited during MQT matching, possible other uses in the future 17 IBM Software Group | DB2 information management software Symmetric views – Views with CHECK OPTION Purpose Often, views are created to control user’s access to base table. Regular views do not restrict modifications that do not conform to view definition. Example: Create view view1 as (SELECT * FROM EMPLOYEE WHERE WORKDEPT=‘E21’) User can insert into view1 with workdept = E22 But subsequent select * from view1 cannot see the row. View with CHECK OPTION (or Symmetric View) can be used to restrict the modification. Specification CREATE VIEW DEPT_E21_EMPLOYEES AS (SELECT * FROM EMPLOYEE WHERE WORKDEPT=‘E21’) WITH CHECK OPTION; Implementation Similar to a check constraint, any predicate in the where clause of the view are applied against the new values of the rows during an update through the view. 18 IBM Software Group | DB2 information management software Symmetric views – Views with CHECK OPTION (2) LOCAL vs. CASCADED CHECK OPTION View over another view LOCAL – must satisfy definitions of current view and all underlying views that also have a check option CASCADED (default) – must satisfy definitions of current view and all underlying views, regardless of whether they have a check option or not. create view VIEW1 as Select empno, empname, deptno from EMP where dept10 = 10; Create view VIEW2 as Select empno, empname, deptno from VIEW1 where empno > 20 Predicates checked Y=LOCAL Y=CASCADED view1 none none view2 empno > 20 dept10 = 10 Empno > 20 WITH Y CHECK OPTION; 19 IBM Software Group | DB2 information management software Costs vs. Benefits of Constraints Benefit of (1) Preventing application errors vs. Cost of data integrity enforcement / maintenance (2) Enabling optimizations Informational Constraints: if the integrity is enforced outside of the database (e.g. ETL processes), informational constraints can provide the benefit without the additional cost db2 => ALTER TABLE DEPARTMENT ALTER FOREIGN KEY dept_fk NOT ENFORCED 20 IBM Software Group | DB2 information management software Enforced vs. Informational Constraints By default: the constraints are enforced by the database engine, frequently there is cost associated with enforcement. all constraints are exploited during query optimization Check and Referential Integrity constraints can be ENFORCED or NOT Functional dependencies purely informational (cannot be enforced) If an informational constraint is enabled for query optimization, then the user must ensure that informational constraints are satisfied at all times, otherwise the query results could become incorrect 21 IBM Software Group | DB2 information management software Agenda Active Data Features Overview of SQL Processing in DB2 Constraints Triggers Semantic Query Optimization in DB2 22 IBM Software Group | DB2 information management software Introduction to Triggers Automatic procedural actions that are triggered by update events on a given table Trigger types: BEFORE, AFTER and … INSTEAD OF Granularity: trigger execution can be either per row, or per statement Procedural SQL extensions can be used to implement the trigger logic 23 IBM Software Group | DB2 information management software For each row triggers - BEFORE Usually used to validate or fix up data before modification Typical application: look at the new value of one or more columns of the current row, compare it to other rows in the table (possibly using aggregation) or reference against another table, if value doesn’t conform, fix it up or raise an error. No database modifications are allowed inside before triggers Simple before triggers are “inlined” into the main data flow, more complex ones will result in a join below the U/D/I operation 24 IBM Software Group | DB2 information management software BEFORE trigger - example Sample business rule: Disallow salary increases of over 25% CREATE TRIGGER VERIFY_RAISE NO CASCADE BEFORE UPDATE OF SALARY ON EMPLOYEE REFERENCING NEW AS N OLD AS O FOR EACH ROW MODE DB2SQL WHEN(N.SALARY > O.SALARY*1.25) SIGNAL SQLSTATE ‘75000’ (‘Salary increase over limit’); Rows RETURN ( 1) Cost I/O | 77 UPDATE ( 2) 1015.6 79 /---+---\ 77 77 FILTER TABLE: ADMIN ( 3) EMPLOYEE 25.8637 2 | 77 FILTER ( 4) 25.8115 2 This trigger is | “inlined” here. 77 Typically you would TBSCAN expect the trigger ( 5) body joined to the 25.7725 main flow of data. 2 | 77 TABLE: ADMIN EMPLOYEE 25 IBM Software Group | DB2 information management software For each row triggers - AFTER Can be used for: database assertion validation based on the after state Example: archiving, versioning and replication of data Executed once per each row modification Can access the row transition variables NEW and OLD, corresponding to the before and after image of the row Body can contain arbitrarily complex procedural-like logic inside a BEGIN ATOMIC … END block Starting in V82 DB2 offers the ability to issue a CALL statement within a trigger body to invoke a stored procedure. Packaged trigger (in development) 26 IBM Software Group | DB2 information management software AFTER row level trigger - example Implementing versioning with an after for each row trigger: CREATE TRIGGER AFTTR1 AFTER UPDATE OF WORKDEPT ON EMPLOYEE REFERENCING OLD AS O FOR EACH ROW MODE DB2SQL BEGIN ATOMIC DECLARE LAST_NEWEST DATE; SET LAST_NEWEST = (SELECT END_DATE FROM EMP_DEPT_HISTORY WHERE EMPNO=O.EMPNO ORDER BY 1 DESC FETCH FIRST 1 ROW ONLY); INSERT INTO EMP_DEPT_HISTORY VALUES(O.EMPNO, O.WORKDEPT, LAST_NEWEST, CURRENT DATE); END% UPDATE EMPLOYEE SET WORDEPT=‘D21’ WHERE EMPNO=‘000230’% 27 IBM Software Group | DB2 information management software For each statement triggers - AFTER Executed once per triggering statement, even if no rows affected by the update/delete/insert Can reference the set of old and new row values via the transition tables 28 IBM Software Group | DB2 information management software AFTER statement level trigger - example Auditing of table data changes CREATE TRIGGER EMP_AUDIT AFTER UPDATE ON EMPLOYEE REFERENCING OLD_TABLE AS OLDT NEW_TABLE AS NEWT FOR EACH STATEMENT MODE DB2SQL INSERT INTO EMP_AUDIT SELECT O.EMPNO, O.LASTNAME, N.LASTNAME, O.WORKDEPT, N.WORKDEPT, O.PHONENO, N.PHONENO, O.JOB, N.JOB, O.EDLEVEL, N.EDLEVEL, O.SALARY, N.SALARY, O.BONUS, N.BONUS, O.COMM, N.COMM FROM OLDT O, NEWT N WHERE O.EMPNO=N.EMPNO; UPDATE EMPLOYEE SET WORKDEPT=‘D21’ WHERE EMPNO=‘000230’; 29 IBM Software Group | DB2 information management software Triggers on Views – INSTEAD OF Some views are considered not updatable by DB2. Example: view with joins Instead Of Trigger on a view allows you to “intercept” the update/delete/insert issued against the view and replace it with operations on the underlying tables. Currently only row level INSTEAD OF triggers supported in DB2 Execute at the same time as after triggers CREATE VIEW EMPLOYEE_WITH_DEPT_INFO AS (SELECT E.*, D.DEPTNAME, D.MGRNO, D.LOCATION FROM EMPLOYEE E, DEPARTMENT D WHERE E.WORKDEPT = D.DEPTNO); CREATE TRIGGER INSTTR1 INSTEAD OF UPDATE ON EMPLOYEE_WITH_DEPT_INFO REFERENCING NEW AS N FOR EACH ROW MODE DB2SQL BEGIN ATOMIC UPDATE EMPLOYEE E SET (FIRSTNME, MIDINIT, LASTNAME, WORKDEPT, PHONENO, HIREDATE, JOB, EDLEVEL, SEX, BIRTHDATE, SALARY, BONUS, COMM) = (N.FIRSTNME, N.MIDINIT, N.LASTNAME, N.WORKDEPT, N.PHONENO, N.HIREDATE, N.JOB, N.EDLEVEL, N.SEX, N.BIRTHDATE, N.SALARY, N.BONUS, N.COMM) WHERE N.EMPNO = E.EMPNO;-UPDATE DEPARTMENT D SET (DEPTNAME, MGRNO, LOCATION) = (N.DEPTNAME, N.MGRNO, N.LOCATION) WHERE N.WORKDEPT = D.DEPTNO;-END; 30 IBM Software Group | DB2 information management software INSTEAD OF trigger - example 31 IBM Software Group | DB2 information management software Interaction of Triggers and Constraints 32 IBM Software Group | DB2 information management software Database utilities and constraints and triggers Implications of triggers and constraints for utilities: Bulk LOAD utility: Data type, nullability and key constraints enforced during load. Referential integrity and check constraints not verified during load – table placed in check pending state after load – constraints must be verified in a separate step using the SET INTEGRITY statement. If load was incremental (append) the constraint checking will be incremental as well Generated columns can be generate during load, or by SET INTEGRITY Triggers not fired! 33 IBM Software Group | DB2 information management software Agenda Active Data Features Overview of SQL Processing in DB2 Constraints Triggers Semantic Query Optimization in DB2 34 IBM Software Group | DB2 information management software Overview of SQL processing in DB2 35 IBM Software Group | DB2 information management software Semantic Query Optimization in DB2 Semantic query optimization (SQO) = the use of integrity constraints to optimize the evaluation of a query Mostly during the query rewrite and query optimization phases elimination of unnecessary processing based on additional semantic information provided by the integrity constraints 36 IBM Software Group | DB2 information management software Exploiting Check Constraints Check constraint and generated column serve as derived predicate. The predicate will be exploited by DB2 Query Rewrite Theorem Prover. One common usage is to prove a FALSE predicate for the elimination of a subgraph or a branch Example: Check Constraint: alter table t1 add constraint chk1 check (c1 > 5) Query: select * from t1 where c1 < 5 Query Rewrite will prove that no row in T1 will satisfy the 2 predicates – will replace it with a FALSE predicate (1=0). The subgraph that evaluates the scan of T1 will be changed to: Select * from values(1) where 1=0 37 IBM Software Group | DB2 information management software Exploiting Check Constraints: Another Example create view SALES (SALES_2006 UNION ALL SALES_2007 UNION ALL SALES_2008) Check constraints: SALES_2006 - check (Year = 2006) SALES_2007 - check (Year = 2007) SALES_2008 - check (Year = 2008) Query: select * from SALES where Year = 2007 Query Rewrite will: (1) Compile in the check constraint predicate (2) push down the predicate Year=2007 through union all operations (3) derives a FALSE predicate for branch 2006 and 2008 (4) eliminates the two branches altogether SELECT … WHERE YEAR=2007 PUSH DOWN (YEAR=2007) THRU UNION ALL PREDICATE DERIVED FROM THE CHECK CONSTRAINT UNION ALL SELECT (YEAR = 2006) SELECT (YEAR = 2007) SELECT (YEAR = 2008) SALES_2006 SALES_2007 SALES_2008 38 IBM Software Group | DB2 information management software Exploiting Referential Integrity Redundant join elimination CREATE VIEW EMPLOYEE_WITH_DEPT_INFO AS (SELECT E.*, D.DEPTNAME, D.MGRNO, D.LOCATION FROM EMPLOYEE E, DEPARTMENT D WHERE E.WORKDEPT = D.DEPTNO); select empno, lastname, workdept from employee_with_dept_info; Join with the DEPARTMENT table eliminated entirely from the query Need to compensate with workdept is not null SELECT SELECT JOIN EMPLOYEE EMPLOYEE DEPARTMENT 39 IBM Software Group | DB2 information management software Exploiting Referential Integrity (2) Group by pushdown through join select deptno, deptname, sum(salary) from employee, department where workdept=deptno group by deptno, deptname GROUP BY pushed down below the join (along the child table) GROUPBY JOIN EMPLOYEE DEPARTMENT => reduces number of rows participating in the join GROUPBY JOIN DEPARTMENT EMPLOYEE 40 IBM Software Group | DB2 information management software Exploiting Generated Columns CREATE TABLE ORDERS( O_ORDKEY INT NOT NULL, O_CUSTKEY INT NOT NULL, O_PRICE FLOAT NOT NULL, O_ORDERDATE DATE NOT NULL, ... OG_ORDERMONTH GENERATED ALWAYS AS (INTEGER(O_ORDERDATE)/100), PRIMARY KEY o_key (O_ORDERKEY)) ORGANIZE BY ( OG_ORDERMONTH ); SELECT * FROM ORDERS WHERE INTEGER(O_ORDERDATE)/100 = 200901; ---> SELECT FROM ORDERS WHERE OG_ORDERMONTH = 200901; The generated expression in the query will be replaced by the generated column. The predicate on generated column is used to exploit an index 41 IBM Software Group | DB2 information management software Exploiting NOT NULL constraint, Uniqueness and Functional Dependency NOT NULL Example: MQT matching (more on this later) Whenever possible declare columns as NOT NULL, do not leave nullable by default. Uniqueness can be exploited to remove DISTINCT processing, explicit and implicit, to avoid unnecessary SORT operators: Example 1: create table emp (empno int not null unique, empname char(30)); select distinct * from emp ; Example 2: MERGE statement duplicate checking Functional Dependency Example: MQT matching (more on this later) 42 IBM Software Group | DB2 information management software Summary Data integrity constraints implement business rules and application logic within the database server If data integrity is maintained outside of the database, informational constraints can be defined Constraints provide valuable information to Query Rewrite for query optimization Triggers offer more flexibility by allowing procedural logic 43 IBM Software Group | DB2 information management software Selected Further Reading Roberta Cochrane, Hamid Pirahesh, Nelson Mendonça Mattos: Integrating Triggers and Declarative Constraints in SQL Database Sytems. VLDB 1996: 567-578 Hamid Pirahesh, T. Y. Cliff Leung, Waqar Hasan: A Rule Engine for Query Transformation in Starburst and IBM DB2 C/S DBMS. ICDE 1997: 391-400 Qi Cheng, Jarek Gryz, Fred Koo, T. Y. Cliff Leung, Linqi Liu, Xiaoyan Qian, K. Bernhard Schiefer: Implementation of Two Semantic Query Optimization Techniques in DB2 Universal Database. VLDB 1999: 687-698 Serge Rielau: INSTEAD OF Triggers - All Views are Updatable!, http://www.ibm.com/developerworks/db2/library/techarticle/0210rielau/ 0210rielau.html, 2002 44 IBM Software Group | DB2 information management software Thank You 45