More Relational Dimensional Modeling

Copyright © 2009, Oracle. All rights reserved.

Objectives
After completing this lesson, you should be able to:
• Explain how OWB handles incremental data refreshes
• Detect and process errors when loading dimensional data into hierarchies (orphan management)
• Use cube-organized materialized views
• Explain how OWB handles slowly changing dimensions
• Define a Type 2 slowly changing dimension

Lesson Agenda
• Managing incremental data refreshes
• Detecting and processing errors when loading dimensional data into hierarchies (orphan management)
• Using cube-organized materialized views
• Managing slowly changing dimensions

Initial Versus Incremental Loads
• Initial loads:
– Trial sample
– Usability sample
– Historical data load
• Incremental loads:
– Facts
– Dimensions

Two Categories of DW Updating: Data and Metadata
• Refreshing the fact data:
– Add new data
– Purge old data
• Updating metadata structures
[Diagram: SALES fact with PRODUCTS, CHANNELS, CUSTOMERS, TIMES, and PROMOTIONS dimensions; one callout reads "Our focus in this lesson"]

Preserving History While Updating Fact Data
[Diagram: fact table]
• Removing oldest month's data
• Maintaining three years of historic data
• Appending new month's data to end of table

Two Ways to Refresh the Fact Table
• Loading through a direct connection to the operational OLTP system. This method:
– Has availability issues
– Involves a time-stamp comparison operation to differentiate new data
• Loading offline from a flat file containing pre-extracted new data only

Capturing Changed Data for Refresh
• Capture new fact data.
• Capture changed dimension data.
• Determine the method for the capture of each.
• Methods:
– Wholesale data replacement
– Comparison of database instances
– Time stamping
– Database triggers
– Database log
– Hybrid techniques

Wholesale Data Replacement
• Expensive, if a large reload
• Limited historical data, if most old data is dropped
• Time period replacement
[Diagram: operational databases at T1, T2, and T3]

Comparison of Database Instances
• Simple to perform, but expensive in time and processing
• Delta file:
– Changes to operational data since last refresh
– Used by various techniques
[Diagram: yesterday's and today's operational databases go through a database comparison; the delta file holds changed data]

Time and Date Stamping
• Fast scanning for records that have been changed since last extraction
• Date Updated field
• No detection of deleted data
[Diagram: operational data; the delta file holds changed data]

Database Triggers
• Changed data intercepted at the server level
• Extra I/O required
• Maintenance overhead
[Diagram: triggers on the operational server (DBMS) over the operational data; the delta file holds changed data]

Using a Database Log
• Contains before and after images
• Requires system checkpoint
• Common technique
[Diagram: operational server (DBMS), operational data, and log; log analysis and data extraction; the delta file holds changed data]

Verdict
• Consider each method on merit.
• Consider a hybrid approach if one approach is not suitable.
• Consider current technical, existing operational, and current application issues.

Applying the Changes to Data
You have a choice of techniques:
• Overwrite a record
• Add a record
• Add a field
• Maintain history
• Add version numbers

Overwriting a Record
• As taught earlier, this is referred to as a type 1 slowly changing dimension.
• Implementation is easy.
• History is lost.
• This technique is not recommended.

Before:  Customer ID   John Doe   Single
After:   Customer ID   John Doe   Married

Adding a New Record
• As taught earlier, this is an example of a type 2 slowly changing dimension.
• History is preserved; dimensions grow.
• Time constraints are required.
• A generalized key is created.
• Metadata tracks the use of keys.

Before:
Key   Customer ID   Name       Status    Eff_from    Eff_to
1     Customer ID   John Doe   Single    1-Feb-41

After:
Key   Customer ID   Name       Status    Eff_from    Eff_to
1     Customer ID   John Doe   Single    1-Feb-41    31-Dec-95
42    Customer ID   John Doe   Married   1-Jan-96

Adding a Current Field
• As taught earlier, this is an example of a type 3 slowly changing dimension.
• Some history is maintained.
• Intermediate values are lost.
• This method is enhanced by adding an Effective Date field.

Before:  Customer ID   John Doe   Single
After:   Customer ID   John Doe   Single   Married   01-JAN-96

Maintaining History
History tables:
• Can maintain a one-to-many relationship between the tables
• Always retain the current record
• Enable reference to record history consistently
[Diagram: HIST_CUST, CHANNELS, CUSTOMERS, SALES, PRODUCTS, TIMES, PROMOTIONS]

History Preserved
• History enables realistic analysis.
• History retains context of data.
• History provides for realistic historical analysis.
• Model must be able to:
– Reflect business changes
– Maintain context between fact and dimension data
– Retain sufficient data to relate old to new

Dimensions and Cubes Automatically Handle Update Via MERGE
• The good news: Dimension and cube operators handle data update operations automatically!
– They are set up for doing an update/insert (MERGE) operation by default.
• The other news: Incremental update of target relational tables (not associated with dimensions and cubes) requires manipulation of their loading type.
– Now we will examine a situation for incremental update of a relational table.
– More good news: The useful advice in the following slides on incremental update of relational tables is not well documented elsewhere!

Three Refresh Scenarios for Refreshing Target Tables
• The source and the target have the same primary keys:
– VENDOR (PK: Acct_Num)
– SUPPLIER (PK: Acct_Num)
• The source and the target have different primary keys:
– RETIRED EMPLOYEES (PK: Emp_ID)
– EMPLOYEES (PK: Emp_Num)
– Other columns shown: Emp_Num, Emp_Name
• The target uses a sequence-generated synthetic key:
– STG_CUSTOMERS (PK: Customer_Src_ID)
– PEOPLE (PK: Person_WH_ID; UK: Person_SRC_ID, Effective_From_Date)

Target Uses a Sequence-Generated Key
• How can you match when the target uses a sequence-generated key? The sequence values are unknown.
– STG_CUSTOMERS (PK: Customer_SRC_ID)
– PEOPLE (PK: Person_WH_ID; UK: Person_SRC_ID, Effective_From_Date)
• Match on the unique natural key (Person_SRC_ID, Effective_To_Date).

Change "Match by Constraint" to "No Constraints"

Setting Attribute Properties for Synthetic Keys
The synthetic key with sequence generator has an unknown value. Therefore, specify the natural key for update matching, rather than this synthetic primary key.
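
The matching rule above can be illustrated outside OWB with a small sketch. It is not OWB-generated code: the function name, the row layout, and the "name" column are hypothetical, and the key columns are lowercased versions of the slide's STG_CUSTOMERS and PEOPLE examples. It only shows why an update/insert has to match on the natural key when the target's primary key comes from a sequence.

```python
from itertools import count


def merge_by_natural_key(target_rows, staged_rows, key_sequence):
    """Update/insert staged rows into the target, matching on the natural key."""
    # Index the target by natural key; the surrogate key is never used to match,
    # because the source has no way of knowing its sequence-generated value.
    by_natural_key = {row["person_src_id"]: row for row in target_rows}
    for staged in staged_rows:
        match = by_natural_key.get(staged["customer_src_id"])
        if match is not None:
            # Matched: update non-key columns and keep the surrogate key as is.
            match["name"] = staged["name"]
        else:
            # Not matched: insert and draw the surrogate key from the sequence.
            new_row = {
                "person_wh_id": next(key_sequence),
                "person_src_id": staged["customer_src_id"],
                "name": staged["name"],
            }
            target_rows.append(new_row)
            by_natural_key[new_row["person_src_id"]] = new_row
    return target_rows


# Hypothetical data: one existing person, one changed and one new staged row.
people = [{"person_wh_id": 1, "person_src_id": "C100", "name": "John Doe"}]
staged = [
    {"customer_src_id": "C100", "name": "John A. Doe"},
    {"customer_src_id": "C200", "name": "Jane Roe"},
]
merge_by_natural_key(people, staged, count(start=2))
```

In this sketch the existing row keeps surrogate key 1 and only its name changes, while the unmatched staged row is inserted with the next sequence value.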

Setting Loading Properties
The following attribute settings are useful whenever a surrogate/synthetic key (for instance, a sequence operator) is used on a target in a map:
• Load column when inserting row
• Load column when updating row
• Match column when updating row
• Update: Operation (eight available target conditions are shown on the next slide)
• Match column when deleting row

Update Operation Conditions
If source value = 5 and target value = 10, then each condition results in the following target values:

Condition   Meaning                      Target value
=           Target = source              5
+=          Target = source + target     15
-=          Target = target - source     5
=-          Target = source - target     -5
=||         Target = target || source    105
||=         Target = source || target    510
*=          Target = target * source     50
/=          Target = target / source     2

Choosing the DML Load Type
• INSERT (the default)
• UPDATE
• INSERT/UPDATE
• UPDATE/INSERT
• DELETE
• NONE
• TRUNCATE/INSERT
• DELETE/INSERT
• CHECK/INSERT

Specifying an Update Target Condition

CDC Template Mappings: Another Method for Updating Changed Data
1. Choose a Change Data Capture (CDC) mechanism.
– Trigger-based (Oracle, IBM, Microsoft)
– Log-based (Oracle and IBM)
2. Choose the table upon which to perform CDC.
3. Start the capture process.
4. Define the subscribers to receive the changed data.
5. Define mappings to consume the changes.

Quiz
Which of the following statements are false?
a. There are two categories of data warehouse and data mart update tasks: changes to the data inside the dimension and changes to the metadata of the dimension.
b. The synthetic key with sequence generator has an unknown value; therefore, use this synthetic primary key for update matching, rather than the natural key.
c. Type 1 SCD is easy to implement, just overwriting a record with changes; however, history is lost.

Lesson Agenda
• Managing incremental data refreshes
• Detecting and processing errors when loading dimensional data into hierarchies (orphan management)
• Using cube-organized materialized views
• Managing slowly changing dimensions

The Challenge of Managing Orphans
• Three items with null parent key values
• Three items with invalid parent key values

How OWB Manages Orphans
• Specify rules for orphan values within a dimension and between the cube and the dimensions.
• Specify rules to apply during the loading and removal of dimensional data.
• Specify different actions for records with null parent and invalid parent values.
Now play the viewlet on managing orphans!

Lesson Agenda
• How OWB handles incremental data refreshes
• Detecting and processing errors when loading dimensional data into hierarchies (orphan management)
• Using cube-organized materialized views
• Managing slowly changing dimensions

ROLAP Implementation of Dimensional Objects
• ROLAP implementation of dimensional objects can be classified as follows:
– ROLAP implementation: The dimensional object and its data are stored in a relational form in the database, and the CWM2 metadata for the dimensional object is stored in the OLAP catalog. This enables you to query the dimensional object from OLAP tools.
– ROLAP with Cube MVs implementation: The dimensional object and its data are stored in a relational form in the database. Additionally, cube-organized materialized views are created in an analytic workspace.

Support for Cube-Organized Materialized Views
• OWB 11g Release 2 supports OLAP cube storage in cube-organized materialized views. It provides out-of-the-box summary management capabilities for facts stored in a relational data warehouse, and summarized in cube-organized materialized views.
• ROLAP: dimension bound to its implementing table (bind, synchronize to repository object); data stored in relational table
• ROLAP w/MVs: cube-organized materialized view
– Relational fact table
– Summaries stored in AW
[Diagram labels: schema table, AW]

Configuring the Cube
We see that:
• Enable MV Refresh is true
• Query Rewrite is enabled
• Constraints are trusted
• Refresh Mode is force
• Refresh On is Demand
Demand means that after the mapping has been executed to load the fact table, a manual refresh must be performed.

Storage Type: ROLAP with CUBE MVs
A relational fact table with cube-organized materialized views for summary management
• ROLAP with cube MVs can be used if:
– PL/SQL Generation Mode is default and Location is Oracle 11g R1 or higher. Or,
– PL/SQL Generation Mode is Oracle 11g R1 or higher

Using Compressed Cube Technology with Sparse Dimensions
• The dimensions have been tagged as sparse.
• Cube compression has been specified for the sparse dimensions.
• The partition cube check box applies to MOLAP and ROLAP with MV cubes. This is Analytic Workspace-related partitioning.
• This cube is partitioned by calendar year.

View the Code After Deploying Dimensions and Cube

Examining the Cube in Analytic Workspace Manager

Use SQL Developer to Test Queries with and Without Query Rewrite

Execution Plan Without Query Rewrite
In this query without query rewrite, notice that the cost is 4174, and that a full table scan took place.

Execution Plan with Query Rewrite
In this query with query rewrite, notice that the cost is reduced from 4174 to 121, and that the summaries are rewritten to the cube MVs.

Lesson Agenda
• How OWB handles incremental data refreshes
• Detecting and processing errors when loading dimensional data into hierarchies (orphan management)
• Using cube-organized materialized views
• Managing slowly changing dimensions

What Is a Slowly Changing Dimension?
A slowly changing dimension (SCD) is a dimension that stores and manages both current and historical data over time in a data warehouse.

Types of Slowly Changing Dimensions
There are three types of slowly changing dimensions.
• Type 1 overwrites old values.
• Type 2 creates another dimension record.
• Type 3 creates a current value field.

Type 1 SCD: Does Not Store History
Type 1 overwrites old values.

Old record:
ID   Customer ID   Customer Name   Marital Status
3    1125          Steve           Single

New record:
ID   Customer ID   Customer Name   Marital Status
3    1125          Steve           Married

Type 2 SCD: Preserves Complete History
Type 2 stores complete change history in a new record.

Before:
ID   Customer ID   Customer Name   Marital Status   Effective Date   Expiration Date
3    1125          Steve           Single           04-04-1999       NULL              (open/current record)

After:
ID   Customer ID   Customer Name   Marital Status   Effective Date   Expiration Date
3    1125          Steve           Single           04-04-1999       01-13-2001        (closed record)
8    1125          Steve           Married          01-14-2001       NULL              (open/current record)

Type 3 SCD: Stores Only the Previous Value
Type 3 stores current and previous version of a selected attribute.

ID   Customer ID   Customer Name   Marital Status   Previous Marital Status   Effective Date
3    1125          Steve           Married          Single                    01-14-2001
3    1125          Steve           Widower          Married                   10-30-2004

Creating a Type 2 SCD by Using the Dimension Editor
When creating a type 2 SCD by using the editor:
• Identify the attributes that store the effective time and expiration time, which are:
– EFFECTIVE_DATE
– EXPIRATION_DATE
• Identify the attribute that triggers history saving.

Applying the Two Change-Tracking Attributes to the Lowest Level
• Applicable, because CHANNEL is the lowest level
• Not applicable, because CLASS is not the lowest level

Creating a Type 2 SCD

Binding the Attribute to Its Implementation Table
[Diagram: dimension, implementation table, synchronize]

Mapping That Loads Type 2 SCD

Creating a Type 2 SCD by Using the Wizard

Dimension Operator in a Mapping

Creating a Type 3 SCD

Attributes in a Type 3 SCD
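
As a rough illustration of the type 3 behavior described in this lesson, the sketch below shifts the current value into a "previous" field each time the tracked attribute changes. It is not what OWB generates: the function name and dictionary layout are hypothetical, the values follow the Steve example, and the starting effective date is assumed for illustration.

```python
from datetime import date


def apply_type3_change(row, new_status, effective_date):
    """Return a copy of the row with the current value moved to 'previous'."""
    if new_status == row["marital_status"]:
        # No change in the tracked attribute: nothing to shift.
        return dict(row)
    updated = dict(row)
    # Only one previous value is kept, so any older value is overwritten here.
    updated["previous_marital_status"] = row["marital_status"]
    updated["marital_status"] = new_status
    updated["effective_date"] = effective_date
    return updated


steve = {
    "id": 3,
    "customer_id": 1125,
    "customer_name": "Steve",
    "marital_status": "Single",
    "previous_marital_status": None,
    "effective_date": date(1999, 4, 4),  # assumed starting date
}
married = apply_type3_change(steve, "Married", date(2001, 1, 14))
widower = apply_type3_change(married, "Widower", date(2004, 10, 30))
```

After the second change the row holds Widower as the current value and Married as the previous value; Single no longer appears, which is the loss of intermediate values that type 3 accepts.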

Quiz
Identify the type of slowly changing dimension described in a, b, and c:
a. This type creates another dimension record.
b. This type overwrites old values.
c. This type creates a current value field.

Summary
In this lesson, you should have learned how to:
• Explain how OWB handles slowly changing dimensions
• Define a type 2 slowly changing dimension
• Use cube-organized materialized views
• Detect and process errors when loading dimensional data into hierarchies (orphan management)
• Explain how OWB handles incremental data refreshes

Practice 7-1: Creating a Type 2 Slowly Changing Dimension
This practice covers creating a type 2 slowly changing dimension.
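
Before starting the practice, it may help to see the type 2 bookkeeping as a small sketch. This is only an illustration under stated assumptions, not the code OWB generates: the function name and row layout are hypothetical, the new surrogate ID is passed in rather than drawn from a sequence, and the closed record is assumed to expire one day before the new record becomes effective, as in the lesson's Steve example.

```python
from datetime import date, timedelta


def apply_type2_change(rows, customer_id, new_status, change_date, new_id):
    """Close the open record for a customer and append a new open record."""
    # The open/current record is the one with no expiration date.
    open_row = next(
        (r for r in rows
         if r["customer_id"] == customer_id and r["expiration_date"] is None),
        None,
    )
    if open_row is None:
        raise LookupError(f"no open record for customer {customer_id}")
    if open_row["marital_status"] == new_status:
        # The triggering attribute did not change, so no history is saved.
        return rows
    # Assumption: the old record expires the day before the change date.
    open_row["expiration_date"] = change_date - timedelta(days=1)
    rows.append({
        **open_row,
        "id": new_id,
        "marital_status": new_status,
        "effective_date": change_date,
        "expiration_date": None,
    })
    return rows


customers = [{
    "id": 3,
    "customer_id": 1125,
    "customer_name": "Steve",
    "marital_status": "Single",
    "effective_date": date(1999, 4, 4),
    "expiration_date": None,
}]
apply_type2_change(customers, 1125, "Married", date(2001, 1, 14), new_id=8)
```

The result mirrors the lesson's before/after tables: record 3 is closed with an expiration date of 01-13-2001, and record 8 becomes the open record with an effective date of 01-14-2001.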