More Relational Dimensional Modeling
Copyright © 2009, Oracle. All rights reserved.
Objectives
After completing this lesson, you should be able to:
• Explain how OWB handles incremental data refreshes
• Detect and process errors when loading dimensional data
into hierarchies (orphan management)
• Use cube-organized materialized views
• Explain how OWB handles slowly changing dimensions
• Define a Type 2 slowly changing dimension
Lesson Agenda
• Managing incremental data refreshes
• Detecting and processing errors when loading dimensional
data into hierarchies (orphan management)
• Using cube-organized materialized views
• Managing slowly changing dimensions
Initial Versus Incremental Loads
• Initial loads:
– Trial sample
– Usability sample
– Historical data load
• Incremental loads:
– Facts
– Dimensions
Two Categories of DW Updating: Data and Metadata
• Refreshing the fact data:
– Add new data
– Purge old data
• Updating metadata structures
[Figure: the SALES fact table with the PRODUCTS, CHANNELS, CUSTOMERS, TIMES, and PROMOTIONS dimensions; a callout reads "Our focus in this lesson"]
Preserving History While Updating Fact Data
Fact table:
• Maintaining three years of historic data
• Removing oldest month’s data
• Appending new month’s data to end of table
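The rolling window described on this slide can be sketched in a few lines of Python. This is only an illustration: the function names, the `sale_date` field, and the 36-month default are assumptions for the example, not OWB features.

```python
from datetime import date


def oldest_month_to_keep(newest: date, history_months: int = 36) -> date:
    """Return the first day of the oldest month that stays in the fact table."""
    # Work with a running month index so the subtraction can cross year boundaries.
    index = newest.year * 12 + (newest.month - 1) - (history_months - 1)
    return date(index // 12, index % 12 + 1, 1)


def refresh_fact(rows, new_rows, newest, history_months=36):
    """Append the new month's rows, then drop rows that fall out of the window."""
    cutoff = oldest_month_to_keep(newest, history_months)
    return [r for r in rows + new_rows if r["sale_date"] >= cutoff]


# Hypothetical fact rows: one from the month that is about to be purged,
# one from the first month that stays.
facts = [
    {"sale_date": date(2006, 6, 15), "amount": 10},
    {"sale_date": date(2006, 7, 1), "amount": 20},
]
june_2009 = [{"sale_date": date(2009, 6, 3), "amount": 30}]
facts = refresh_fact(facts, june_2009, newest=date(2009, 6, 1))
```

After the refresh, the June 2006 row is gone and the table again spans 36 months, from July 2006 through June 2009.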
Two Ways to Refresh the Fact Table
• Loading through a direct connection to the operational
OLTP system. This method:
– Has availability issues
– Involves a time-stamp comparison operation to differentiate
new data
• Loading offline from a flat file containing
pre-extracted new data only
Capturing Changed Data for Refresh
• Capture new fact data.
• Capture changed dimension data.
• Determine the method for the capture of each.
• Methods:
– Wholesale data replacement
– Comparison of database instances
– Time stamping
– Database triggers
– Database log
– Hybrid techniques
Wholesale Data Replacement
• Expensive, if a large reload
• Limited historical data, if most old data is dropped
• Time period replacement
[Figure: operational databases and time periods T1, T2, and T3]
Comparison of Database Instances
• Simple to perform, but expensive in time
and processing
• Delta file:
– Changes to operational data since last refresh
– Used by various techniques
[Figure: yesterday’s operational database and today’s operational database feed a database comparison; the delta file holds changed data.]
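As a rough sketch of what a database comparison produces, the following Python function compares two snapshots keyed by primary key and builds a delta. The snapshot layout (a dict of key to row) and the name `build_delta` are assumptions made for illustration only.

```python
def build_delta(yesterday: dict, today: dict) -> dict:
    """Compare two snapshots keyed by primary key and return the changes."""
    # Keys only in today's snapshot are new rows.
    inserted = {k: today[k] for k in today.keys() - yesterday.keys()}
    # Keys only in yesterday's snapshot were deleted in the source.
    deleted = {k: yesterday[k] for k in yesterday.keys() - today.keys()}
    # Keys in both snapshots are updates when the row content differs.
    updated = {
        k: today[k]
        for k in today.keys() & yesterday.keys()
        if today[k] != yesterday[k]
    }
    return {"insert": inserted, "update": updated, "delete": deleted}


# Hypothetical customer snapshots.
yesterday = {1: ("John Doe", "Single"), 2: ("Jane Roe", "Married")}
today = {1: ("John Doe", "Married"), 3: ("Sam Poe", "Single")}
delta = build_delta(yesterday, today)
```

Every row of both snapshots has to be read and compared, which is why the slide calls this method expensive in time and processing.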
Time and Date Stamping
• Fast scanning for records that have been changed since
last extraction
• Date Updated field
• No detection of deleted data
[Figure: operational data is scanned; the delta file holds changed data.]
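A minimal sketch of time-stamp-based extraction, assuming each source row carries a `date_updated` field (the field and function names are hypothetical). It also shows why deleted rows go undetected: a row that no longer exists has no time stamp to scan.

```python
from datetime import datetime


def changed_since(rows, last_extract: datetime):
    """Return the rows whose Date Updated value is later than the last extraction."""
    return [r for r in rows if r["date_updated"] > last_extract]


last_extract = datetime(2009, 6, 1)

# Hypothetical operational rows. Customer 2 was deleted from the source after
# the last extraction, so it is simply absent and never reaches the delta.
source_rows = [
    {"customer_id": 1, "date_updated": datetime(2009, 5, 20)},
    {"customer_id": 3, "date_updated": datetime(2009, 6, 2)},
]
delta_rows = changed_since(source_rows, last_extract)
```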
Database Triggers
• Changed data intercepted at the server level
• Extra I/O required
• Maintenance overhead
[Figure: triggers on the operational server (DBMS) capture changes to operational data; the delta file holds changed data.]
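Trigger-based capture can be illustrated with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite stands in for the operational DBMS, and the `customers` and `customers_delta` tables and trigger names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, marital_status TEXT);
-- The delta table plays the role of the delta file on the slide.
CREATE TABLE customers_delta (op TEXT, customer_id INTEGER, marital_status TEXT);

CREATE TRIGGER customers_ai AFTER INSERT ON customers BEGIN
  INSERT INTO customers_delta VALUES ('I', NEW.customer_id, NEW.marital_status);
END;
CREATE TRIGGER customers_au AFTER UPDATE ON customers BEGIN
  INSERT INTO customers_delta VALUES ('U', NEW.customer_id, NEW.marital_status);
END;
CREATE TRIGGER customers_ad AFTER DELETE ON customers BEGIN
  INSERT INTO customers_delta VALUES ('D', OLD.customer_id, OLD.marital_status);
END;
""")

# Each DML statement on the operational table writes one extra delta row.
conn.execute("INSERT INTO customers VALUES (1125, 'Single')")
conn.execute("UPDATE customers SET marital_status = 'Married' WHERE customer_id = 1125")
conn.execute("DELETE FROM customers WHERE customer_id = 1125")

delta = conn.execute(
    "SELECT op, customer_id, marital_status FROM customers_delta ORDER BY rowid"
).fetchall()
```

Unlike time stamping, the delete is captured here, at the price of one additional write per change (the extra I/O noted above) and three triggers to maintain per table.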
Using a Database Log
• Contains before and after images
• Requires system checkpoint
• Common technique
[Figure: the log on the operational server (DBMS) feeds log analysis and data extraction; the delta file holds changed data.]
Verdict
• Consider each method on merit.
• Consider a hybrid approach if one approach is not suitable.
• Consider current technical, existing operational, and
current application issues.
Applying the Changes to Data
You have a choice of techniques:
• Overwrite a record
• Add a record
• Add a field
• Maintain history
• Add version numbers
Overwriting a Record
• As taught earlier, this is referred to as a type 1 slowly
changing dimension.
• Implementation is easy.
• History is lost.
• This technique is not recommended.
Before: Customer ID, John Doe, Single
After: Customer ID, John Doe, Married
Adding a New Record
• As taught earlier, this is an example of a type 2 slowly
changing dimension.
• History is preserved; dimensions grow.
• Time constraints are required.
• A generalized key is created.
• Metadata tracks the use of keys.
Before:
1, Customer ID, John Doe, Single, Eff_from 1-Feb-41, Eff_to (open)
After:
1, Customer ID, John Doe, Single, Eff_from 1-Feb-41, Eff_to 31-Dec-95
42, Customer ID, John Doe, Married, Eff_from 1-Jan-96, Eff_to (open)
Adding a Current Field
• As taught earlier, this is an example of a type 3 slowly
changing dimension.
• Some history is maintained.
• Intermediate values are lost.
• This method is enhanced by adding an Effective Date field.
Before: Customer ID, John Doe, Single
After: Customer ID, John Doe, Single, Married, 01-JAN-96
Maintaining History
History tables:
• Can maintain a one-to-many relationship between the
tables
• Always retain the current record
• Enable reference to record history consistently
[Figure: the SALES fact table with the CHANNELS, CUSTOMERS, PRODUCTS, TIMES, and PROMOTIONS dimensions, plus the HIST_CUST history table]
History Preserved
• History enables realistic analysis.
• History retains context of data.
• History provides for realistic historical analysis.
• Model must be able to:
– Reflect business changes
– Maintain context between fact and dimension data
– Retain sufficient data to relate old to new
Dimensions and Cubes Automatically Handle Update Via MERGE
• The good news: Dimension and cube operators handle
data update operations automatically!
– They are set up for doing an update/insert (MERGE)
operation by default.
• The other news: Incremental update of target relational
tables (not associated with dimensions and cubes)
requires manipulation of their loading type.
– Now we will examine a situation for incremental update of a
relational table.
– More good news: The useful advice in the following slides on
incremental update of relational tables is not well
documented elsewhere!
Three Refresh Scenarios for Refreshing Target Tables
• The source and the target have the same primary keys:
– VENDOR (PK: Acct_Num)
– SUPPLIER (PK: Acct_Num)
• The source and the target have different primary keys:
– RETIRED EMPLOYEES (PK: Emp_ID; other columns: Emp_Num, Emp_Name)
– EMPLOYEES (PK: Emp_Num)
• The target uses a sequence-generated synthetic key:
– STG_CUSTOMERS (PK: Customer_Src_ID)
– PEOPLE (PK: Person_WH_ID; UK: Person_SRC_ID, Effective_From_Date)
Target Uses a Sequence-Generated Key
• How can you match when the target uses a sequence-generated key?
– STG_CUSTOMERS (PK: Customer_SRC_ID)
– PEOPLE (PK: Person_WH_ID, sequence values unknown; UK: Person_SRC_ID, Effective_From_Date)
• Match on the unique natural key (Person_SRC_ID,
Effective_To_Date).
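The idea of matching on the natural key instead of the sequence-generated key can be sketched with Python's sqlite3 module. Assumptions in this sketch: SQLite's AUTOINCREMENT stands in for the sequence, the `name` column and the `load` function are invented, and the match columns follow the unique key listed for PEOPLE (Person_SRC_ID, Effective_From_Date).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (
  person_wh_id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- synthetic key
  person_src_id       INTEGER NOT NULL,                   -- natural key from the source
  effective_from_date TEXT NOT NULL,
  name                TEXT,
  UNIQUE (person_src_id, effective_from_date)
);
""")


def load(rows):
    """Update rows that match on the natural key; insert the rest."""
    for src_id, eff_from, name in rows:
        cur = conn.execute(
            "UPDATE people SET name = ? "
            "WHERE person_src_id = ? AND effective_from_date = ?",
            (name, src_id, eff_from),
        )
        if cur.rowcount == 0:
            # No match: the synthetic key is generated only at insert time.
            conn.execute(
                "INSERT INTO people (person_src_id, effective_from_date, name) "
                "VALUES (?, ?, ?)",
                (src_id, eff_from, name),
            )


load([(101, "2009-01-01", "J. Doe")])
load([(101, "2009-01-01", "John Doe"), (102, "2009-02-01", "Jane Roe")])
people = conn.execute(
    "SELECT person_wh_id, person_src_id, name FROM people ORDER BY person_wh_id"
).fetchall()
```

The second load updates the existing row for source ID 101 rather than inserting a duplicate, even though the incoming data never carries a Person_WH_ID value.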
Change “Match by Constraint”
to “No Constraints”
Setting Attribute Properties
for Synthetic Keys
The synthetic key with sequence generator has an unknown value.
Therefore, specify the natural key for update matching, rather than
this synthetic primary key.
Setting Loading Properties
The following attribute settings are useful
whenever a surrogate (synthetic) key (for
instance, a sequence operator) is used on
a target in a map:
• Load column when inserting row
• Load column when updating row
• Match column when updating row
• Update: Operation (eight available
target conditions are shown on the
next slide)
• Match column when deleting row
Update Operation Conditions
If source value = 5 and target value = 10, then each condition results in
the following target values:
Condition   Meaning                     Target Value
=           Target = source             Target = 5
+=          Target = source + target    Target = 15
-=          Target = target - source    Target = 5
=-          Target = source - target    Target = negative 5
=||         Target = target || source   Target = 105
||=         Target = source || target   Target = 510
*=          Target = target * source    Target = 50
/=          Target = target / source    Target = 2
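The table can be reproduced with a small Python mapping from each condition to a function of the target and source values. The dictionary and its name are illustrative only; they model the meanings listed on the slide, not OWB's generated code.

```python
# Each entry maps an update operation to a function of (target, source).
UPDATE_OPERATIONS = {
    "=": lambda target, source: source,
    "+=": lambda target, source: source + target,
    "-=": lambda target, source: target - source,
    "=-": lambda target, source: source - target,
    "=||": lambda target, source: f"{target}{source}",  # concatenation, target first
    "||=": lambda target, source: f"{source}{target}",  # concatenation, source first
    "*=": lambda target, source: target * source,
    "/=": lambda target, source: target / source,
}

# Same inputs as the slide: source value 5, target value 10.
results = {op: fn(10, 5) for op, fn in UPDATE_OPERATIONS.items()}
```

Note that the two concatenation operators produce strings ("105" and "510"), and that `-=` and `=-` differ only in operand order.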
Choosing the DML Load Type
• INSERT (the default)
• UPDATE
• INSERT/UPDATE
• UPDATE/INSERT
• DELETE
Choosing the DML Load Type
• NONE
• TRUNCATE/INSERT
• DELETE/INSERT
• CHECK/INSERT
Specifying an Update Target Condition
CDC Template Mappings: Another Method for Updating Changed Data
1. Choose a Change Data Capture (CDC) mechanism.
– Trigger-based (Oracle, IBM, Microsoft)
– Log-based (Oracle and IBM)
2. Choose the table upon which to perform CDC.
3. Start the capture process.
4. Define the subscribers to receive the changed data.
5. Define mappings to consume the changes.
Quiz
Which of the following statements are false?
a. There are two categories of data warehouse and data mart
update tasks: changes to the data inside the dimension
and changes to the metadata of the dimension.
b. The synthetic key with sequence generator has an
unknown value; therefore, use this synthetic primary key
for update matching, rather than the natural key.
c. Type 1 SCD is easy to implement, just overwriting a record
with changes; however, history is lost.
Lesson Agenda
• Managing incremental data refreshes
• Detecting and processing errors when loading dimensional
data into hierarchies (orphan management)
• Using cube-organized materialized views
• Managing slowly changing dimensions
The Challenge of Managing Orphans
[Figure: a dimension load showing three items with null parent key values and three items with invalid parent key values]
How OWB Manages Orphans
Specify rules for orphan values within
a dimension and between the cube
and the dimensions.
Specify rules to apply during the
loading and removal of dimensional
data.
Specify different actions for records
with null parent and invalid parent
values.
Now play the viewlet
on managing orphans!
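The distinction between null-parent and invalid-parent orphans can be sketched as follows. This is not OWB's implementation: the function name, the `parent_key` field, and the sample data are assumptions, and the sketch only classifies rows so that a different action could then be applied to each group.

```python
def classify_orphans(children, valid_parent_keys):
    """Split child rows into loadable rows, null-parent orphans, and invalid-parent orphans."""
    loadable, null_parent, invalid_parent = [], [], []
    for row in children:
        key = row["parent_key"]
        if key is None:
            null_parent.append(row)       # no parent reference at all
        elif key not in valid_parent_keys:
            invalid_parent.append(row)    # references a parent that does not exist
        else:
            loadable.append(row)
    return loadable, null_parent, invalid_parent


# Hypothetical product items whose parent level is a category.
categories = {"HW", "SW"}
items = [
    {"item": "Mouse", "parent_key": "HW"},
    {"item": "Cable", "parent_key": None},
    {"item": "Manual", "parent_key": "DOC"},
]
loadable, null_parent, invalid_parent = classify_orphans(items, categories)
```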
Lesson Agenda
• How OWB handles incremental data refreshes
• Detecting and processing errors when loading dimensional
data into hierarchies (orphan management)
• Using cube-organized materialized views
• Managing slowly changing dimensions
ROLAP Implementation of Dimensional Objects
• ROLAP implementation of dimensional objects can be
classified as follows:
– ROLAP implementation: The dimensional object and its data are
stored in a relational form in the database, and the CWM2 metadata
for the dimensional object is stored in the OLAP catalog. This
enables you to query the dimensional object from OLAP tools.
– ROLAP with Cube MVs implementation: The dimensional object and its
data are stored in a relational form in the database. Additionally,
cube-organized materialized views are created in an analytic workspace.
Support for Cube-Organized Materialized Views
• OWB 11g Release 2 supports OLAP cube storage in
cube-organized materialized views.
• It provides out-of-the-box summary management
capabilities for facts stored in a relational data warehouse,
and summarized in cube-organized materialized views.
[Figure: under ROLAP, the dimension is bound to its implementing table (synchronize to repository object, data stored in relational table). Under ROLAP w/MVs, a cube-organized materialized view keeps the relational fact table in a schema table and stores the summaries in the AW.]
Configuring the Cube
We see that:
• Enable MV Refresh is true
• Query Rewrite is enabled
• Constraints are trusted
• Refresh Mode is force
• Refresh On is Demand
Demand means that after the mapping has been executed to load the fact table,
a manual refresh must be performed.
Storage Type: ROLAP with CUBE MVs
A relational fact table with cube-organized materialized views for
summary management
• ROLAP with cube MVs can be used if:
– PL/SQL Generation Mode is default
and Location is Oracle 11g R1 or
higher. Or,
– PL/SQL Generation Mode is Oracle
11g R1 or higher
Using Compressed Cube Technology with Sparse Dimensions
• The dimensions have been tagged as sparse.
• Cube compression has been specified for the sparse dimensions.
• The partition cube check box applies to MOLAP and
ROLAP with MV cubes.
• This is Analytic Workspace-related partitioning.
• This cube is partitioned by calendar year.
View the Code After Deploying Dimensions and Cube
Examining the Cube in Analytic Workspace Manager
Use SQL Developer to Test Queries with and Without Query Rewrite
Execution Plan Without Query Rewrite
In this query without query
rewrite, notice that the cost is
4174, and that a full table scan
took place.
Execution Plan with Query Rewrite
In this query with query rewrite,
notice that the cost is reduced
from 4174 to 121, and that the
query is rewritten to use the
summaries in the cube MVs.
Lesson Agenda
• How OWB handles incremental data refreshes
• Detecting and processing errors when loading dimensional
data into hierarchies (orphan management)
• Using cube-organized materialized views
• Managing slowly changing dimensions
What Is a Slowly Changing Dimension?
A slowly changing dimension (SCD) is a dimension that stores
and manages both current and historical data over time in a
data warehouse.
Types of Slowly Changing Dimensions
There are three types of slowly changing dimensions.
• Type 1 overwrites old values.
• Type 2 creates another dimension record.
• Type 3 creates a current value field.
Type 1 SCD: Does Not Store History
Type 1 overwrites old values.
Old record
ID   Customer ID   Customer Name   Marital Status
3    1125          Steve           Single
New record
ID   Customer ID   Customer Name   Marital Status
3    1125          Steve           Married
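A type 1 change amounts to an in-place overwrite, which the following Python sketch makes explicit. The in-memory dictionary and the function name `apply_type1` are assumptions for illustration; the values come from the slide.

```python
# Dimension rows keyed by the natural key (Customer ID).
dimension = {
    1125: {"id": 3, "customer_name": "Steve", "marital_status": "Single"},
}


def apply_type1(dimension, customer_id, **changes):
    """Overwrite the changed attributes in place; the old values are not kept."""
    dimension[customer_id].update(changes)


apply_type1(dimension, 1125, marital_status="Married")
```

The surrogate ID stays 3 and no new row is created, so nothing in the dimension records that Steve was ever single.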
Type 2 SCD: Preserves Complete History
Type 2 stores complete change history in a new record.
Before
ID   Customer ID   Customer Name   Marital Status   Effective Date   Expiration Date
3    1125          Steve           Single           04-04-1999       NULL (open/current record)
After
ID   Customer ID   Customer Name   Marital Status   Effective Date   Expiration Date
3    1125          Steve           Single           04-04-1999       01-13-2001 (closed record)
8    1125          Steve           Married          01-14-2001       NULL (open/current record)
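The before and after states on this slide can be reproduced with a short Python sketch. It is an illustration under assumptions, not the code OWB generates: the row layout, the function name `apply_type2`, and the use of `itertools.count` as a stand-in for the surrogate key sequence are all invented for the example.

```python
from datetime import date, timedelta
from itertools import count


def apply_type2(rows, next_id, customer_id, new_status, change_date):
    """Close the open record for the customer and append a new open record."""
    for row in rows:
        if row["customer_id"] == customer_id and row["expiration_date"] is None:
            if row["marital_status"] == new_status:
                return rows  # nothing changed, so no new version is needed
            # Close the current record the day before the change takes effect.
            row["expiration_date"] = change_date - timedelta(days=1)
            rows.append(dict(
                row,
                id=next(next_id),            # new surrogate key
                marital_status=new_status,
                effective_date=change_date,
                expiration_date=None,        # the new record is the open one
            ))
            return rows
    raise KeyError(f"no open record for customer {customer_id}")


rows = [{
    "id": 3, "customer_id": 1125, "customer_name": "Steve",
    "marital_status": "Single",
    "effective_date": date(1999, 4, 4), "expiration_date": None,
}]
surrogate_ids = count(8)  # assumed sequence position, chosen to match the slide
apply_type2(rows, surrogate_ids, 1125, "Married", date(2001, 1, 14))
```

Both versions share Customer ID 1125, so facts loaded before 14 January 2001 can keep pointing at surrogate ID 3 while later facts use ID 8.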
Type 3 SCD: Stores Only the Previous Value
Type 3 stores the current and previous versions of a selected attribute.
ID   Customer ID   Customer Name   Marital Status   Previous Marital Status   Effective Date
3    1125          Steve           Married          Single                    01-14-2001
3    1125          Steve           Widower          Married                   10-30-2004
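A type 3 change shifts the current value into the "previous" column before overwriting it. The sketch below, with an assumed row layout and a hypothetical `apply_type3` function, replays the two changes shown on the slide.

```python
from datetime import date


def apply_type3(row, new_status, change_date):
    """Move the current value to the previous-value column, then store the new value."""
    return dict(
        row,
        previous_marital_status=row["marital_status"],
        marital_status=new_status,
        effective_date=change_date,
    )


steve = {
    "id": 3, "customer_id": 1125, "customer_name": "Steve",
    "marital_status": "Single", "previous_marital_status": None,
    "effective_date": date(1999, 4, 4),
}
after_first = apply_type3(steve, "Married", date(2001, 1, 14))
after_second = apply_type3(after_first, "Widower", date(2004, 10, 30))
```

After the second change, the value "Single" no longer appears anywhere in the row, which is the loss of intermediate history that type 3 accepts.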
Creating a Type 2 SCD by Using the Dimension Editor
When creating a type 2 SCD by using the editor:
• Identify the attributes that store the effective time and
expiration time, which are:
– EFFECTIVE_DATE
– EXPIRATION_DATE
• Identify the attribute that triggers history saving.
Applying the Two Change-Tracking Attributes to the Lowest Level
• Applicable, because CHANNEL is the lowest level
• Not applicable, because CLASS is not the lowest level
Creating a Type 2 SCD
Binding the Attribute to Its Implementation Table
[Figure: a dimension attribute bound to its implementation table]
Synchronize Mapping That Loads Type 2 SCD
Creating a Type 2 SCD by Using the Wizard
Dimension Operator in a Mapping
Creating a Type 3 SCD
Attributes in a Type 3 SCD
Quiz
Identify the type a, b, and c slowly changing dimension:
a. This type creates another dimension record.
b. This type overwrites old values.
c. This type creates a current value field.
Summary
In this lesson, you should have learned how to:
• Explain how OWB handles slowly changing dimensions
• Define a type 2 slowly changing dimension
• Use cube-organized materialized views
• Detect and process errors when loading dimensional data
into hierarchies (orphan management)
• Explain how OWB handles incremental data refreshes
Practice 7-1: Creating a Type 2 Slowly Changing Dimension
This practice covers creating a type 2 slowly changing
dimension.