Solution for Staging Area in Near Real-Time DWH –
Efficient in Refresh and Easy to Operate
Technical White Paper
Mathias Zarick, Karol Hajdu
Senior Consultants
March-2011
While looking for a solution for a near real-time data warehouse (DWH), efficiency and
operational stability (reliability) are probably among your primary technical concerns. This
applies both to the data transformation tasks and to the extraction and loading of the staging
area. This article covers the challenges around the staging area. It presents a
technical solution which addresses both important concerns: the refresh process of the
staging area is efficient and easy to operate. This solution is based on Oracle Data Guard
and transportable tablespaces.
Contents
1. The role of a Staging Area in a Data Warehouse
1.1 The Challenge called “short latency”
1.2 Different solutions having different advantages
2. Solution with Data Guard – the management perspective
2.1 Benefits for Data Warehouses
2.2 Which types of Data Warehouses will benefit most?
3. Solution with Oracle Data Guard – the technical insight
3.1 How it works
3.2 The Key Advantages
3.3 Technical Prerequisites
4. Take a “Tour” on a Real-Life Example
4.1 Real-life example
4.2 Setup and Configuration
4.3 Operation
5. Solution extension: If data availability for operational reporting matters
1. The role of a Staging Area in a Data Warehouse
In data warehouse architectures, there are some common good practices concerning the
staging area:
1. Create a staging area. After being extracted from source systems, the data is loaded
into the staging area. The staging area serves as the input for transformation
processes.
2. During the extraction and load into the staging area, only minimal data
transformations are done: the tables in the staging area have the same structure as
the corresponding tables in the source system. This makes the ETL architecture
much more transparent.
Based on the staging area’s content, the transformation and integration processes will produce:
- snapshots of data, serving as input for DWH’s versioning
- sets of change events (transactions) to be loaded into the DWH
1.1 The Challenge called “short latency”
In many enterprises, the Data Warehouse is the place where operative data originating from
different systems comes together and is integrated with analytical or dispositive data.
Step by step, many business users have discovered the value of integrated data. They use the data
stored in the Data Warehouse to build reporting or analytical applications.
As the markets in many lines of business get more and more volatile, business users are
no longer willing to wait several days or hours for the latest figures. They require a shorter latency
of the Data Warehouse: the need for a near real-time data warehouse was born.
Integration tasks consume both hardware resources and time. Hence, Data Warehouse
architects faced a new challenge: to find a trade-off between “get more speed (shorter data latency)”
and “provide integrated and cleansed data”. Some of them decided to introduce additional
redundancy (by creating an Operational Data Store, which has short latency but less integration).
Some of them decided to provide short latency only for very narrow and well-specified content:
they speak about real-time data warehouse content rather than a real-time data warehouse.
Regardless of which approach the Data Warehouse architect has chosen, he or she needs a
Staging Area with short latency. This is the subject covered by this white paper.
1.2 Different solutions having different advantages
There are several technical approaches to how data extraction and the loading of a staging area
can be implemented. The technical implementations differ basically in the following
characteristics:
- transferred data volumes required to refresh the staging area
- degree of completeness for changes to be captured
- performance impact on the source system (additional resource consumption)
- impact on data and service availability of the source system
- total costs of ownership:
  o licensing costs and development efforts
  o operational complexity (efforts, reliability)
Table 1 gives a simple overview of the common refresh solutions for a DWH’s staging area – Full Extraction, Marker-Based Extraction, Journal-Based Extraction, Oracle Streams and Oracle GoldenGate – rated against the criteria data volumes to transfer, degree of completeness, performance impact on the source, availability impact on the source, and operational complexity.
Table 1: Simple overview of refresh solutions for the DWH’s staging area
Depending on the database technology of the source system, some concepts can be excluded
right from the start, because they have very specific prerequisites regarding supported
technologies.
For more details about the refresh solutions for the Staging Area, please refer to the book ”Data
Warehousing mit Oracle – Business Intelligence in der Praxis” [3], Chapter 3.4.
If the source system is Oracle, there is yet another technical solution to extract the data from the
source system and load it into the staging area. This approach uses Oracle Data Guard, flashback
and transportable tablespaces.
This solution has the same advantages as other replication techniques such as
Oracle Streams or Oracle GoldenGate:
- small data volumes to be transferred
- low impact on the source system’s performance and availability1
- all types of changes on both data and structures are captured and transferred
However, there is one important difference:
This new solution has significantly lower operational complexity than Oracle Streams or
Oracle GoldenGate!
Table 2 rates the new solution based on Data Guard and transportable tablespaces against the same criteria: data volumes to transfer, degree of completeness, performance impact on the source, availability impact on the source, and operational complexity.
Table 2: The new solution has significantly lower operational complexity than Streams or GoldenGate
This white paper explains the concepts and provides the most important implementation
details, presented in the form of a real-life example.
1 This solution has even less impact on the source database than Oracle Streams or Oracle GoldenGate, because supplemental logging is not required here.
2. Solution with Data Guard – the management perspective
2.1 Benefits for Data Warehouses
Our experience shows that the solution described in this paper brings the following benefits for
Data Warehousing:
Benefit: Short latency of data stored in DWH or ODS (to near real-time)
How is this achieved? The refresh process of the Staging Area and/or Operational Data Store (ODS) is very efficient: it consumes a small amount of hardware resources and terminates in a short elapsed time.

Benefit: Shorter time-to-market for new ETL functionality
How is this achieved? The solution enables the Staging Area to contain the full set of data (not only the changed records). This makes the ETL application more transparent, and introducing changes in ETL applications is then less complex.

Benefit: Easy and stable operation
How is this achieved? While refreshing the tables in the Staging Area, the operational complexity is delegated to standard and reliable Oracle products and features. These features are easy to operate.

2.2 Which types of Data Warehouses will benefit most?
Extraction and sourcing from dedicated online transaction applications which are used to
manage complex relationships between customers, suppliers, accounts or delivery
components (applications like CRM or SCM2) can be very hard. The underlying database
schema of these applications is built on complex data models3. Companies using
dedicated CRM or SCM applications often have to manage the life cycle of several million
individual subjects (like customers, suppliers, contracts, product components, stock keeping
units, policies, etc.).
A Staging Area – or even an Operational Data Store (ODS) – benefits most from the solution
described in this paper if:
- The source system has a huge data volume with complex relationships and a
relatively small rate of data changes.
- There are reports with short latency requirements: the Staging Area needs to
capture the changes made in the online applications with a very short latency.
2 Supply Chain Management
3 I.e. a lot of relationships between the tables.
3. Solution with Oracle Data Guard – the technical insight
3.1 How it works
The solution presented in this article is based on Oracle Data Guard.
Data Guard technology maintains standby databases, which are copies of primary databases.
A Data Guard standby database can also be used for refreshing the staging area in a data
warehouse.
The main idea is based on Data Guard’s ability to open a physical standby database
temporarily read-write and to rewind it back to the point in time when it was opened. This is
achieved by using Oracle’s guaranteed restore point and flashback technology.
How can this be used for refreshing a staging area?
Let’s explain it using Figure 1. On the data warehouse machine (host DWH), a physical
standby of the database OLTP is configured. The primary database OLTP runs on host
OLTP.
This setup leads to the following situation: using the Data Guard functionality, any change
done on the primary database is performed on the standby database as well.
Figure 1: On the DWH machine, a physical standby database of OLTP is configured with Data Guard. Redo is transported from the primary site (host OLTP, database OLTP_SITE1) to the standby site (host DWH, database OLTP_SITE2), where the datafile crm01OLTP.dbf of the tablespace CRM is maintained next to the database DWH with its Staging Area and CORE DWH.
Reading from the Staging Area:
As soon as an ETL process inside the DWH database needs to read content out of the
staging area, the following actions are taken:
- The recovery process on the standby is paused.
- The physical standby database is converted to a snapshot standby database. This opens
the standby database read-write.
- Using the transportable tablespaces feature, the tablespace CRM of the snapshot
standby database is plugged into the database DWH:
  o The tablespace CRM in the snapshot standby database is set to read-only mode.
  o The metadata (definitions of tables, indexes, etc.) of this tablespace is
transferred with data pump from the snapshot standby database to the
DWH database4.
- The datafile crm01OLTP.dbf is now part of both databases (snapshot standby database
OLTP and database DWH). In both databases the tablespace is in read-only mode.
- The ETL process can read the data out of the staging area.
Figure 2: On the DWH machine, the datafile crm01OLTP.dbf is part of both databases; the tablespace CRM is accessed read-only from the snapshot standby database OLTP and from the database DWH.
4 For convenient handling of this transfer with data pump, a database link can be used.
Refreshing the Content in the Staging Area:
As soon as there is a need to read more current content, i.e. the CRM part of the staging area
needs to be refreshed, the following actions are taken:
- The plugged-in tablespace CRM is dropped from the DWH database.
- The snapshot standby database is converted back to a physical standby database. This
resumes the recovery process for all its datafiles, including those of the tablespace CRM.
Figure 3: The tablespace CRM is dropped from the DWH database; the standby database is converted back to a physical standby and recovery of the datafile crm01OLTP.dbf resumes.
3.2 The Key Advantages
This solution has the following key advantages:
- The staging area contains the full set of data.
- No additional workload on the host OLTP.
- Datafiles with the full set of data are neither transferred nor copied.
  o The volume of data transferred between OLTP and DWH is determined merely
by the volume of data changes (size of the archived redo logs).
- The elapsed time of the refresh process of the staging area – which corresponds to the
refresh of the standby database – does not include the elapsed time to copy the archived
redo logs from host OLTP to DWH:
  o The standby site is able to receive logs from the primary database both in
physical standby mode and in snapshot standby mode. In snapshot standby
mode, the logs are queued and not applied.
  o Since the log transport to the standby site is running all the time, as soon as the
recovery process resumes, the outstanding archived redo log files are already
registered and available for the recovery5 of the physical standby database.
- The elapsed time of the refresh process of the staging area does not depend on the
tablespace size, but only on the volume of data changes since the last refresh.
- Once configured, both the operation of physical standby databases and the operation
of transportable tablespaces are easy to handle and maintain.
- Neither remote queries nor distributed joins are used.
- On the DWH database, the access methods to the data residing in the transported
tablespace(s) can be adjusted as follows:
  o estimation of additional statistics like histograms
  o manipulation of statistics
  o creation of additional data structures like indexes or materialized views
Considering the overhead produced on the source system and the workload produced on the
DWH machine, the solution presented in this article is the most efficient one:
- Only the redo logs, and no additional structures, are used.
- It works on the level of “changes on data blocks” and not on the level of “SQL
statements”.
5 Transported redo logs are applied in physical standby mode only.
In case the refresh of the staging area is the only purpose of the standby database on the
DWH machine, the elapsed time of the refresh process can be minimized by narrowing the
scope of the recovery process on the standby database to only those tablespaces of the OLTP
database which need to be read by the ETL application.
Usually, the ETL processes in the DWH require different index types than the OLTP application. If
the indexes of an OLTP schema reside in a separate tablespace, excluding that tablespace can boost the
recovery process.
Exclusion of irrelevant tablespaces can be easily achieved by offlining and deleting their
datafiles on the standby database6.
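A minimal sketch of how such an exclusion could look on the mounted physical standby; the datafile path and the procedure details (for example whether redo apply is stopped via SQL or via the broker) are assumptions to be adapted and tested for your release:

-- stop redo apply before changing the datafile state
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

-- take the datafile of the irrelevant (e.g. index) tablespace offline;
-- afterwards the file can be deleted on OS level, recovery will skip it
ALTER DATABASE DATAFILE 'd:\oradata\oltp\siebel_idx01.dbf' OFFLINE FOR DROP;

-- resume redo apply for the remaining datafiles
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;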
The standby database on the DWH machine can be configured to serve two purposes at the
same time: both for the refresh of the staging area and for disaster protection of the OLTP database.
While considering this approach, be aware of the following impacts:
- A standby database with offline datafiles cannot be used for disaster protection.
- If MaxAvailability or MaxProtection is considered, then the availability of or the workload
on the DWH machine can impact the availability or the performance of the OLTP
database.
3.3 Technical Prerequisites
There are some technical prerequisites which have to be fulfilled in order to use the
described solution.
These prerequisites can be grouped into the following categories:
- Identical database character set
- Self-contained tablespace sets
- Required Oracle database releases
- Required Oracle licenses
3.3.1 Identical Database Character Set
In order to use transportable tablespaces, the database OLTP and the database DWH must
have an identical database character set and an identical national character set.
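To verify this prerequisite, the relevant settings can be compared on both databases, for example with:

SELECT parameter, value
FROM nls_database_parameters
WHERE parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');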
3.3.2 Self-contained Tablespace Sets
In order to transport a set of tablespaces, it needs to be self-contained. This means
that you cannot transport a set of tablespaces containing objects that depend on objects
outside the set – such as materialized views or table partitions – unless you transfer all those
dependent objects together in one set7.
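Whether a tablespace set is self-contained can be verified up front with the DBMS_TTS package; a short sketch for the CRM tablespace used later in this paper:

-- check the set for self-containment (TRUE also considers referential constraints)
exec DBMS_TTS.TRANSPORT_SET_CHECK('CRM', TRUE);

-- this query should return no rows
SELECT * FROM transport_set_violations;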
6 Tablespaces that are needed for opening the database, like SYSTEM, SYSAUX and UNDO, cannot be excluded.
7 Segmentless objects like sequences, views and PL/SQL packages are not transferred with transportable tablespaces. Normally you don’t need to transfer them into the Staging Area anyway.
3.3.3 Required Oracle Database Release
The OLTP database needs to be operated with Oracle Database release 10g or higher.
Oracle 11g is recommended, as the snapshot standby database feature is available as of this
release.
If using Oracle 10g, it would be necessary to emulate this functionality manually by creating a
guaranteed restore point on the standby database before opening it read-write. The following
limitations also have to be considered when running with Oracle 10g:
- There is no out-of-the-box handling with Data Guard for this functionality. You will
need to develop a piece of code – but this is quite straightforward.
- The redo transport between primary and standby is stopped during the period when
the standby is open read-write8.
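A heavily abbreviated sketch of such a manual emulation on a 10g physical standby is given below; the restore point name is made up, error handling is omitted and the complete procedure (including deferring the redo transport on the primary) has to be worked out for the concrete environment:

-- stop redo apply and protect the current state of the standby
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
CREATE RESTORE POINT before_open GUARANTEE FLASHBACK DATABASE;

-- open the standby read-write (redo transport from the primary is stopped meanwhile)
ALTER DATABASE ACTIVATE STANDBY DATABASE;
ALTER DATABASE OPEN;

-- ... set the tablespace read only and plug it into the DWH database ...

-- rewind the database and convert it back to a physical standby
STARTUP FORCE MOUNT
FLASHBACK DATABASE TO RESTORE POINT before_open;
ALTER DATABASE CONVERT TO PHYSICAL STANDBY;
STARTUP FORCE MOUNT
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
DROP RESTORE POINT before_open;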
In order to use transportable tablespaces in this context, the DWH database needs to be at the
same or a higher release than the OLTP database.
3.3.4 Required Oracle Licenses
This solution requires an Oracle Enterprise Edition license both for the OLTP host and for the
DWH host. All required features, like Data Guard, transportable tablespaces and the snapshot
standby database, are included in the Enterprise Edition license.
No additional option is required for this solution – neither the Active Data Guard9 option nor
the Partitioning option.
8 As mentioned before, with a snapshot standby database as of 11g the log transport stays active all the time.
9 Active Data Guard is a new, separately licensable option with 11g which includes real-time query and fast incremental backup. None of these features is required by the described solution.
4. Take a “Tour” on a Real-Life Example
To demonstrate our approach on a representative sample, we will use an excerpt from the
database schema of the CRM application Siebel. We chose Siebel to improve the readability
of this example: Siebel is a widely used CRM application owned by Oracle, and hence there is a
higher chance that ETL developers are familiar with the data model behind it.
It is important to understand that the described solution also works with any other system or
application, even a non-standard, in-house developed software application10.
We took the Siebel tables S_CONTACT, S_ORG_EXT and S_ASSET as representatives of a
set of approximately 15 Siebel tables having complex relationships and high cardinality.
4.1 Real-life example
Consider the following common Data Warehouse situation:
Transformation processes have to read the content out of the Siebel tables and transform it into
a new entity, let’s call it “Customer Subscription” (refer to Figure 3).
Figure 3: The transformation process reads the Siebel tables and transforms the data into the new entity “Customer Subscription”
The Data Warehouse has to store not only the latest status of the “Customer Subscriptions”,
but also all the historical values. The ETL has to compare the new snapshot of “Customer
Subscriptions” with the latest one and – in case of changes – create new versions which
keep track of the fragmented history. This concept is known as versioning – refer to Figure 4.
10 As long as the data is stored in an Oracle RDBMS.
Figure 4: The ETL compares the new snapshot of “Customer Subscriptions” with the latest one and – in case of changes – creates new versions (create new version / close version) which keep track of the fragmented history. The delta (new, updated and deleted rows) is derived from the Siebel tables S_ASSET (2.5 mio rows), S_ORG_EXT (0.5 mio rows) and S_CONTACT (0.5 mio rows) in the tablespace CRM of the database DWH (Staging Area) and from the highest version in the history table C_CUST_SUBSCRIPTION (CORE DWH).
Consider the following design decisions of a Data Warehouse architect:
- Due to the many inner joins and filters inside the query, the Staging Area needs to hold
the full set of data.
- Transferring millions of rows from the source system to the Staging Area every night is not an
option.
- In the source system, no reliable row-markers or journals exist or can be introduced.
- The architect decided to use the solution described in this white paper.
Because of the high cardinality of the data set (several millions of rows), good scalability of the
underlying database11 is assumed.
In the next sections we will present the most important steps to build and operate this solution.
4.2 Setup and Configuration
Oracle Data Guard has been set up as described in chapter 3.
On both the OLTP database and the DWH database, we used Oracle Database 11.2.0.2.0.
We created a Data Guard Broker configuration, left the protection mode at Maximum
Performance (the default) and set the log transport to asynchronous.
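For illustration only – the configuration name is made up and the connect identifiers are the ones shown in Figure 1 – the broker setup could look roughly like this (asynchronous transport is the default with Maximum Performance):

DGMGRL> connect sys@OLTP_SITE1
DGMGRL> create configuration 'DG_OLTP' as primary database is 'OLTP_SITE1'
        connect identifier is OLTP_SITE1;
DGMGRL> add database 'OLTP_SITE2' as connect identifier is OLTP_SITE2
        maintained as physical;
DGMGRL> edit database 'OLTP_SITE2' set property LogXptMode='ASYNC';
DGMGRL> enable configuration;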
11 Including the physical data model of the CORE DWH.
To enforce logging on the database OLTP, we issued the following statement:
ALTER DATABASE FORCE LOGGING;
This causes every12 attempt of an unrecoverable NOLOGGING operation to be logged anyway.
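Whether force logging is really active can be checked with a simple query:

SELECT force_logging FROM v$database;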
4.2.1 Create Role with Common Name in Both Databases
In both the database DWH and the database OLTP, the role dwh_sa_crm_role was created:
CREATE ROLE dwh_sa_crm_role;
Grant the SELECT privilege on Siebel tables to this role in the OLTP database.
GRANT SELECT ON s_contact TO dwh_sa_crm_role;
GRANT SELECT ON s_org_ext TO dwh_sa_crm_role;
GRANT SELECT ON s_asset TO dwh_sa_crm_role;
You will also need to create the owner of the transported tables on the DWH database:
CREATE USER crm IDENTIFIED BY thisIsASecretPassword;
Neither a create session nor a create table privilege is necessary for this user.
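On the DWH side, the role is later granted to the user(s) running the ETL processes, so that the object grants carried over with the metadata import become effective; the user name dwh_etl is just a placeholder for illustration:

GRANT dwh_sa_crm_role TO dwh_etl;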
4.3 Operation
Let’s take a look at the operation of this solution.
From the point of view of the CRM data in the staging area, there are two main operational
states:
- A snapshot of the latest CRM data is available in the staging area (Status A)
- A refresh of the CRM data in the staging area is in progress (Status B)
Figure 5: Two main operational states for the CRM data in the Staging Area – while the CRM users change the operative data on OLTP around the clock (24/7), the DWH side alternates over time between Status A (snapshot of the latest CRM data available for read) and Status B (refresh of the CRM data in progress).
Most of the time, the CRM data in the staging area is available for read (Status A). Sometimes
you will need to refresh the data in the staging area: during this period, the data is not
available (Status B).
12 As for any other change of database instance parameters: an impact analysis is required before making this change.
Transitions between these two operational states are usually triggered by one of the following
two events:
- ETL processes need more current CRM data (A to B)
  o This event triggers the start of the refresh process.
  o The goal of the refresh process is to bring the snapshot to a given (defined) point in time.
- ETL processes need to read CRM data again (B to A)
  o This event triggers the termination of the refresh process: as soon as the CRM data in
the physical standby is current enough, the refresh process is terminated and the
transition to Status A is taken.
In the next paragraphs we will describe:
- the actions related to the termination of the refresh process and
- the actions related to the start of the refresh process
4.3.1 Termination of Refresh Process
As long as the refresh process is in progress, the CRM data in the staging area is not available.
The datafiles of the tablespace CRM13 on the host DWH are currently exclusively “assigned” to
the physical standby database of OLTP called OLTP_SITE2 for recovery.
In order to terminate the refresh process, the following sequence of actions is taken:
Firstly, the physical standby is converted to a snapshot standby database. This is performed as
follows:
DGMGRL> connect sys@OLTP_SITE2
Password:
Connected.
DGMGRL> convert database 'OLTP_SITE2' to snapshot standby
Converting database "OLTP_SITE2" to a Snapshot Standby database, please wait...
Database "OLTP_SITE2" converted successfully
Secondly, the tablespace is set to read only and plugged into the DWH database with data
pump14 via a database link from DWH to the snapshot standby database.
SQL> alter tablespace crm read only;
# impdp system@DWH logfile=imp_crm.log network_link=OLTP_SNAP
  transport_tablespaces=CRM transport_datafiles=d:\oradata\oltp\crm01oltp.dbf
13 Of course this concept can be extended to transfer multiple tablespaces.
14 If running 11.2.0.2, due to Oracle Bug 10185688 it is required that either XDB is loaded into the source database or the related patch is applied.
In order to transport the metadata you may also use other alternatives, such as:
- export the metadata with data pump to a dump file and import from that dump file
instead of using a database link
- export / import with the classical exp/imp15
- initiate data pump directly with PL/SQL, see [2] for details
When transferring the metadata you can also decide whether to include or exclude certain
tables. You can also choose whether to import indexes, object privileges, table triggers and
table constraints.
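As a hedged sketch – assuming the Data Pump EXCLUDE parameter is usable for this transportable import in your release, which should be verified against the Data Pump documentation – skipping indexes and triggers could look like this:

# impdp system@DWH logfile=imp_crm.log network_link=OLTP_SNAP
  transport_tablespaces=CRM transport_datafiles=d:\oradata\oltp\crm01oltp.dbf
  exclude=index,trigger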
As the last step, a deterministic function is created in the DWH database. Its return value
reflects the timestamp of the CRM data. We used the following PL/SQL code:
declare
  sql_text    varchar2(1000);
  v_timestamp varchar2(20);
begin
  -- determine the timestamp of the last applied redo on the standby
  select to_char(timestamp,'DD.MM.YYYY HH24:MI:SS')
    into v_timestamp
    from (select timestamp
            from gv$recovery_progress@OLTP
           where item = 'Last Applied Redo'
           order by start_time desc)
   where rownum < 2;
  dbms_output.put_line('timestamp is ' || v_timestamp);
  -- (re)create the deterministic function returning this timestamp
  sql_text := 'create or replace function crm.SA_CRM_SNAPSHOT_TIMESTAMP return date
    deterministic is
      ts date;
    begin
      select to_date ('''|| v_timestamp ||''', ''DD.MM.YYYY HH24:MI:SS'')
        into ts from dual;
      return ts;
    end;';
  execute immediate sql_text;
  execute immediate 'GRANT EXECUTE ON crm.SA_CRM_SNAPSHOT_TIMESTAMP to DWH_SA_CRM_ROLE';
end;
/
Listing 1: In the DWH database, this creates a function which returns the timestamp of the data in the CRM tablespace
This function is used by the ETL processes during the “versioning” operation (Figure 4): it is used to
build the value of the VALID_FROM attribute of new versions and of the VALID_TO attribute of versions to be
closed.
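A simplified sketch of such a versioning step is shown below; the table C_CUST_SUBSCRIPTION is taken from Figure 4, while the column names, the delta set ETL_DELTA and the open-end date are assumptions made only for this illustration:

-- close the currently open versions of changed or deleted subscriptions
UPDATE core.c_cust_subscription t
   SET t.valid_to = crm.sa_crm_snapshot_timestamp()
 WHERE t.valid_to = DATE '9999-12-31'
   AND t.subscription_id IN (SELECT d.subscription_id FROM etl_delta d);

-- open new versions, valid from the snapshot timestamp of the CRM data
INSERT INTO core.c_cust_subscription
       (subscription_id, subscription_attributes, valid_from, valid_to)
SELECT d.subscription_id, d.subscription_attributes,
       crm.sa_crm_snapshot_timestamp(), DATE '9999-12-31'
  FROM etl_delta d
 WHERE d.change_type IN ('NEW', 'UPDATED');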
15 Deprecated with 11g, but it worked in our case.
4.3.2 Start of Refresh Process
In order to start the refresh process, the following sequence of actions has to be taken:
Firstly, the tablespace CRM has to be dropped from the DWH database. After this, any queries
on this data will fail as it is no longer available. Dependent views, synonyms, stored PL/SQL
procedures, etc. become invalid.16
SQL> drop tablespace crm including contents;
Secondly, the snapshot standby database is converted back to a physical standby database, which
resumes the recovery:
DGMGRL> convert database 'OLTP_SITE2' to physical standby
If you need to check whether the physical standby database OLTP_SITE2, and with it the CRM
tablespace, is current enough to be used for the next integration load cycle, you can easily
query the Data Guard Broker:
DGMGRL> show database 'OLTP_SITE2';
Database - OLTP_SITE2
  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   0 seconds
  Apply Lag:       11 minutes 23 seconds
  Real Time Query: OFF
  Instance(s):
    oltp
Database Status:
SUCCESS
As the refresh is a parallel media recovery, the process is very efficient.
Media recovery works on the level of block changes and is much faster and less resource-consuming
than the mechanisms of GoldenGate and Streams, where SQL is extracted and processed row
by row.
The presented real-life example clearly demonstrates the high efficiency and the easy and
stable operation of this solution.
5. Solution extension: If data availability for operational reporting matters
There is yet another challenge for today’s DWH architects: where to place Operational
Reporting?
- The OLTP database is becoming a less and less suitable place, due to the heavy
workload related to the complex query logic inside the Operational Reports.
- Many Operational Reports query not only the data residing in the OLTP system, but also
additional analytical attributes which are typically stored in a Core DWH.
With the solution presented in this white paper, the DWH architect can consider using the
data residing in the Staging Area for Operational Reporting17 as well. As this data resides in the
data warehouse database, it can be joined with analytical attributes in the Core DWH without
performance impacts (no distributed queries).
However, one point has to be taken into consideration: during the refresh, the data in the Staging
Area is not available (refer to Figure 5). This unavailability needs to be eliminated.
The snapshot functionality of the operating system and/or the storage facility can be used to
overcome this.
The concept: after the standby database is turned into a snapshot standby and the tablespaces
are set to read only, snapshots of the data files are created. These snapshots are then plugged
into the DWH database instead of the standby database’s data files.
This results in two advantages:
- The data in the Staging Area is available almost all18 the time.
- The recovery of the tablespaces can go on, as the standby database can be converted
from snapshot standby back to a physical standby right after taking the snapshot of the
data files. This achieves an even shorter latency for the refresh cycles.
Note: the snapshots do not copy the data; the data is presented a second time.
Later changes are tracked for both sets of data, origin and snapshot.
This is known as the copy-on-write (COW) mechanism.
Examples of OS-side snapshotting:
ZFS on Solaris offers copy-on-write snapshots. The same is possible with the Veritas file system,
LVM snapshots on Linux and Microsoft Volume Shadow Copy on Windows. SAN and NAS
systems also offer snapshotting features that work with the COW mechanism.
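As an illustration with ZFS – pool, file system and snapshot names are made up – the sequence after the tablespace has been set to read only could be:

# zfs snapshot dwhpool/oradata@crm_refresh
# zfs clone dwhpool/oradata@crm_refresh dwhpool/crm_stage

The datafile visible below the clone (e.g. under /dwhpool/crm_stage) would then be the one referenced in transport_datafiles when plugging the tablespace into the DWH database, while the standby database itself is converted back to a physical standby right away.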
By using the know-how of Trivadis, we believe it is possible to reduce the operating costs and the
complexity of your data warehouse: proper design is what matters!
Contact
Karol Hajdu
Mathias Zarick
[email protected]
[email protected]
Trivadis Delphi GmbH
Millennium Tower
Handelskai 94-96
A-1200 Vienna
Tel.: +43 1 332 35 31 00
www.trivadis.com
Please contact us if you need more information or help with your setup.
16 They become valid again automatically when they are used after the tablespace reappears in the next cycle.
17 At least for that part of the reporting where the integrity level of the data in the Staging Area is sufficient.
18 A short downtime will still occur during the tablespace drop and re-plug-in.
Literature and Links
[1] Oracle® Data Guard Concepts and Administration,
http://download.oracle.com/docs/cd/E11882_01/server.112/e17022/toc.htm
[2] Oracle® Database PL/SQL Packages and Types Reference – Chapter 46,
http://download.oracle.com/docs/cd/E11882_01/appdev.112/e16760/d_datpmp.htm
[3] Data Warehousing mit Oracle – Business Intelligence in der Praxis. Chapter 3.4. Jordan et al. Hanser.
2011.