Right In Time
Presented By: Maria Baron
Written By: Rajesh Gadodia
Intelligent Enterprise
Feb 7, 2004
Vol. 7, Iss. 2; pg 26

Traditional Data Warehouse
Central repository of transactional data spread across heterogeneous platforms and applications
Focused on strategic reporting and analysis
Loaded periodically (nightly, weekly, monthly)
Periodic loading introduces information latency

Evolution of the Data Warehouse
First generation
  Reporting
Second generation
  Analytic processing and data mining
  Multidimensional tools for drill-down
New generation
  Shorten the information cycle time
  Minimize latency
  Information on demand

Why Real-Time Data Warehousing?
Active decision support
Business activity monitoring (BAM)
Alerting
Efficient execution of business strategy
Monitoring runs in the background
Positions information for use by downstream applications
Can be built on top of an existing data warehouse

Traditional vs. Real-Time Data Warehouse
Traditional Data Warehouse (EDW)
  Strategic
  Batch
    Offline analysis
  Isolated
    Passive
    Historical trends
    Not interactive
  Best effort
    Guarantees neither availability nor performance

Traditional vs. Real-Time Data Warehouse
Real-Time Data Warehouse (RTDW)
  Tactical
  Real-time
    Information on demand
    Most up-to-date view of the business
  Integrated
    Focuses on execution of strategy
    Integrates data warehousing with business processes
  Guaranteed
    Guarantees both availability and performance

Real-Time Integration
Goal of real-time extraction, transformation and loading (ETL)
  Keep the warehouse refreshed
  With minimal delay
Issues
  How does the system identify what data has been added or changed since the last extract?
  What is the performance impact of extracts on the source system?
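
One common answer to the first issue is a timestamp watermark: the extract remembers the latest modification time it has seen and asks the source only for rows changed after that point. The sketch below uses Python's built-in sqlite3 module and assumes a hypothetical `orders` table whose `updated_at` column the source application sets on every insert and update; the index on that column is one way to keep each extract from scanning the whole source table.

```python
import sqlite3

def extract_changes(conn, last_watermark):
    """Return rows modified after last_watermark, plus the new watermark.

    Assumes a hypothetical 'orders' table with an 'updated_at' column
    (ISO-8601 text, so string comparison matches time order).
    """
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Advance the watermark only when new changes were found.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
    # Indexing updated_at limits the extract's impact on the source system.
    conn.execute("CREATE INDEX idx_orders_updated ON orders(updated_at)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 10.0, "2004-02-07T09:00:00"), (2, 25.5, "2004-02-07T09:05:00")],
    )
    changes, wm = extract_changes(conn, "")
    print(len(changes), wm)  # 2 2004-02-07T09:05:00
    conn.execute("UPDATE orders SET amount = 30.0, updated_at = '2004-02-07T09:10:00' WHERE id = 1")
    changes, wm = extract_changes(conn, wm)
    print(changes)  # [(1, 30.0, '2004-02-07T09:10:00')]
```

A watermark does not see deleted rows, and a row committed late with a timestamp at or before the current watermark can be missed; log-based capture generally avoids both gaps at the cost of more setup.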

Real-Time Data Warehouse – Logical Architecture

Techniques for real-time ETL
Simulated real-time feed
  Increase the frequency of batch runs
  Most useful when information is not required to be 'up to the minute'
  Requires minimal changes to existing ETL infrastructure
  Easy to implement

Techniques for real-time ETL
Trickle feed
  Allows continuous update of the RTDW as the data in the source system changes
  Messaging infrastructure
  Perpetually open data pipe
  Also called streaming
  Basic elements: capture, stage and apply
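
The three basic elements can be pictured as a producer and a consumer joined by a queue. In this Python sketch a thread-safe `queue.Queue` plays the role of the messaging infrastructure (the stage), `capture_changes` feeds it, and `apply_changes` runs on its own thread, blocking until each change arrives. A sentinel object is used to shut down what would otherwise be a perpetually open pipe; all names are illustrative.

```python
import queue
import threading
from typing import Iterable, Tuple

STOP = object()  # sentinel; a production pipe would normally stay open

def capture_changes(changes: Iterable[Tuple[str, float]], stage: queue.Queue) -> None:
    """Capture: place each source change on the stage as soon as it occurs."""
    for change in changes:
        stage.put(change)
    stage.put(STOP)

def apply_changes(stage: queue.Queue, target: dict) -> None:
    """Apply: wait for each staged change and write it to the target."""
    while True:
        change = stage.get()  # blocks until the next change arrives
        if change is STOP:
            break
        key, value = change
        target[key] = value

def trickle_feed(changes: Iterable[Tuple[str, float]]) -> dict:
    stage: queue.Queue = queue.Queue()
    target: dict = {}
    applier = threading.Thread(target=apply_changes, args=(stage, target))
    applier.start()
    capture_changes(changes, stage)
    applier.join()
    return target

if __name__ == "__main__":
    print(trickle_feed([("order:1", 10.0), ("order:2", 25.5), ("order:1", 30.0)]))
    # {'order:1': 30.0, 'order:2': 25.5}
```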

Techniques for real-time ETL
Trickle feed (cont.)
  Target and source databases must be configured
  May require special gateways
  Source (capture process): automatically captures changes to data or table structure
  The RTDW records changes as logical change records (LCRs), which are kept in a staging partition called the message queue
  The message queue can also be updated explicitly by user applications

Techniques for real-time ETL
Trickle feed: role of the target database
  An apply process takes the logical change records out of the message queue and applies the changes to selected database objects
  Rules set on the message queue handle data transformation
  Requires upfront development and can be complex to configure and manage
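
The apply side can be sketched the same way. In this Python sketch each LCR is a plain dict with `table`, `operation`, `key` and `new` fields (an assumed layout), and `rules` maps each selected source table to a transformation function; tables without a rule are treated as not selected. The target is an in-memory dict of tables rather than a real database.

```python
from typing import Callable, Dict, Iterable, Optional

# A rule transforms a source row image; returning None filters the row out.
Rule = Callable[[dict], Optional[dict]]

def apply_lcrs(lcrs: Iterable[dict], rules: Dict[str, Rule], target: dict) -> int:
    """Apply LCRs in queue order to the selected target tables.

    Returns how many LCRs were applied to selected tables.
    """
    applied = 0
    for lcr in lcrs:
        rule = rules.get(lcr["table"])
        if rule is None:
            continue  # no rule: this table is not replicated to the target
        table = target.setdefault(lcr["table"], {})
        row = None if lcr["operation"] == "DELETE" else rule(lcr["new"])
        if row is None:
            # Deletes, and rows the rule filters out, leave no target row.
            table.pop(lcr["key"], None)
        else:
            table[lcr["key"]] = row
        applied += 1
    return applied

if __name__ == "__main__":
    rules = {"orders": lambda r: {"amount_usd": r["amount_cents"] / 100}}
    staged = [
        {"table": "orders", "operation": "INSERT", "key": 1, "new": {"amount_cents": 1050}},
        {"table": "audit_log", "operation": "INSERT", "key": 9, "new": {"note": "x"}},
    ]
    target: dict = {}
    print(apply_lcrs(staged, rules, target), target)
    # 1 {'orders': {1: {'amount_usd': 10.5}}}
```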

Trickle Feed Architecture for Real-Time Load

Information Delivery
Changes to the traditional data warehouse
  Need to accommodate continuous trickle feeds intermixed with live user queries
  Schema design
  Active partition management
  Data aggregation

Designing an RTDW - Options
Trickle and flip
  A copy of the fact table is made and given a name that queries cannot access
  As new data trickles in, it is appended to the copy of the fact table
  At set intervals, the trickle is halted, the copy is itself copied and renamed to the active fact table name (the old active fact table is deleted), and the process starts over
  Poses scalability problems: depending on the size of the table, the copy step may not keep up with the trickle

Designing an RTDW - Options
Table partitioning
  Allows the creation of large tables that the database handles internally as a series of smaller ones, each with its own indexes
  A partition can be roped off so it isn't visible to active queries
  Problem: determining the criteria for partitioning
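
Roping off a partition can be illustrated without any particular database's partitioning syntax. The Python sketch below assumes month (`"YYYY-MM"`) as the partitioning criterion and keeps a separate index per partition; a newly loaded partition stays hidden from queries until `publish` is called. The class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

class PartitionedFactTable:
    """One logical fact table stored as monthly partitions.

    Month is an assumed partitioning criterion; each partition gets its
    own index on the product column.
    """

    def __init__(self) -> None:
        self.partitions: Dict[str, List[dict]] = {}
        self.indexes: Dict[str, Dict[str, List[int]]] = {}
        self.hidden: Set[str] = set()  # partitions roped off from queries

    def load_partition(self, month: str, rows: List[dict]) -> None:
        """Build a new partition and its index while it is hidden."""
        if month in self.partitions:
            raise ValueError(f"partition {month} already exists")
        self.hidden.add(month)
        index: Dict[str, List[int]] = defaultdict(list)
        for pos, row in enumerate(rows):
            index[row["product"]].append(pos)
        self.partitions[month] = list(rows)
        self.indexes[month] = index

    def publish(self, month: str) -> None:
        """Make a loaded partition visible to queries."""
        self.hidden.discard(month)

    def total_for(self, product: str) -> float:
        """Sum amounts for one product across visible partitions only."""
        total = 0.0
        for month, rows in self.partitions.items():
            if month in self.hidden:
                continue
            for pos in self.indexes[month].get(product, []):
                total += rows[pos]["amount"]
        return total

if __name__ == "__main__":
    t = PartitionedFactTable()
    t.load_partition("2004-01", [{"product": "A", "amount": 10.0}, {"product": "B", "amount": 4.0}])
    t.publish("2004-01")
    t.load_partition("2004-02", [{"product": "A", "amount": 5.0}])
    print(t.total_for("A"))  # 10.0 (February is still hidden)
    t.publish("2004-02")
    print(t.total_for("A"))  # 15.0
```

Choosing the partition key is the hard part the slide points to: monthly partitions suit date-bounded queries and loads, but a query filtered on product alone must still visit every visible partition.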

Designing an RTDW - Options
Real-time partitions
  Create new tables that resemble the active fact tables but are designed for quick updates
  Interval tables contain data from only the last update
  Truly real-time
  Can be accessed by analysts and other BI tools
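
A real-time partition can be sketched as a view that unions a static, indexed fact table with an unindexed table of the same shape that receives the trickle. This Python/sqlite3 sketch reads "data from only the last update" as the rows that arrived since the last batch load of the static table, which is an interpretation rather than something the slide spells out; all table and view names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE sales_fact (sale_id INTEGER, sale_date TEXT, amount REAL);
CREATE INDEX idx_sales_date ON sales_fact(sale_date);
-- Same columns as the active fact table, but no indexes, so inserts stay cheap.
CREATE TABLE sales_fact_rt (sale_id INTEGER, sale_date TEXT, amount REAL);
-- Analysts and BI tools query this view and see both partitions.
CREATE VIEW sales_all AS
    SELECT * FROM sales_fact
    UNION ALL
    SELECT * FROM sales_fact_rt;
"""

def trickle_insert(conn: sqlite3.Connection, rows) -> None:
    """Trickled rows land only in the real-time partition."""
    conn.executemany("INSERT INTO sales_fact_rt VALUES (?, ?, ?)", rows)
    conn.commit()

def batch_update(conn: sqlite3.Connection) -> None:
    """Periodic batch load: move real-time rows into the static table
    and empty the real-time partition, in one transaction."""
    with conn:
        conn.execute("INSERT INTO sales_fact SELECT * FROM sales_fact_rt")
        conn.execute("DELETE FROM sales_fact_rt")

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    trickle_insert(conn, [(1, "2004-02-07", 10.0), (2, "2004-02-07", 25.5)])
    print(conn.execute("SELECT COUNT(*), SUM(amount) FROM sales_all").fetchone())  # (2, 35.5)
    batch_update(conn)
    print(conn.execute("SELECT COUNT(*) FROM sales_fact_rt").fetchone()[0])  # 0
```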
Real-Time Partition

Conclusion
RTDWs have a distinct advantage for businesses that rely on time-sensitive data, in areas such as:
  Call centers
  Performance indicators
  Fraud detection
  Yield management
  Certain financial transactions