INTRODUCTION TO DATA WAREHOUSING
1.0 INTRODUCTION

A process of transforming information and
making it available to users in a timely enough
manner to make a difference

1.1 DATA WAREHOUSE
“A DW is a
- subject-oriented,
- integrated,
- time-varying,
- non-volatile
collection of data that is used primarily in
organizational decision making.”
A single, complete and consistent store of
data obtained from a variety of different
sources made available to end users in a way
they can understand and use in a business
context.
1.2 DATA WAREHOUSING
MARKET
1. In 1996, close to 90% of IT professionals
had either created a data warehouse or were
planning to create one.
2. Average 3 year ROI of 400%.
3. Average payback was 2.3 years on costs
averaging $2.2 million.
1.3 DISADVANTAGES OF TRADITIONAL QUERY-DRIVEN APPROACH
- Delay in query processing
  - Slow or unavailable information sources
  - Complex filtering and integration
- Inefficient and potentially expensive for frequent queries
- Competes with local processing at sources
- Hasn’t caught on in industry
1.4 WAREHOUSING APPROACH
- Information integrated in advance
- Stored in the warehouse for direct querying and analysis
[Figure: Clients query the Data Warehouse. The Data Warehouse is fed by an Integration System, which keeps Metadata. The Integration System receives data from an Extractor/Monitor attached to each Source.]
1.5 ADVANTAGES OF WAREHOUSING APPROACH
- High query performance, but not necessarily most current information
- Doesn’t interfere with local processing at sources
  - Complex queries at warehouse
  - OLTP at information sources
- Information copied at warehouse
  - Can modify, annotate, summarize, restructure, etc.
  - Can store historical information
  - Security, no auditing
- Has caught on in industry
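The contrast between the query-driven approach and the warehousing approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two in-memory sources, their field names, and every function name below are hypothetical and do not come from the original material.

```python
# Two hypothetical operational sources with different field conventions.
SOURCE_A = [{"customer": "acme", "amount": 120.0},
            {"customer": "zenith", "amount": 75.5}]
SOURCE_B = [{"cust_name": "ACME", "total": 30.0}]


def extract_a():
    """Extractor for source A: returns (customer, amount) pairs."""
    return [(r["customer"].lower(), r["amount"]) for r in SOURCE_A]


def extract_b():
    """Extractor for source B: maps its field names onto the same shape."""
    return [(r["cust_name"].lower(), r["total"]) for r in SOURCE_B]


def query_driven_total(customer):
    """Query-driven: contact every source and integrate at query time."""
    rows = extract_a() + extract_b()
    return sum(amount for name, amount in rows if name == customer)


def build_warehouse():
    """Warehousing: integrate once, in advance, into a single store."""
    warehouse = {}
    for name, amount in extract_a() + extract_b():
        warehouse[name] = warehouse.get(name, 0.0) + amount
    return warehouse


WAREHOUSE = build_warehouse()


def warehouse_total(customer):
    """Answer from the pre-integrated copy without touching the sources."""
    return WAREHOUSE.get(customer, 0.0)
```

In this sketch, a record added to a source after `build_warehouse()` has run is visible to `query_driven_total` but not to `warehouse_total` until the warehouse is rebuilt. That is the "not necessarily most current information" trade-off listed above.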
2.0 DATA WAREHOUSE ARCHITECTURE
2.1 DATA WAREHOUSE ARCHITECTURES: CONCEPTUAL VIEW
- Single-layer: Every data element is stored once only, giving a virtual warehouse.
  [Figure: Operational systems and Informational systems both work on the same Real-time data.]
- Two-layer: Real-time data plus derived data. This is the approach most commonly used in industry at present.
  [Figure: Operational systems work on Real-time data; Informational systems work on Derived Data.]
- Three-layer: Transformation of real-time data to derived data really requires two steps.
  [Figure: Operational systems work on Real-time data; Reconciled Data is the physical implementation of the data warehouse; Informational systems work on Derived Data, the view level serving “particular informational needs”.]
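The two transformation steps of the three-layer architecture can be illustrated with a small Python sketch. The record layout, the field names, and the functions `reconcile` and `derive_monthly_sales` are assumptions made for this example only.

```python
from collections import defaultdict

# Hypothetical real-time (operational) records with inconsistent conventions.
realtime_orders = [
    {"cust": "ACME ", "amt_cents": 12000, "day": "2024-01-05"},
    {"cust": "acme", "amt_cents": 3000, "day": "2024-01-20"},
    {"cust": "Zenith", "amt_cents": 7550, "day": "2024-02-02"},
]


def reconcile(records):
    """Step 1: real-time data -> reconciled data (one consistent format)."""
    return [
        {
            "customer": r["cust"].strip().lower(),
            "amount": r["amt_cents"] / 100,
            "month": r["day"][:7],
        }
        for r in records
    ]


def derive_monthly_sales(reconciled):
    """Step 2: reconciled data -> derived data for one informational need."""
    totals = defaultdict(float)
    for r in reconciled:
        totals[(r["customer"], r["month"])] += r["amount"]
    return dict(totals)


reconciled = reconcile(realtime_orders)
derived = derive_monthly_sales(reconciled)
```

Keeping the reconciled layer separate means that several derived views can be built from the same cleaned data without returning to the operational systems.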
2.2 THE KEY CHARACTERISTICS OF A DATA WAREHOUSE
The key characteristics of a data warehouse are
as follows:
- Some data is denormalized for simplification and to improve performance.
- Large amounts of historical data are used.
- Queries often retrieve large amounts of data.
- Both planned and ad hoc queries are common.
- The data load is controlled.
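As an illustration of the first characteristic, the sketch below builds a denormalized copy of two normalized tables so that a reporting query needs no join. It uses Python's standard `sqlite3` module with an in-memory database. The table and column names are hypothetical, and a real Oracle warehouse would use its own DDL and tooling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized operational-style tables (hypothetical schema).
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, product_id INTEGER, amount REAL);
INSERT INTO product VALUES (1, 'widget', 'hardware'), (2, 'manual', 'books');
INSERT INTO sale VALUES (10, 1, 20.0), (11, 1, 5.0), (12, 2, 12.5);

-- Denormalized copy: product attributes are repeated on every sale row.
CREATE TABLE sale_wide AS
SELECT s.sale_id, s.amount, p.name AS product_name, p.category
FROM sale AS s JOIN product AS p ON p.product_id = s.product_id;
""")

# The reporting query reads one table and needs no join.
rows = conn.execute(
    "SELECT category, SUM(amount) FROM sale_wide GROUP BY category"
).fetchall()
by_category = dict(rows)
```

The price of the simpler query is redundancy: a change to a product's category would have to be applied to every matching row of `sale_wide`. That is tolerable here because the data load is controlled.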
2.2 COMMON ORACLE DATA WAREHOUSING TASKS
In general, high data throughput is the key to a
successful data warehouse.
As an Oracle data warehousing administrator or
designer, you can expect to be involved in the
following tasks:
- Configuring an Oracle database for use as a data warehouse
- Designing data warehouses
- Performing upgrades of the database and software to new release levels
- Managing schema objects, such as tables, indexes, and materialized views
- Managing users and security
- Developing routines used for the Extraction, Transformation, and Loading (ETL) process
- Creating reports based on the data in the data warehouse
- Backing up the data warehouse and performing recovery when necessary
- Monitoring the data warehouse's performance and taking preventive or corrective action as required
2.3 TOOLS FOR ADMINISTERING THE DATA WAREHOUSE
The intent of this guide is to enable you to quickly
and efficiently create and administer an Oracle
data warehouse. The following are some of the
products, tools, and utilities you can use to
achieve your goals with your data warehouse.
2.3.1 ORACLE UNIVERSAL INSTALLER
Oracle Universal Installer installs your Oracle
software and options. It can automatically start the
Database Configuration Assistant (DBCA) to
install a database.
2.3.2 ORACLE ENTERPRISE MANAGER
The primary tool for managing your database is
Oracle Enterprise Manager, a Web-based
interface. After you have installed the Oracle
software, created or upgraded a database, and
configured the network, you can use Oracle
Enterprise Manager for managing your database.
Oracle Enterprise Manager also
provides an interface for performance advisors
and for Oracle utilities such as SQL*Loader and
Recovery Manager.
2.3.3 ORACLE WAREHOUSE BUILDER
The primary product for populating and
maintaining a data warehouse, Oracle Warehouse
Builder provides ETL, data quality management,
and metadata management functionality in a
single product. Warehouse Builder includes a
unified repository hosted on an Oracle Database.
Warehouse Builder leverages Oracle Database
functionality to generate code optimized for
loading into and maintaining Oracle Database
targets.
2.3.4 DATABASE TUNING PACK
Oracle Database Tuning Pack offers a set of new
technologies that automate the entire database
tuning process, which significantly lowers
database management costs and enhances
performance and reliability. The key features of
Oracle Database Tuning Pack that will be used in
this guide are the SQL Access and SQL Tuning
Advisors.
2.4 PROCESS FLOW WITHIN A DATA WAREHOUSE
The processes that represent data flow within a
data warehouse are:
- Extract and load the data.
- Clean and transform the data.
- Back up and archive data.
- Manage queries and direct them to the appropriate data sources.
2.4.1 EXTRACT AND LOAD PROCESS
Data extraction takes data from source systems
and makes it available to the data warehouse;
data load takes extracted data and loads it into
the data warehouse. Data in operational systems
is held in a form suitable for that system. When
we extract data from a physical database, the
original information content will have been
modified and extended over the years in order to
support the data requirements of the operational
system. Before loading the data into the data
warehouse, this information content must be
reconstructed.
Once the data is extracted from the source
systems, it is typically loaded into a temporary
data store so that it can be cleaned up and
made consistent.
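A minimal sketch of this extract-then-load step, using Python's standard `sqlite3` module with two in-memory databases standing in for a source system and the warehouse. The schema, the `staging_orders` table, and the function names are assumptions for illustration. An Oracle deployment would more likely rely on tools such as SQL*Loader or Warehouse Builder.

```python
import sqlite3

# Hypothetical source system holding operational data.
source = sqlite3.connect(":memory:")
source.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, phone TEXT, amount REAL);
INSERT INTO orders VALUES (1, 'Acme', '555-0100', 120.0), (2, 'Zenith', 'n/a', 75.5);
""")

# Hypothetical warehouse with a temporary (staging) data store.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE staging_orders "
    "(source_id INTEGER, customer TEXT, phone TEXT, amount REAL)"
)


def extract(conn):
    """Extract: read the rows out of the source system unchanged."""
    return conn.execute(
        "SELECT id, customer, phone, amount FROM orders"
    ).fetchall()


def load_to_staging(conn, rows):
    """Load: copy extracted rows into staging inside one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO staging_orders VALUES (?, ?, ?, ?)", rows
        )
    return len(rows)


loaded = load_to_staging(warehouse, extract(source))
```

The rows land in staging exactly as extracted, including the questionable phone value, because cleaning is deliberately left to a later step.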
2.4.2 CLEAN AND TRANSFORM THE DATA
This is the system process that takes the loaded
data and structures it for query performance and
for minimizing operational costs.
There are a small number of steps within the
process:
1. Clean and transform the loaded data into
a structure that speeds up queries.
  - Make sure the data is consistent within itself.
    When you take a row of data and examine it, the contents of the row
    must make sense. Errors at this point are due to errors in the source
    systems. Typical checks are for nonsensical phone numbers,
    addresses, counts and so on.
  - Make sure that data is consistent with other data within the same source.
  - Make sure data is consistent with data in the other source systems.
  - Make sure data is consistent with the information already in the warehouse.
2. Partition the data in order to speed up
queries, optimize hardware performance
and simplify the management of the data
warehouse.
3. Create aggregations to speed up
common queries.
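The three steps above can be sketched in plain Python. Everything here is an assumption chosen for illustration: the staged rows, the phone-number pattern, and partitioning by calendar month. Only the within-row consistency check from step 1 is shown.

```python
import re
from collections import defaultdict

# Hypothetical rows as they might sit in the temporary data store.
staged = [
    {"customer": "acme", "phone": "555-0100", "amount": 120.0, "day": "2024-01-05"},
    {"customer": "zenith", "phone": "n/a", "amount": 75.5, "day": "2024-02-02"},
    {"customer": "acme", "phone": "555-0100", "amount": 30.0, "day": "2024-02-10"},
]

PHONE_RE = re.compile(r"^\d{3}-\d{4}$")  # assumed phone format, not a real standard


def is_self_consistent(row):
    """Step 1 (partial): does the row make sense on its own?"""
    return bool(PHONE_RE.match(row["phone"])) and row["amount"] >= 0


clean = [r for r in staged if is_self_consistent(r)]
rejected = [r for r in staged if not is_self_consistent(r)]

# Step 2: partition the clean rows, here by month.
partitions = defaultdict(list)
for row in clean:
    partitions[row["day"][:7]].append(row)

# Step 3: pre-compute an aggregation that common queries can reuse.
monthly_totals = {
    month: sum(r["amount"] for r in rows) for month, rows in partitions.items()
}
```

Rejected rows are kept in a separate list and not silently dropped, so the errors can be traced back to the source system that produced them.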
2.4.3 BACK UP AND ARCHIVE PROCESS
As in operational systems, the data within the
data warehouse is backed up regularly in order to
ensure that the data warehouse can always be
recovered from data loss, software failure or
hardware failure.
In archiving, older data is removed from the
system in a format that allows it to be quickly
restored if required. For example, in a retail sales
analysis data warehouse there may be a
requirement to keep data for three years, with the
latest six months being kept online. In this sort of
process there is often a requirement to be able to
do month-on-month comparisons for this year and
last year. This will require some months of data to
be temporarily restored from archive.
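The retail example can be sketched as follows. The six-month online window comes from the text; the row layout, the function names, and the choice to measure age in whole calendar months are assumptions for this illustration.

```python
from datetime import date


def months_between(earlier, later):
    """Whole calendar months from `earlier` to `later`."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def split_online_archive(rows, today, online_months=6):
    """Keep the latest `online_months` calendar months online; archive the rest."""
    online, archive = [], []
    for row in rows:
        if months_between(row["day"], today) < online_months:
            online.append(row)
        else:
            archive.append(row)
    return online, archive


def restore_month(archive, year, month):
    """Temporarily restore one archived month, e.g. for a year-on-year comparison."""
    return [r for r in archive if r["day"].year == year and r["day"].month == month]


# Hypothetical sales rows.
sales = [
    {"day": date(2024, 6, 1), "amount": 40.0},
    {"day": date(2024, 1, 10), "amount": 25.0},
    {"day": date(2023, 7, 3), "amount": 55.0},
]
online, archive = split_online_archive(sales, today=date(2024, 7, 15))
last_july = restore_month(archive, 2023, 7)
```

Comparing July of this year with July of last year needs only that one archived month restored, not the whole archive.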
2.4.4 MANAGING QUERIES
The query management process is the system
process that manages the queries and speeds them
up by directing them to the most effective data
source. This process must also ensure that all the
system resources are used in the most effective
way, usually by scheduling the execution of
queries. The query management process may
also be required to monitor the actual query
profiles.
Unlike other system processes, the query
management process does not operate during the
regular load of information into the data
warehouse. It operates at all times that
the data warehouse is made available to end
users.
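One way to picture "directing queries to the most effective data source" is a router that prefers a pre-built aggregation whenever it covers the columns a query groups by. The sketch below is an assumption-laden illustration: the table names, the catalogue of aggregates, and the rule "fewest columns wins" are all hypothetical, not a description of how any particular product works.

```python
def choose_source(group_by, aggregate_tables, detail_table="sales_detail"):
    """Return the name of the table a query should be sent to.

    `aggregate_tables` maps a table name to the columns it is grouped by.
    The aggregate with the fewest columns that still covers `group_by`
    is chosen; if none covers it, fall back to the detail table.
    """
    needed = set(group_by)
    candidates = [
        (len(columns), name)
        for name, columns in aggregate_tables.items()
        if needed <= set(columns)
    ]
    return min(candidates)[1] if candidates else detail_table


# Hypothetical catalogue of aggregations created during transformation.
AGGREGATES = {
    "sales_by_month": ["month"],
    "sales_by_month_region": ["month", "region"],
}

target_monthly = choose_source(["month"], AGGREGATES)
target_regional = choose_source(["region", "month"], AGGREGATES)
target_detail = choose_source(["customer"], AGGREGATES)
```

A fuller query manager would also schedule execution and record query profiles, as the text describes; neither is modelled here.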
3.0 DECISION SUPPORT SYSTEM
An application that issues queries to the read-only
database is called a decision support
system (DSS).
- Used to manage and control the business
- Data is historical or point-in-time
- Optimized for inquiry rather than update
- Use of the system is loosely defined and
can be ad hoc
- Used by managers and end users to
understand the business and make
judgments
An application that updates the database is called
an on-line transaction processing (OLTP) application.
3.1 DATA WAREHOUSE FOR DECISION SUPPORT
Putting information technology to work to help
the knowledge worker make faster and better
decisions:
- Which of my customers are most likely to
go to the competition?
- What product promotions have the
biggest impact on revenue?
- How did the share price of software
companies correlate with profits over
the last 10 years?
3.2 SECURITY IN DATA WAREHOUSE
Building a data warehouse does increase
security risk because key corporate
information is all in one place.
To mitigate that risk, database system
components can be used to protect the
data warehouse. These include:
- Views: allow users to see only certain rows or columns of data.
- Access control: indicates which users have access to what data.
- Security administration: used to actually give access to groups of users and to define the accesses given to either an individual or a group.
- Encryption: protects data from access outside of the DBMS.
- Audit: tracks what users are doing.
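Three of these components (views, access control, and audit) can be illustrated together in a short Python sketch using the standard `sqlite3` module. SQLite has no user accounts or GRANT statement, so access control and auditing are emulated in application code. The table, the view, the group names, and `run_query` are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, customer TEXT, credit_card TEXT, amount REAL);
INSERT INTO sales VALUES
    ('east', 'acme', '4111-xxxx', 120.0),
    ('west', 'zenith', '5500-xxxx', 75.5);

-- View: exposes only some rows (east) and some columns (no credit_card).
CREATE VIEW sales_east AS
SELECT region, customer, amount FROM sales WHERE region = 'east';
""")

# Access control: which group may read which table or view (emulated).
ACCESS = {
    "east_analysts": {"sales_east"},
    "auditors": {"sales", "sales_east"},
}

audit_log = []  # Audit: every attempt is recorded, allowed or not.


def run_query(group, relation):
    """Return all rows of `relation` if `group` is allowed to read it."""
    allowed = relation in ACCESS.get(group, set())
    audit_log.append((group, relation, allowed))
    if not allowed:
        raise PermissionError(f"{group} may not read {relation}")
    # `relation` was checked against the ACCESS whitelist just above.
    return conn.execute(f'SELECT * FROM "{relation}"').fetchall()


east_rows = run_query("east_analysts", "sales_east")
```

In a real warehouse the database itself would enforce these rules through its own privileges and auditing facilities. The sketch only shows how the pieces relate.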