Intro to Data Warehousing

Data Warehousing
Adapted from Dr. Sanjay Gunasekaran
Main Topics
• Overview of Data Warehouse
• Concept of Data Conversion
• Importance of Data Conversion and the steps involved
• Common Industry Methodology
• Outline and analysis done in the Alternate Plan paper
Data Warehousing
• It is a concept, not a product.
• A method of analyzing massive amounts of data to make better business decisions.
• Helpful, for example, in analyzing sales data and making decisions that affect the company’s performance.
• A Data Warehouse generally contains summarized, denormalized and replicated data that is infrequently updated and is optimized for decision-support applications.
Comparison between Operational Environment and Data Warehouse

Operational Environment    Data Warehouse
Detailed                   Summarized
Current                    Variable over time
Transaction driven         Analysis driven
Minimum redundancy         Some redundancy
Static structure           Flexible structure
Small amount of data       Huge volumes of data
Constantly updated         Infrequently updated
Data Warehouse Concepts
• Multidimensional Model
a) Facts
- Table containing the aggregate information required for analysis.
b) Dimensions
- Classes of descriptors of the facts.
c) Hierarchies
- Levels of aggregation of the data.
• Databases
a) Relational
i) Oracle
b) Multi-Dimensional
i) Oracle Express
ii) Essbase
iii) Gentium
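
The fact, dimension and hierarchy ideas above can be sketched in a few lines. This is only an illustration, assuming a relational implementation in SQLite through Python's sqlite3 module rather than any of the products listed. The table names (sales_fact, location_dim), the columns, the sample rows and the sales_by helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension: descriptors of the facts. city rolls up to region (a hierarchy).
    CREATE TABLE location_dim (
        location_id INTEGER PRIMARY KEY,
        city        TEXT NOT NULL,
        region      TEXT NOT NULL
    );
    -- Fact: the numeric measure to analyze, keyed by the dimension id.
    CREATE TABLE sales_fact (
        location_id INTEGER NOT NULL REFERENCES location_dim(location_id),
        amount      REAL NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO location_dim VALUES (?, ?, ?)",
    [(1, "Austin", "South"), (2, "Dallas", "South"), (3, "Boston", "East")],
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?)",
    [(1, 100.0), (2, 50.0), (3, 75.0), (1, 25.0)],
)


def sales_by(level):
    """Total sales at one level of the location hierarchy ('city' or 'region')."""
    if level not in ("city", "region"):
        raise ValueError("unknown hierarchy level: " + level)
    rows = conn.execute(
        f"SELECT d.{level}, SUM(f.amount) "
        "FROM sales_fact f JOIN location_dim d ON d.location_id = f.location_id "
        f"GROUP BY d.{level} ORDER BY d.{level}"
    ).fetchall()
    return dict(rows)
```

Calling sales_by("city") and then sales_by("region") shows the same facts at two levels of aggregation, which is what a hierarchy provides.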
Implementation Steps
• Analyze user requirements for the Data Warehouse.
• Analyze existing transaction-processing data.
• Design the Data Warehouse (multi-dimensional model).
• Create the Data Warehouse (relational or multi-dimensional).
• Extract and clean the operational data.
• Migrate and load the data into the warehouse.
• Perform decision-support analysis on the warehouse data using OLAP tools.
• Create reports.
Data Warehouse Architecture
[Diagram. Labels: Metadata; OLTP Systems (General Ledger, Accounts Payable, Purchase Order); External Data from Legacy Systems; Extraction, Cleaning, Loading; Staging Area; Data Warehouse; End User]
Terminology
a) OLTP systems
b) Metadata
c) Data Warehouse
d) Staging Area
e) Extraction, Loading & Migration
f) External Data
Data Warehouse Architecture (Contd.)
• OLTP Systems
- Online Transaction Processing systems, also called production systems: the systems used to manage and run the business.
• Metadata
- Information about the data that feeds, gets transformed in, and exists in the Data Warehouse.
• Data Warehouse
- Core of the architecture.
- Supports informational processing by providing a solid platform of integrated, historical data from which to do analysis.
Data Warehouse Architecture (Contd.)
• Staging Area
- The Data Warehouse workbench.
- The place where raw data is brought in, cleaned, combined, archived and eventually exported to either the Data Warehouse or one or more Data Marts.
• Extraction, Cleaning & Loading
- Known as the Data Conversion process.
- The process by which data from the operational systems is moved to the warehouse.
- One of the most important steps in the implementation of a Data Warehouse.
• External Data
Data Conversion
• Loading of data from the operational system into the Data Warehouse.
• Process wherein data is extracted, cleaned, combined, archived and eventually loaded into the Data Warehouse.
• Complex, time-consuming and unglamorous.
• Comprises the following processes:
a) Extraction
b) Cleaning
c) Loading
• A very important part of the Data Warehousing process.
Importance of Data Conversion
• The Data Warehouse holds the information that is key to a corporation’s decision-making process.
• Unreliable and “dirty” data can affect the performance of the corporation.
• Examples
a) Marketing communications
b) Retail sales
c) Medical records
Steps in Data Conversion
• Extract data from the operational systems to an intermediate schema (Staging Area).
- The Staging Area is the Data Warehouse workbench where the data is cleaned, combined, archived and eventually exported to the Data Warehouse. It has the same schema structure as the operational system.
• Convert the intermediate schema to “load data”.
• Aggregate the “load data”.
• Migrate the “load data” from the Staging Area to the Data Warehouse server (if the Staging Area is not on the same server as the warehouse).
• Load the data into the Data Warehouse.
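
The order of these steps can be sketched as a chain of small functions. This is a toy illustration, not the methodology's actual tooling: the sample rows, the function names and the use of in-memory lists and a dict as stand-ins for the source system, Staging Area and warehouse are all assumptions. The migrate step is left out because everything here lives in one process.

```python
from collections import defaultdict

# Hypothetical operational rows: (order_id, product, amount stored as text).
SOURCE_ORDERS = [
    (1, " widget ", "10.50"),
    (2, "Gadget", "4.00"),
    (3, "WIDGET", "2.25"),
]


def extract(source_rows):
    """Copy source rows unchanged into staging (same schema as the source)."""
    return [tuple(row) for row in source_rows]


def convert(staging_rows):
    """Turn staged rows into "load data": normalized product, numeric amount."""
    return [
        (order_id, product.strip().lower(), float(amount))
        for order_id, product, amount in staging_rows
    ]


def aggregate(load_rows):
    """Summarize the load data as a total amount per product."""
    totals = defaultdict(float)
    for _order_id, product, amount in load_rows:
        totals[product] += amount
    return dict(totals)


def load(warehouse, aggregated):
    """Add the aggregated totals to the warehouse (a plain dict in this sketch)."""
    for product, total in aggregated.items():
        warehouse[product] = warehouse.get(product, 0.0) + total
    return warehouse


def run_conversion(source_rows, warehouse):
    """Extract -> convert -> aggregate -> load, in the order listed above."""
    staging = extract(source_rows)
    load_data = convert(staging)
    return load(warehouse, aggregate(load_data))
```

Keeping extract free of any cleaning mirrors the point that the Staging Area has the same schema as the operational system; all changes to the data happen afterwards.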
Data Conversion Process
[Flow diagram. Labels: Quality Assurance of Data; Plan Conversion; Create Conversion Specifications; Extract Source Data to Intermediate Schemas; Condition Data; Clean Data; Transform Data; Integrate Data; Aggregate Load Data; Move and Load Data]
Data Conversion
• Extraction
- Routines are created to read source data and move it to an intermediate Staging Area.
- The Staging Area has the same schema as the source. It is important because the data is cleaned there before it is loaded into the warehouse.
• Convert intermediate schemas to “load data”
- This is the data cleaning process. It comprises:
- Data examination
- Data parsing
- Data correction
- Record matching
- Data transformation
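
The five cleaning activities can be illustrated on a handful of customer records. This is a minimal sketch under stated assumptions: the record layout, the "Last, First" name format, the STATE_FIXES lookup and the matching rule (same name and same phone digits) are all hypothetical choices made for the example.

```python
import re

# Hypothetical staged customer records.
STAGED = [
    {"name": "Smith, John", "state": "Calif.", "phone": "(555) 010-1234"},
    {"name": "smith,  john", "state": "CA", "phone": "555-010-1234"},
    {"name": "", "state": "NY", "phone": "555 010 9999"},
]

# Assumed lookup of known bad spellings and their corrections.
STATE_FIXES = {"calif.": "CA", "n.y.": "NY"}


def examine(record):
    """Data examination: list the fields that are missing or unusable."""
    problems = [f for f in ("name", "state", "phone") if not record[f].strip()]
    if "name" not in problems and "," not in record["name"]:
        problems.append("name")  # name is not in the assumed 'Last, First' form
    return problems


def parse_name(raw):
    """Data parsing: split 'Last, First' into (first, last)."""
    last, first = (part.strip() for part in raw.split(",", 1))
    return first, last


def correct_state(raw):
    """Data correction: replace known bad spellings, else upper-case the value."""
    return STATE_FIXES.get(raw.strip().lower(), raw.strip().upper())


def match_key(first, last, phone):
    """Record matching: records sharing this key count as the same customer."""
    return (first.lower(), last.lower(), re.sub(r"\D", "", phone))


def clean(records):
    """Run all five activities; return (cleaned rows, rejected records)."""
    cleaned, rejected, seen = [], [], set()
    for record in records:
        if examine(record):
            rejected.append(record)
            continue
        first, last = parse_name(record["name"])
        key = match_key(first, last, record["phone"])
        if key in seen:
            continue  # duplicate of a customer already kept
        seen.add(key)
        # Data transformation: emit the row in the shape the warehouse expects.
        cleaned.append({
            "first_name": first.title(),
            "last_name": last.title(),
            "state": correct_state(record["state"]),
            "phone": key[2],
        })
    return cleaned, rejected
```

Rejected records are returned rather than dropped so that they can be reviewed, which fits the quality-assurance emphasis of the conversion process.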
Data Conversion (Contd.)
• Aggregate “load data”
- “Load data” is aggregated by executing a series of sorts externally.
• Move the “load data” from the Staging Area onto the Data Warehouse server
- Done only if the Data Warehouse is on a different server from the Staging Area.
• Load the data into the Data Warehouse
- Done using SQL routines or bulk-load utilities.
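
Sort-based aggregation followed by a batch load might look like the sketch below. It makes several assumptions: an in-memory sorted() stands in for a true external sort, SQLite's executemany stands in for a bulk-load utility, and the LOAD_DATA rows and the sales_fact table are hypothetical.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter

# Hypothetical "load data": (product, region, amount).
LOAD_DATA = [
    ("widget", "east", 10.0),
    ("gadget", "west", 4.0),
    ("widget", "east", 2.5),
    ("widget", "west", 1.0),
]


def aggregate_sorted(rows):
    """Sort by (product, region), then sum each run of rows with equal keys."""
    key = itemgetter(0, 1)
    ordered = sorted(rows, key=key)  # stand-in for an external sort
    return [
        (product, region, sum(row[2] for row in group))
        for (product, region), group in groupby(ordered, key=key)
    ]


def bulk_load(conn, rows):
    """Insert all rows in one transaction and return the table's row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(product TEXT, region TEXT, amount REAL)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    return conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
```

Sorting first means equal keys sit next to each other, so the aggregation needs only one pass over the data and never has to hold more than one group at a time.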
Paper Outline
• Brief explanation of the Data Warehousing concept
• Data Warehouse architecture
• Data conversion
• Importance of data conversion
• Common industry methodology
• Analysis of the data conversion process using an example:
- Sales Order System
Overall Analysis
• The purpose of the paper was to outline the Data Conversion process.
• Designed a Relational Database, Staging Area and Data Warehouse.
• Moved data from the Relational Database to the Staging Area.
• Moved data from the Staging Area to the Data Warehouse.
In-depth Analysis
• Designed the Relational Database to reflect the transaction-processing system of a typical organization.
• Designed the Staging Area to reflect only the Sales system.
• Designed the Data Warehouse for the Sales system.
• Built the relational database (source system) for the Sales system example in Oracle.
• Built the Staging Area in Oracle.
• Built the Data Warehouse in Oracle (multi-dimensional design in a relational database).
• Created views for the source tables (transparency).
• Created synonyms for the views (as the source tables were on a different server).
In-depth Analysis (Contd.)
• Wrote SQL scripts to first move data from the synonyms to the Staging Area.
• Wrote SQL scripts and procedures to move data from the Staging Area to the Data Warehouse.
- Data was moved first from the Staging Area tables to the dimension tables, namely Product, Location and Customer.
- The Time dimension table was populated with 10 years of data. Additional scripts were written to populate the Time dimension with further data every year.
- Data was then moved from the Staging Area to the fact table (core table).
• Wrote scripts to check the consistency of the data. These scripts checked the total number of records moved from the source system to the Staging Area and from the Staging Area to the Data Warehouse. Additionally, they checked the total amount moved from the database to the Data Warehouse.
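
A consistency check of this kind, comparing record counts and summed amounts between two stages, could be sketched as follows. The original scripts were written for Oracle and are not shown in the slides, so this is an assumption-laden stand-in using SQLite: the table names, the amount column, the reconcile helper and the half-cent tolerance are all hypothetical.

```python
import sqlite3


def reconcile(conn, source, target, amount_col="amount"):
    """Compare row counts and summed amounts between two tables."""

    def stats(table):
        # Table and column names are supplied by trusted code, not user input.
        return conn.execute(
            f"SELECT COUNT(*), COALESCE(SUM({amount_col}), 0) FROM {table}"
        ).fetchone()

    src_rows, src_total = stats(source)
    tgt_rows, tgt_total = stats(target)
    return {
        "rows_match": src_rows == tgt_rows,
        # Assumed tolerance of half a cent to absorb floating-point noise.
        "amounts_match": abs(src_total - tgt_total) < 0.005,
    }


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_orders  (order_id INTEGER, amount REAL);
    CREATE TABLE staging_orders (order_id INTEGER, amount REAL);
    CREATE TABLE sales_fact     (order_id INTEGER, amount REAL);
    INSERT INTO source_orders VALUES (1, 10.0), (2, 5.5), (3, 4.5);
    INSERT INTO staging_orders SELECT * FROM source_orders;
    -- Simulate a faulty load that drops one order.
    INSERT INTO sales_fact SELECT * FROM staging_orders WHERE order_id <> 3;
""")
```

Running reconcile(conn, "source_orders", "staging_orders") and then reconcile(conn, "staging_orders", "sales_fact") checks each hop separately, so a mismatch points at the stage where rows or amounts were lost.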
Conclusion
• The value of the Data Warehouse can only be realized through OLAP analysis and Data Mining.
• Data Conversion is one of the most critical processes in implementing a Data Warehouse.
• The warehouse holds information that is of great value to the enterprise.
• The Data Conversion process must be done effectively and efficiently.