Advanced Database Management Systems
Lecture 16
Data Warehouse
CS 424
DAVID A. SAMPAH
ASHESI
MARCH 2016
Outline
Data Warehouse
Reading for Topic:
Elmasri & Navathe, Chapter 29
Connolly & Begg, Chapters 31 - 34
Data Warehouse
Why?
What is it?
How designed?
Data Warehouse – Why?
Data scattered around the organisation in
different locations and possibly different
formats
Many “Data islands”
Want to bring it together for the purposes of
supporting management decision making
Data Warehouse – Definition
A subject-oriented, integrated, time-variant and
non-volatile collection of data in support of
management’s decision making process
Data Warehouse
Subject-oriented as the warehouse is organized around the major subjects of
the enterprise (such as customers, products, and sales) rather than the major
application areas (such as customer invoicing, stock control, and product
sales).
Integrated because of the coming together of source data from different
enterprise-wide applications systems.
Time-variant because data in the warehouse is only accurate and valid at
some point in time or over some time interval.
Non-volatile as the data is not updated in real time but is refreshed from
operational systems on a regular basis. New data is always added as a
supplement to the database, rather than a replacement.
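The time-variant and non-volatile properties can be illustrated with a minimal Python sketch using the standard sqlite3 module. The table name, column names and figures are hypothetical, not taken from the lecture; the point is only that each refresh appends a dated snapshot instead of overwriting earlier rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table: every row carries the date it was loaded.
conn.execute(
    "CREATE TABLE stock_snapshot (load_date TEXT, product TEXT, quantity INTEGER)"
)


def refresh(load_date, operational_rows):
    """Append the current operational values as a new dated snapshot."""
    conn.executemany(
        "INSERT INTO stock_snapshot VALUES (?, ?, ?)",
        [(load_date, product, qty) for product, qty in operational_rows],
    )
    conn.commit()


# Two periodic refreshes from an assumed operational stock system.
refresh("2016-03-01", [("widget", 120)])
refresh("2016-03-08", [("widget", 95)])

# Both values are still present, so the history over time can be queried.
history = conn.execute(
    "SELECT load_date, quantity FROM stock_snapshot "
    "WHERE product = 'widget' ORDER BY load_date"
).fetchall()
print(history)
```

An operational system would typically hold only the latest quantity; here the earlier snapshot stays available for analysis.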
Data Warehouse
Central repository of corporate data (data warehouse), separate from the
operational systems within the business
Data is organised in accordance with business needs for decision making
support (i.e., by subject rather than by event) and is read only
Consistent and repeatable process for loading operational data – ETL process
Will continue to grow as data is added to the data repository
Unifies business views – one standard view!
Several end-user tools available for the effective manipulation of data
Data Warehouse – Architecture
[Figure: warehouse architecture. Components shown: Client; Access Tools; Data Mart, Meta Data and DW (data warehouse); Data Transformation (ETL); Operational Data and External Data.]
Data Transformation Includes:
Removing unwanted data from the operational databases
Converting to common definitions and data names
Calculating summaries and derived data
Establishing values for missing data
The Meta Data contains the details of the data transformation
process, as well as other important data.
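The four transformation steps listed above could look roughly like this in Python. This is a sketch under stated assumptions: the source rows, field names, the FIELD_MAP mapping and the default-value rule are all hypothetical. A mapping of this kind is also an example of the detail that the Meta Data would record.

```python
from collections import defaultdict

# Hypothetical rows extracted from two operational systems that use
# different field names and carry data the warehouse does not need.
invoicing_rows = [
    {"cust_no": "C1", "amt": 100.0, "printer_queue": "Q7"},
    {"cust_no": "C2", "amt": None, "printer_queue": "Q2"},
]
sales_rows = [
    {"customer_id": "C1", "sale_value": 50.0},
]

# Assumed mapping from source field names to common warehouse names.
# Fields that are not listed here are treated as unwanted and dropped.
FIELD_MAP = {
    "cust_no": "customer_id",
    "customer_id": "customer_id",
    "amt": "amount",
    "sale_value": "amount",
}
DEFAULT_AMOUNT = 0.0  # assumed rule for establishing missing values


def transform(row):
    """Drop unwanted fields, rename to common names, fill missing values."""
    clean = {FIELD_MAP[k]: v for k, v in row.items() if k in FIELD_MAP}
    if clean.get("amount") is None:
        clean["amount"] = DEFAULT_AMOUNT
    return clean


cleaned = [transform(r) for r in invoicing_rows + sales_rows]

# Derived summary data: total amount per customer.
totals = defaultdict(float)
for row in cleaned:
    totals[row["customer_id"]] += row["amount"]

print(dict(totals))
```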
Data Warehouse Access Tools
Query and reporting tools
- SQL
- Easy-to-use front-end query tools that generate SQL statements
In-house Applications developed for use with a particular Data Warehouse
Executive Information Systems (EIS) tools
Management Support Tools
OLAP tools
Data mining tools
Data Mart
Subset of the data warehouse, used by a particular unit or function
Evolution of data warehouse:
Data mart to data warehouse
Data warehouse to data mart
Access tools sometimes have their own “data marts”
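The idea of a data mart as a subset of the warehouse can be sketched in Python with sqlite3. This is a simplification: the mart is modelled here as a view over a warehouse table, whereas in practice a mart is often a separate physical store. The table, view and department names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table covering every department.
conn.execute(
    "CREATE TABLE warehouse_sales (department TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("marketing", "widget", 10.0),
        ("finance", "gadget", 20.0),
        ("marketing", "gadget", 5.0),
    ],
)

# The mart exposes only the rows and columns one unit needs.
conn.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT product, amount FROM warehouse_sales "
    "WHERE department = 'marketing'"
)

mart_rows = conn.execute(
    "SELECT product, amount FROM marketing_mart ORDER BY product"
).fetchall()
print(mart_rows)
```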
Data Warehouse Development Issues
Data Transformation support is immature and somewhat restrictive –
principal focus of key data warehousing/data analytics software developers
(e.g., SAS)
Data Warehouse Design is captured using Star Schema Database Design
Star Schema – each star relating to a
particular subject
Star Schema Database Design == Star Schema ER Model
Fact Table in the middle
Dimension tables around the outside (the measures are held in the fact table)
- Only one level of dimensions
- More than one level -> snowflake schema design
- Denormalised form in star schema
One fact table surrounded by as many dimension tables as required, which
allows different perspectives of the data to be formed.
Use of surrogate keys
Example Star Schema for Batch facts
[Figure: the Batch fact table linked to five dimension tables: Factory, Employee, Product, Machine and Time. Each relationship carries the multiplicities 1..1 and 1..*.]
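A star schema of this shape could be sketched in Python with the standard sqlite3 module. The dimension names follow the Batch example (Factory, Employee, Product, Machine, Time); the column names, the units_produced measure and all sample rows are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one level only, each with an integer surrogate key.
CREATE TABLE dim_factory  (factory_key  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_employee (employee_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_machine  (machine_key  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_time     (time_key     INTEGER PRIMARY KEY,
                           year INTEGER, month INTEGER);

-- Fact table in the middle: one row per batch, holding the measure
-- and a reference to each dimension.
CREATE TABLE fact_batch (
    batch_key      INTEGER PRIMARY KEY,
    factory_key    INTEGER REFERENCES dim_factory(factory_key),
    employee_key   INTEGER REFERENCES dim_employee(employee_key),
    product_key    INTEGER REFERENCES dim_product(product_key),
    machine_key    INTEGER REFERENCES dim_machine(machine_key),
    time_key       INTEGER REFERENCES dim_time(time_key),
    units_produced INTEGER
);

INSERT INTO dim_factory  VALUES (1, 'Accra'), (2, 'Kumasi');
INSERT INTO dim_employee VALUES (1, 'Ama');
INSERT INTO dim_product  VALUES (1, 'Soap'), (2, 'Candle');
INSERT INTO dim_machine  VALUES (1, 'M-01');
INSERT INTO dim_time     VALUES (1, 2016, 2), (2, 2016, 3);

INSERT INTO fact_batch VALUES
    (1, 1, 1, 1, 1, 1, 500),
    (2, 1, 1, 1, 1, 2, 700),
    (3, 2, 1, 2, 1, 2, 300);
""")

# Two perspectives on the same facts: by factory and by month.
by_factory = conn.execute(
    "SELECT f.name, SUM(b.units_produced) FROM fact_batch b "
    "JOIN dim_factory f ON f.factory_key = b.factory_key "
    "GROUP BY f.name ORDER BY f.name"
).fetchall()
by_month = conn.execute(
    "SELECT t.month, SUM(b.units_produced) FROM fact_batch b "
    "JOIN dim_time t ON t.time_key = b.time_key "
    "GROUP BY t.month ORDER BY t.month"
).fetchall()
print(by_factory)
print(by_month)
```

Each query joins the fact table to a single dimension, which is how the surrounding dimension tables give different perspectives on the same measure.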
Summary
Data Warehousing Concept
Definition
Architecture including
Data transformation
Data Access Tools
Data Warehouse Design
Star Schema Database Design
Any Questions?