Introduction to Data Warehousing

Why Data Warehouse?
Scenario 1
ABC Pvt. Ltd. is a company with branches in
Mumbai, Delhi, Chennai and Bangalore.
The Sales Manager wants a quarterly sales report.
Each branch has a separate operational system.
[Figure: Scenario 1, ABC Pvt. Ltd. The Sales Manager needs sales per item type per branch for the first quarter from the Mumbai, Delhi, Chennai and Bangalore branch systems.]
Solution 1: ABC Pvt. Ltd.
- Extract sales information from each database.
- Store the information in a common repository at a single site.
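The two steps above can be sketched in Python. This is only an illustration under stated assumptions: each branch system is stood in for by an in-memory SQLite database, and the table layout, the sample rows and the helper names (`make_branch_db`, `build_repository`, `quarterly_report`) are hypothetical, not taken from the slides.

```python
import sqlite3

# Hypothetical sample rows per branch system: (item_type, sale_date, amount).
BRANCH_SALES = {
    "Mumbai": [("Electronics", "2024-01-15", 1200.0), ("Grocery", "2024-02-03", 300.0)],
    "Delhi": [("Electronics", "2024-03-20", 800.0), ("Grocery", "2024-04-02", 150.0)],
    "Chennai": [("Grocery", "2024-01-09", 450.0)],
    "Bangalore": [("Electronics", "2024-02-27", 950.0), ("Electronics", "2024-03-05", 50.0)],
}


def make_branch_db(rows):
    """Stand-in for one branch's separate operational database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (item_type TEXT, sale_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn


def build_repository(branch_dbs):
    """Extract sales from every branch and store them at a single site."""
    repo = sqlite3.connect(":memory:")
    repo.execute(
        "CREATE TABLE sales (branch TEXT, item_type TEXT, sale_date TEXT, amount REAL)"
    )
    for branch, conn in branch_dbs.items():
        rows = conn.execute("SELECT item_type, sale_date, amount FROM sales").fetchall()
        repo.executemany(
            "INSERT INTO sales VALUES (?, ?, ?, ?)",
            [(branch, *row) for row in rows],
        )
    return repo


def quarterly_report(repo, start, end):
    """Sales per item type per branch for sale dates in [start, end)."""
    return repo.execute(
        "SELECT branch, item_type, SUM(amount) FROM sales "
        "WHERE sale_date >= ? AND sale_date < ? "
        "GROUP BY branch, item_type ORDER BY branch, item_type",
        (start, end),
    ).fetchall()


branch_dbs = {name: make_branch_db(rows) for name, rows in BRANCH_SALES.items()}
repository = build_repository(branch_dbs)
report = quarterly_report(repository, "2024-01-01", "2024-04-01")
for branch, item_type, total in report:
    print(f"{branch:<10} {item_type:<12} {total:>8.2f}")
```

Once the rows sit in one repository, the Sales Manager's report is a single query instead of four separate extracts.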
[Figure: Solution 1, ABC Pvt. Ltd. Data from the Mumbai, Delhi, Chennai and Bangalore branches flows into the data warehouse; query and analysis tools produce the report for the Sales Manager.]
Scenario 2
One Stop Shopping Super Market has a huge
operational database.
Whenever executives want a report, the
OLTP system slows down and data entry
operators have to wait.
[Figure: Scenario 2, One Stop Shopping. Management runs a report against the operational database while the data entry operators wait.]
Solution 2
- Extract the data needed for analysis from the operational database.
- Store it in another system, the data warehouse.
- Refresh the warehouse at regular intervals so that it contains up-to-date information for analysis.
- The warehouse will contain data with a historical perspective.
[Figure: Solution 2. Data entry operators run transactions against the operational database; data is extracted into the data warehouse, which serves reports to the manager.]
Scenario 3
Cakes & Cookies is a small, new company. The
chairman of this company wants his company to
grow. He needs information so that he can make
correct decisions.
Solution 3
- Improve the quality of data before loading it into the warehouse.
- Perform data cleaning and transformation before loading the data.
- Use query and analysis tools to support ad hoc queries.
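A minimal sketch of what cleaning before loading could look like, in plain Python. The record fields (`order_id`, `product`, `amount`) and the cleaning rules are assumptions made for illustration; the slides do not say which checks the company would need.

```python
def clean_records(raw_records):
    """Return (cleaned, rejected): repaired, de-duplicated rows and rows that cannot be loaded."""
    cleaned, rejected, seen = [], [], set()
    for rec in raw_records:
        # Normalise whitespace and capitalisation in the product name.
        name = " ".join(str(rec.get("product", "")).split()).title()
        try:
            # Accept "1,250.50" style amounts; anything non-numeric is rejected.
            amount = round(float(str(rec.get("amount", "")).replace(",", "")), 2)
        except ValueError:
            rejected.append(rec)
            continue
        if not name or amount < 0:
            rejected.append(rec)
            continue
        key = (rec.get("order_id"), name)
        if key in seen:  # same order line seen already, skip the repeat
            continue
        seen.add(key)
        cleaned.append({"order_id": rec.get("order_id"), "product": name, "amount": amount})
    return cleaned, rejected


# Hypothetical raw rows as they might arrive from an operational system.
raw = [
    {"order_id": 1, "product": "  choco   chip cookie ", "amount": "1,250.50"},
    {"order_id": 1, "product": "Choco Chip Cookie", "amount": "1250.5"},  # duplicate
    {"order_id": 2, "product": "plum cake", "amount": "n/a"},             # bad amount
    {"order_id": 3, "product": "", "amount": "40"},                       # missing name
    {"order_id": 4, "product": "PLUM CAKE", "amount": 300},
]
cleaned, rejected = clean_records(raw)
print(len(cleaned), "rows ready to load;", len(rejected), "rows rejected")
```

Keeping the rejected rows, rather than silently dropping them, lets someone review and correct them at the source.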
[Figure: Solution 3. The Chairman uses a query and analysis tool on the data warehouse; the chart plots sales against time, with the labels "Expansion" and "Improvement".]
Summing up?
- Why do you need a warehouse?
- Operational systems cannot provide strategic information.
- Executives and managers need such strategic information for
  - Making proper decisions
  - Formulating business strategies
  - Establishing goals
  - Setting objectives
  - Monitoring results
Why is operational data not capable of
producing valuable information?
- Data is spread across incompatible structures and systems.
- Not only that, improvements in technology have made computing faster, cheaper and more widely available.
FAILURES OF PAST DECISION-SUPPORT SYSTEMS
OLTP systems
Decision support systems
Operational and informational
What is a Data Warehouse?
Is it the only viable solution?
Business intelligence at DW
Functional definition of a DW
- The data warehouse is an informational environment that
  - Provides an integrated and total view of the enterprise
  - Makes the enterprise's current and historical information easily available for decision making
  - Makes decision-support transactions possible without hindering operational systems
  - Renders the organization's information consistent
  - Presents a flexible and interactive source of strategic information
Questions?
- Describe five differences between operational systems and informational systems.
- A data warehouse is an environment, not a product. Discuss.
Inmon's definition
A data warehouse is a
- subject-oriented,
- integrated,
- time-variant,
- nonvolatile
collection of data in support of management's
decision-making process.
Subject-oriented
- A data warehouse is organized around subjects such as sales, product, customer.
- It focuses on modeling and analysis of data for decision makers.
- It excludes data not useful in the decision support process.
Integration
- A data warehouse is constructed by integrating multiple heterogeneous sources.
- Data preprocessing is applied to ensure consistency.
[Figure: RDBMS, legacy system and flat file sources pass through data processing and data transformation into the data warehouse.]
Integration
In terms of data:
- Encoding structures
- Measurement of attributes
- Physical attributes of data
- Remarks
- Naming conventions
- Data type formats
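The kinds of mismatch listed above can be illustrated with a short Python sketch that maps records from three sources onto one convention. Everything specific here is assumed for illustration: the field names, the gender codes (including the choice that "1" means male and "0" means female), the units and the date formats are hypothetical, not taken from the slides.

```python
from datetime import datetime

# Hypothetical mapping rules; real ones come from the source systems' documentation.
GENDER_CODES = {"m": "M", "f": "F", "male": "M", "female": "F", "1": "M", "0": "F"}
TO_CM = {"cm": 1.0, "inch": 2.54, "m": 100.0}
FIELD_NAMES = {"cust_id": "customer_id", "CustomerNo": "customer_id"}
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def parse_date(text):
    """Data type format: accept several source formats, emit ISO dates."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def integrate(source_record):
    """Map one source record onto the warehouse's conventions."""
    # Naming conventions: rename source-specific keys to the common name.
    rec = {FIELD_NAMES.get(key, key): value for key, value in source_record.items()}
    return {
        "customer_id": int(rec["customer_id"]),
        # Encoding structures: one code set for gender.
        "gender": GENDER_CODES[str(rec["gender"]).strip().lower()],
        # Measurement of attributes: one unit for height.
        "height_cm": round(float(rec["height"]) * TO_CM[rec["height_unit"]], 1),
        "joined_on": parse_date(rec["joined"]),
    }


rdbms_row = {"cust_id": "17", "gender": "m", "height": "70",
             "height_unit": "inch", "joined": "2021-03-04"}
legacy_row = {"CustomerNo": 18, "gender": 0, "height": 1.65,
              "height_unit": "m", "joined": "04/03/2021"}
flat_file_row = {"customer_id": "19", "gender": "Female", "height": "172",
                 "height_unit": "cm", "joined": "20210304"}

integrated = [integrate(row) for row in (rdbms_row, legacy_row, flat_file_row)]
for row in integrated:
    print(row)
```

After integration, all three rows share the same field names, gender codes, height unit and date format, whichever source they came from.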
Time-variant
- Provides information from a historical perspective, e.g. the past 5-10 years.
- Every key structure contains, either implicitly or explicitly, an element of time, i.e., every record has a timestamp.
- The time-variant nature of a DW
  - Allows for analysis of the past
  - Relates information to the present
  - Enables forecasts for the future
Non-volatile
- Data, once recorded, cannot be updated.
- A data warehouse requires two operations for loading data:
  - Initial loading of data
  - Incremental loading of data
[Figure: load and access arrows on the data warehouse.]
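As a rough illustration of a non-volatile store, the sketch below allows only loading and reading. The `Warehouse` class, the `txn_id` key and the `loaded_at` field are hypothetical names chosen for this example, and keeping everything in a Python list is a simplification.

```python
from datetime import datetime, timezone


class Warehouse:
    """Append-only store: rows are loaded and read, never updated in place."""

    def __init__(self):
        self._rows = []
        self._loaded_keys = set()

    def load(self, source_rows, loaded_at=None):
        """Append rows not loaded before; return how many rows were added."""
        loaded_at = loaded_at or datetime.now(timezone.utc).isoformat()
        added = 0
        for row in source_rows:
            key = row["txn_id"]
            if key in self._loaded_keys:
                continue  # already in the warehouse, leave the stored row untouched
            self._loaded_keys.add(key)
            self._rows.append({**row, "loaded_at": loaded_at})
            added += 1
        return added

    def access(self):
        """Read-only access: callers receive copies of the stored rows."""
        return [dict(row) for row in self._rows]


warehouse = Warehouse()
source = [{"txn_id": 1, "amount": 100.0}, {"txn_id": 2, "amount": -40.0}]

# Initial load: everything in the source is new.
initial = warehouse.load(source, loaded_at="2024-01-01T00:00:00+00:00")

# Incremental load: only the row added since the last load is appended.
source.append({"txn_id": 3, "amount": 25.0})
incremental = warehouse.load(source, loaded_at="2024-01-02T00:00:00+00:00")

print(initial, incremental, len(warehouse.access()))
```

The load timestamp on every row also gives each record the element of time that a time-variant warehouse relies on.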
Data Granularity
- In an operational system, data is usually kept at the lowest level of detail.
- In a DW, data is summarized at different levels.
Three data levels in a banking data warehouse

Daily Detail        Monthly Summary        Quarterly Summary
Account             Account                Account
Activity Date       Month                  Month
Amount              No. of transactions    No. of transactions
Deposit/Withdraw    Withdrawals            Withdrawals
                    Deposits               Deposits
                    Beginning Balance      Beginning Balance
                    Ending Balance         Ending Balance
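The step from daily detail to the two summary levels can be sketched as a roll-up in Python. The sample transactions, the zero opening balance and the helper names (`summarize`, `month_of`, `quarter_of`) are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical daily detail rows: (account, activity_date, amount, kind).
DAILY_DETAIL = [
    ("ACC-1", "2024-01-05", 500.0, "deposit"),
    ("ACC-1", "2024-01-20", 200.0, "withdrawal"),
    ("ACC-1", "2024-02-11", 100.0, "deposit"),
    ("ACC-1", "2024-03-02", 50.0, "withdrawal"),
]


def summarize(detail, period_of, opening_balance=0.0):
    """Roll daily detail up to one summary row per (account, period)."""
    totals = defaultdict(lambda: {"transactions": 0, "deposits": 0.0, "withdrawals": 0.0})
    for account, activity_date, amount, kind in detail:
        bucket = totals[(account, period_of(activity_date))]
        bucket["transactions"] += 1
        bucket["deposits" if kind == "deposit" else "withdrawals"] += amount

    summaries, balances = [], {}
    for (account, period), t in sorted(totals.items()):
        # Each period starts from the previous period's ending balance.
        begin = balances.get(account, opening_balance)
        end = begin + t["deposits"] - t["withdrawals"]
        balances[account] = end
        summaries.append({"account": account, "period": period, **t,
                          "beginning_balance": begin, "ending_balance": end})
    return summaries


def month_of(date_text):
    return date_text[:7]  # "2024-01-05" -> "2024-01"


def quarter_of(date_text):
    year, month = date_text[:4], int(date_text[5:7])
    return f"{year}-Q{(month - 1) // 3 + 1}"  # "2024-02-11" -> "2024-Q1"


monthly = summarize(DAILY_DETAIL, month_of)
quarterly = summarize(DAILY_DETAIL, quarter_of)
for row in monthly + quarterly:
    print(row)
```

The coarser the level, the fewer rows a query has to read, at the cost of losing the individual transactions.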
Operational v/s Information System

Features           Operational                          Information
Characteristics    Operational processing               Informational processing
Orientation        Transaction                          Analysis
User               Clerk, DBA, database professional    Knowledge workers
Function           Day-to-day operation                 Decision support
Data content       Current                              Historical, archived, derived
View               Detailed, flat relational            Summarized, multidimensional
DB design          Application oriented                 Subject oriented
Unit of work       Short, simple transaction            Complex query
Access             Read/write                           Read only
Operational v/s Information System (continued)

Features                   Operational                            Information
Focus                      Data in                                Information out
No. of records accessed    Tens/hundreds                          Millions
Number of users            Thousands                              Hundreds
DB size                    100 MB to GB                           100 GB to TB
Usage                      Predictable, repetitive                Ad hoc, random, heuristic
Response time              Sub-seconds                            Several seconds to minutes
Priority                   High performance, high availability    High flexibility, end-user autonomy
Metric                     Transaction throughput                 Query throughput
Two approaches in designing a DW

Top-down approach                                Bottom-up approach
Enterprise view of data                          Narrow view of data
Inherently architected                           Inherently incremental
Single, central storage of data                  Faster implementation of manageable parts
Centralized rules and control                    Each data mart is developed independently
Takes longer to build                            Comparatively less time than a DW
Higher risk of failure                           Less risk of failure
Needs higher level of cross-functional skills    Unmanageable interfaces
Bottom Up Approach
Top Down Approach
A Practical Approach: Kimball
1. Plan and define requirements.
2. Create a surrounding architecture.
3. Conform and standardize the data content.
4. Implement the data warehouse as a series of super-marts, one at a time.
An Incremental Approach
[Figure: individual architected data marts (Sales, Distribution, Product, Marketing, Customer, Accounts, Finance, Inventory, Vendors) built on shared elements: glossary, common business metrics, common business rules, common business dimensions, and a common logical subject area ERD.]
The Eventual Result
[Figure: the data marts (Sales, Distribution, Product, Marketing, Finance, Customer, Inventory, Accounts, Vendors) sit on the architected enterprise foundation and together form the enterprise data warehouse.]
Data Warehouse versus Data Marts
Data Warehouse:
- Holds multiple subject areas
- Holds very detailed information
- Works to integrate all data sources
- Does not necessarily use a dimensional model, but feeds dimensional models
Data Mart:
- Often holds only one subject area, for example Finance or Sales
- May hold more summarised data (although many hold full detail)
- Concentrates on integrating information from a given subject area or set of source systems
- Is built around a dimensional model using a star schema
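To make "a dimensional model using a star schema" concrete, here is a small sketch of a sales data mart built with Python's `sqlite3` module. The table names, columns and sample rows are hypothetical; a real mart would carry many more attributes on each dimension.

```python
import sqlite3

mart = sqlite3.connect(":memory:")

# One central fact table referencing the surrounding dimension tables.
mart.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_branch  (branch_key  INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE dim_date    (date_key    INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        branch_key  INTEGER REFERENCES dim_branch(branch_key),
        date_key    INTEGER REFERENCES dim_date(date_key),
        amount      REAL
    );
""")

mart.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Plum cake", "Cakes"), (2, "Oat cookie", "Cookies")])
mart.executemany("INSERT INTO dim_branch VALUES (?, ?)",
                 [(1, "Mumbai"), (2, "Delhi")])
mart.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20240115, 2024, 1), (20240410, 2024, 2)])
mart.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (1, 1, 20240115, 300.0),
    (2, 1, 20240115, 120.0),
    (1, 2, 20240410, 200.0),
    (2, 2, 20240115, 80.0),
])

# A typical star-schema query: join the fact table to the dimensions it
# needs, then group by dimension attributes.
sales_by_category = mart.execute("""
    SELECT d.quarter, p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d    ON d.date_key = f.date_key
    GROUP BY d.quarter, p.category
    ORDER BY d.quarter, p.category
""").fetchall()

for quarter, category, total in sales_by_category:
    print(f"Q{quarter} {category:<8} {total:>7.2f}")
```

The fact table holds the measures (here `amount`), while the dimensions hold the descriptive attributes used for grouping and filtering.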