Data Warehouse and Business Intelligence
Dr. Minder Chen
Fall 2009
BI
Business Intelligence (BI) is the process of gathering meaningful
information to answer questions and identify significant trends or
patterns, giving key stakeholders the ability to make better
business decisions.
“The key in business is to know something that nobody else knows.”
-- Aristotle Onassis
“To understand is to perceive patterns.”
-- Sir Isaiah Berlin
“The manager asks how and when, the leader asks what and why.”
-- Warren Bennis, “On Becoming a Leader”
© Minder Chen, 2004-2010
Data Warehouse - 2
BI Questions
• What happened?
– What were our total sales this month?
• What’s happening?
– Are our sales going up or down? (trend analysis)
• Why?
– Why have sales gone down?
• What will happen?
– Forecasting & “What If” Analysis
• What do I want to happen?
– Planning & Targets
Source: Bill Baker, Microsoft
Business Intelligence
(Figure: layers of BI, with increasing potential to support business decisions (MIS) toward the top)
• Making Decisions
• Data Presentation: Visualization Techniques
• Data Mining: Information Discovery
• Data Exploration: OLAP, MDA, Statistical Analysis, Querying and Reporting
• Data Warehouses / Data Marts
• Data Sources: Paper, Files, Information Providers, Database Systems, OLTP
Typical users, from the top layers to the bottom: End User, Business Analyst, Data Analyst, DBA.
Inmon's Definition of Data Warehouse – Data View
• A warehouse is a
– subject-oriented,
– integrated,
– time-variant and
– non-volatile
collection of data in support of
management's decision-making process.
-- Bill Inmon, 1990
Source: http://www.intranetjournal.com/features/datawarehousing.html
Inmon's Definition Explained
• Subject-oriented: They are organized around major
subjects such as customer, supplier, product, and
sales. Data warehouses focus on modeling and
analysis to support planning and management
decisions vs. operations and transaction processing.
• Integrated: Data warehouses involve an integration of
sources such as relational databases, flat files, and online transaction records. Processes such as data
cleansing and data scrubbing achieve data
consistency in naming conventions, encoding
structures, and attribute measures.
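• Time-variant: Data contained in the warehouse provide
information from a historical perspective.
• Nonvolatile: Data contained in the warehouse are
physically separate from data present in the
operational environment. Because of this separation,
warehouse data are typically loaded and then read,
rather than updated in place by day-to-day transactions.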
The Data Warehouse Process
1. Design the Data Warehouse
2. Populate the Data Warehouse: Extract, Transform, & Load (ETL) data from the Source Systems
3. Create OLAP Cubes: build Data Marts and cubes from the Data Warehouse
4. Query Data: Clients use Query Tools for Reporting, Analysis, and Data Mining
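The ETL step above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the two source layouts (`crm_rows`, `pos_rows`), the gender code mapping, and the `sales_stage` table are all hypothetical. The transform step also shows the kind of encoding cleanup that makes warehouse data "integrated".

```python
import sqlite3

# Hypothetical extracts from two source systems that encode the same data differently.
crm_rows = [{"cust": "C1", "gender": "M", "amount": "120.50"}]
pos_rows = [{"customer_id": "C2", "sex": "female", "amt": 80}]

# Hypothetical mapping that forces one consistent encoding in the warehouse.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def extract():
    """Extract: pull raw records from each source system, tagged with their origin."""
    return [("crm", r) for r in crm_rows] + [("pos", r) for r in pos_rows]


def transform(source, row):
    """Transform: map each source layout onto one warehouse layout and encoding."""
    if source == "crm":
        cust, gender, amount = row["cust"], row["gender"], row["amount"]
    else:
        cust, gender, amount = row["customer_id"], row["sex"], row["amt"]
    return (cust, GENDER_CODES[str(gender).lower()], round(float(amount), 2))


def load(conn, rows):
    """Load: write the cleansed rows into the warehouse staging table."""
    conn.executemany("INSERT INTO sales_stage VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_stage (customer_id TEXT, gender TEXT, amount REAL)")
load(conn, [transform(src, row) for src, row in extract()])
rows = conn.execute("SELECT * FROM sales_stage ORDER BY customer_id").fetchall()
print(rows)  # [('C1', 'M', 120.5), ('C2', 'F', 80.0)]
```

Keeping the source-specific logic inside `transform` means that adding another source system only adds another mapping branch; the load step and the warehouse table stay unchanged.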
OLTP Normalized Design
(Figure: normalized OLTP entity diagram with entities such as Warehouse, Ordering Process, Chain, Retailer Store, Retailer Payments, Retailer Returns, Product, POS Process, Retail Promo, Brand, GL Account, Retail Cust, Cash Register, and Clerk)
OLTP Versus Business Intelligence: Who asks what?
OLTP Questions
• When did that order ship?
• How many units are in inventory?
• Does this customer have unpaid bills?
• Are any of customer X’s line items on backorder?
Analysis Questions
• What factors affect order processing time?
• How did each product line (or product) contribute to profit last quarter?
• Which products have the lowest Gross Margin?
• What is the value of items on backorder, and is it trending up or down over time?
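The difference between the two kinds of questions shows up in the shape of the query. In the sketch below, the single `invoice` table and its rows are hypothetical toy data. The OLTP-style question looks up a few rows for one customer, while the analysis-style question scans and aggregates across all rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy table; a real OLTP schema would be normalized across many tables.
conn.executescript("""
CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, customer TEXT,
                      product_line TEXT, amount REAL, cost REAL, paid INTEGER);
INSERT INTO invoice VALUES
  (1, 'X', 'Bakery', 100.0,  60.0, 1),
  (2, 'X', 'Dairy',   50.0,  45.0, 0),
  (3, 'Y', 'Bakery', 200.0, 110.0, 1),
  (4, 'Y', 'Dairy',  150.0, 120.0, 1);
""")

# OLTP-style: "Does this customer have unpaid bills?" touches a few rows of one customer.
unpaid = conn.execute(
    "SELECT COUNT(*) FROM invoice WHERE customer = ? AND paid = 0", ("X",)
).fetchone()[0]

# Analysis-style: "Which products have the lowest gross margin?" aggregates every row.
margins = conn.execute("""
    SELECT product_line, (SUM(amount) - SUM(cost)) / SUM(amount) AS gross_margin
    FROM invoice
    GROUP BY product_line
    ORDER BY gross_margin
""").fetchall()

print(unpaid)   # 1
print(margins)  # Dairy (0.175) first, then Bakery (about 0.433)
```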
OLTP vs. OLAP
Source: http://www.rainmakerworks.com/pdfdocs/OLTP_vs_OLAP.pdf#search=%22OLTP%20vs.%20OLAP%22
Dimensional Design Process
(Driven by both Business Requirements and Data Realities)
• Select the business process to model
• Declare the grain of the business process/data in the fact table
• Choose the dimensions that apply to each fact table row
• Identify the numeric facts that will populate each fact table row
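One way to keep the four design decisions explicit is to record them together before any tables are built. The sketch below is a hypothetical design record with two simple sanity checks. The class name, the checks, and the retail example values are illustrative assumptions, not part of a standard method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactTableDesign:
    """Hypothetical record of the four dimensional design decisions."""
    business_process: str        # 1. business process to model
    grain: str                   # 2. what exactly one fact table row represents
    dimensions: tuple[str, ...]  # 3. dimensions that apply to each row
    facts: tuple[str, ...]       # 4. numeric facts stored in each row

    def check(self) -> list[str]:
        """Return a list of design problems (empty when none are found)."""
        problems = []
        if not self.grain:
            problems.append("grain must be declared before choosing dimensions")
        overlap = set(self.dimensions) & set(self.facts)
        if overlap:
            problems.append(f"used as both dimension and fact: {sorted(overlap)}")
        return problems


# Hypothetical example values for a retail sales process.
retail_sales = FactTableDesign(
    business_process="Retail sales",
    grain="one row per product per POS transaction line",
    dimensions=("date", "product", "store", "promotion"),
    facts=("sales_quantity", "sales_amount", "cost_amount"),
)
print(retail_sales.check())  # []
```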
Select a business process to model
• Not business departments or business functions
• Cross-functional business processes
• Business events
• Examples:
– Raw materials purchasing
– Order fulfillment process
– Shipments
– Invoicing
– Inventory
– General ledger
Requirements
Identifying Measures and Dimensions
• Measures (performance measures for KPIs): the attribute varies continuously, e.g., Balance, Units Sold, Cost, Sales
• Dimensions (performance drivers): the attribute is perceived as a constant or discrete value, e.g., Description, Location, Color, Size
A Dimensional Model for Grocery Store Sales
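The original slide's diagram is not reproduced here. As a stand-in, the sketch below builds a small star schema in SQLite: one fact table keyed by date, product, and store, surrounded by dimension tables. The table names, columns, daily grain, and sample rows are assumptions chosen for illustration. The query shows a typical star join that rolls sales up by month and category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical star schema: three dimensions and one fact table at a daily
# product-by-store grain (an assumption made for this sketch).
conn.executescript("""
CREATE TABLE date_dim    (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, sku TEXT, brand TEXT, category TEXT);
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT, city TEXT);
CREATE TABLE sales_fact (
    date_key     INTEGER REFERENCES date_dim(date_key),
    product_key  INTEGER REFERENCES product_dim(product_key),
    store_key    INTEGER REFERENCES store_dim(store_key),
    quantity     INTEGER,
    sales_amount REAL,
    PRIMARY KEY (date_key, product_key, store_key)
);
INSERT INTO date_dim VALUES (20090901, '2009-09-01', '2009-09'),
                            (20091001, '2009-10-01', '2009-10');
INSERT INTO product_dim VALUES (1, 'SKU-001', 'BrandA', 'Dairy'),
                               (2, 'SKU-002', 'BrandB', 'Bakery');
INSERT INTO store_dim VALUES (1, 'Store 1', 'Springfield');
INSERT INTO sales_fact VALUES (20090901, 1, 1, 10, 25.0),
                              (20090901, 2, 1,  4, 12.0),
                              (20091001, 1, 1,  6, 15.0);
""")

# Star join: constrain and group by dimension attributes, then sum the facts.
rows = conn.execute("""
    SELECT d.month, p.category, SUM(f.quantity), SUM(f.sales_amount)
    FROM sales_fact f
    JOIN date_dim d    ON f.date_key = d.date_key
    JOIN product_dim p ON f.product_key = p.product_key
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
""").fetchall()
for row in rows:
    print(row)
```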
Product Dimension
• SKU: Stock Keeping Unit
• Hierarchy:
– Department → Category → Subcategory → Brand → Product
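Rolling up along the product hierarchy means grouping the same sales at a coarser level. Drilling down means grouping them at a finer level. In the sketch below, the product rows and sales amounts are hypothetical. Each total is keyed by the full path down to the chosen level, so a category total still shows which department it belongs to.

```python
from collections import defaultdict

# Hypothetical product dimension: SKU -> (department, category, subcategory, brand).
products = {
    "SKU-001": ("Food", "Dairy", "Milk", "BrandA"),
    "SKU-002": ("Food", "Dairy", "Cheese", "BrandB"),
    "SKU-003": ("Food", "Bakery", "Bread", "BrandA"),
}
LEVELS = ("department", "category", "subcategory", "brand", "product")

# Hypothetical sales facts: (SKU, sales amount).
sales = [("SKU-001", 25.0), ("SKU-002", 12.0), ("SKU-003", 8.0), ("SKU-001", 5.0)]


def roll_up(level: str) -> dict:
    """Total sales at the given hierarchy level, keyed by the path down to that level."""
    depth = LEVELS.index(level)
    totals = defaultdict(float)
    for sku, amount in sales:
        path = products[sku] + (sku,)  # full path, ending with the product itself
        totals[path[: depth + 1]] += amount
    return dict(totals)


print(roll_up("department"))  # {('Food',): 50.0}
print(roll_up("category"))    # {('Food', 'Dairy'): 42.0, ('Food', 'Bakery'): 8.0}
```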
Creating Dimensional Model
• Identify fact tables
• Translate business measures into fact tables
• Analyze source system information for additional
measures
• Identify base and derived measures
• Document additivity of measures
• Identify dimension tables
• Link fact tables to the dimension tables
• Create views for users
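Why it matters to document additivity and to separate base measures from derived ones: the sketch below uses two hypothetical store rows. Sales and cost are additive base measures, so they can be summed. Gross margin is a derived ratio and is not additive, so it should be recomputed from the summed base measures rather than summed or averaged across rows.

```python
# Hypothetical rows: base measures (sales, cost) are stored per store.
rows = [
    {"store": "S1", "sales": 100.0, "cost": 60.0},
    {"store": "S2", "sales": 300.0, "cost": 270.0},
]


def gross_margin(sales: float, cost: float) -> float:
    """Derived measure: (sales - cost) / sales."""
    return (sales - cost) / sales


# Additive base measures: summing across stores is valid.
total_sales = sum(r["sales"] for r in rows)
total_cost = sum(r["cost"] for r in rows)

# Correct: derive the ratio from the aggregated base measures.
margin_correct = gross_margin(total_sales, total_cost)

# Misleading: averaging per-row ratios ignores each row's sales volume.
margin_naive_avg = sum(gross_margin(r["sales"], r["cost"]) for r in rows) / len(rows)

print(margin_correct)    # 0.175
print(margin_naive_avg)  # 0.25
```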
Transaction Level Order Item Fact Table
• The fact table always has a date dimension.
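Because nearly every fact table joins to a date dimension, that dimension is usually generated once, with one row per calendar day. The sketch below assumes a `yyyymmdd` integer `date_key` (a common convention, but an assumption here) and a small set of hypothetical attributes. Note that `%A` weekday names depend on the locale.

```python
from datetime import date, timedelta


def build_date_dimension(start: date, end: date) -> list[dict]:
    """Generate one date-dimension row per day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": day.year * 10000 + day.month * 100 + day.day,  # e.g. 20091001
            "full_date": day.isoformat(),
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "day_of_week": day.strftime("%A"),  # locale-dependent name
            "is_weekend": day.weekday() >= 5,   # Saturday = 5, Sunday = 6
        })
        day += timedelta(days=1)
    return rows


dim = build_date_dimension(date(2009, 9, 30), date(2009, 10, 4))
for row in dim:
    print(row["date_key"], row["quarter"], row["is_weekend"])
```

Attributes such as quarter and weekend flags are precomputed so that queries can filter and group on them directly instead of applying date functions to every fact row.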
Inside a Dimension Table
• Dimension table key: Uniquely identifies each row. Use a
surrogate key (integer).
• Table is wide: A table may have many attributes
(columns).
• Textual attributes: Descriptive attributes in string
format, not numerical values for calculation.
• Attributes not directly related: E.g., product color and
product package size. No transitive dependency.
• Not normalized (star schema).
• Supports drilling down and rolling up along a dimension.
• One or more hierarchies within a dimension.
• Fewer records than fact tables.
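A surrogate key is a meaningless integer assigned by the warehouse, independent of the natural key used by the source system (such as a SKU). The sketch below is a hypothetical in-memory key map. A real warehouse would typically use a database sequence or identity column, and this sketch ignores slowly changing dimensions.

```python
class SurrogateKeyMap:
    """Hypothetical helper: assign stable integer surrogate keys to natural keys."""

    def __init__(self) -> None:
        self._keys: dict[str, int] = {}

    def key_for(self, natural_key: str) -> int:
        """Return the existing surrogate key, or assign the next integer."""
        if natural_key not in self._keys:
            self._keys[natural_key] = len(self._keys) + 1
        return self._keys[natural_key]


product_keys = SurrogateKeyMap()
k1 = product_keys.key_for("SKU-001")  # 1
k2 = product_keys.key_for("SKU-002")  # 2
k3 = product_keys.key_for("SKU-001")  # 1 again: the same natural key maps to the same surrogate
```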
Fact Tables
Fact tables have the following characteristics:
• Contain numeric measures (metrics) of the
business
• May contain summarized (aggregated) data
• May contain date-stamped data
• Are typically additive
• Have a key that is typically a concatenated (composite)
key composed of the primary keys of the
dimensions
• Are joined to dimension tables through foreign
keys that reference primary keys in the
dimension tables
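The sketch below shows the last two properties together. Each fact row stores foreign keys looked up from the dimension tables, and those keys form the row's composite key. All lookup tables and source lines are hypothetical. The `duplicate_keys` check flags composite keys that occur more than once, which suggests that the rows are at a finer grain than the key expresses.

```python
from collections import Counter

# Hypothetical dimension lookups: natural key -> surrogate (primary) key.
date_keys = {"2009-10-01": 20091001, "2009-10-02": 20091002}
product_keys = {"SKU-001": 1, "SKU-002": 2}
store_keys = {"S1": 1}

# Hypothetical source lines: (date, SKU, store, quantity, amount).
source_lines = [
    ("2009-10-01", "SKU-001", "S1", 3, 7.50),
    ("2009-10-01", "SKU-002", "S1", 1, 3.00),
    ("2009-10-02", "SKU-001", "S1", 2, 5.00),
]


def to_fact_row(line):
    """Replace natural keys with foreign keys to the dimension tables."""
    day, sku, store, qty, amount = line
    return (date_keys[day], product_keys[sku], store_keys[store], qty, amount)


def duplicate_keys(rows, key_width=3):
    """Composite keys that appear more than once (rows finer than the declared grain)."""
    counts = Counter(row[:key_width] for row in rows)
    return [key for key, n in counts.items() if n > 1]


fact_rows = [to_fact_row(line) for line in source_lines]
print(fact_rows[0])               # (20091001, 1, 1, 3, 7.5)
print(duplicate_keys(fact_rows))  # []
```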