Download What is Data Warehouse?

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Data Protection Act, 2012 wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Data center wikipedia , lookup

Data model wikipedia , lookup

Forecasting wikipedia , lookup

3D optical data storage wikipedia , lookup

Data analysis wikipedia , lookup

Information privacy law wikipedia , lookup

Business intelligence wikipedia , lookup

Data vault modeling wikipedia , lookup

Database model wikipedia , lookup

Transcript
IIO30120 Database Design
Data warehouse design
based on the book Hovi, Huotari, Lahdenmäki:
Tietokantojen suunnittelu & indeksointi, Docendo
(2003, 2005) chapter 8
© Jouni Huotari & Ari Hovi
18.10.2015
Some slides from
http://www.slideshare.net/idnats/datawarehousing-and-data-mining-presentation-725476
For discussion
• What is a data warehouse?
• How the term Business Intelligence relates to data
warehousing?
• How data mart differs from data warehouse?
• Why OLAP (online analytical processing) is
important?
• What is the basic idea behind ETL?
18.10.2015
© Jouni Huotari & Ari Hovi
2
What is Data Warehouse?
• Defined in many different ways, but not rigorously.
 A decision support database that is maintained separately from the
organization’s operational database
 Support information processing by providing a solid platform of consolidated,
historical data for analysis.
• “A data warehouse is a subject-oriented, integrated,
time-variant, and nonvolatile collection of data in
support of management’s decision-making
process.”—W. H. Inmon
• Data warehousing: The process of constructing and
using data warehouses
Basic design principles for data
warehouses
To support fast summary queries, analysis, and reporting
• This is often difficult in operative databases, but some solutions
exist
• can you mention any solutions?
It is important to
• Maintain history for seeing trends etc.
• Clean up and make data consistent from different data sources
• Make the structure clear and understandable
18.10.2015
© Jouni Huotari & Ari Hovi
4
Example: a producer wants to know….
Which are our
lowest/highest margin
customers ?
What is the most
effective distribution
channel?
What product prom-otions have the biggest
impact on revenue?
18.10.2015
5
Who are my customers
and what products
are they buying?
Which customers
are most likely to go
to the competition ?
What impact will
new products/services
have on revenue
and margins?
back room and front room of a data warehouse
18.10.2015
© Jouni Huotari & Ari Hovi
© Kimball,
Caserta:
The Data
Warehouse
6ETL Toolkit
(2004)
Data Warehouse vs. Operational DBMS
• OLTP (on-line transaction processing)
– Major task of traditional relational DBMS
– Day-to-day operations: purchasing, inventory, banking, manufacturing,
payroll, registration, accounting, etc.
• OLAP (on-line analytical processing)
– Major task of data warehouse system
– Data analysis and decision making
• Distinct features (OLTP vs. OLAP):
– User and system orientation: customer vs. market
– Data contents: current, detailed vs. historical, consolidated
– Database design: ER + application vs. star + subject
– View: current, local vs. evolutionary, integrated
– Access patterns: update vs. read-only but complex queries
7
Conceptual Modeling of Data Warehouses
• Star schema: A fact table in the middle connected to a
set of dimension tables
• Snowflake schema: A refinement of star schema where
some dimensional hierarchy is normalized into a set of
smaller dimension tables, forming a shape similar to
snowflake
• Fact constellations: Multiple fact tables share dimension
tables, viewed as a collection of stars, therefore called
galaxy schema or fact constellation
• Dimensions describe who, what, when, where and why
for the facts.
8
Octobe
r 18,
Example of Star Schema
Another example of Star Schema
time
time_key
day
day_of_the_week
month
quarter
year
Sales Fact Table
time_key
item_key
units_sold
branch_key
branch_name
branch_type
dollars_sold
avg_sales
Measures
item_key
item_name
brand
type
supplier_type
branch_key
location_key
branch
10
item
location
location_key
street
city
state_or_province
country
Octobe
r 18,
Example of Snowflake Schema
time
time_key
day
day_of_the_week
month
quarter
year
Sales Fact Table
time_key
item_key
branch_key
location_key
branch
branch_key
branch_name
branch_type
Measures
11
item
units_sold
dollars_sold
avg_sales
item_key
item_name
brand
type
supplier_key
location
supplier
supplier_key
supplier_type
location_key
street
city
city_key
city_key
city
state_or_province
country
Octobe
r 18,
Example of Fact Constellation
time
time_key
day
day_of_the_week
month
quarter
year
branch
item
Sales Fact Table
time_key
item_key
branch_key
location_key
branch_key
branch_name
branch_type
units_sold
dollars_sold
avg_sales
Measures
12
item_key
item_name
brand
type
supplier_type
Shipping Fact Table
time_key
item_key
shipper_key
from_location
location
location_key
street
city
province_or_state
country
to_location
dollars_cost
units_shipped
shipper
shipper_key
shipper_name
Octobe
location_key
r 18,
Multidimensional Data
• Sales volume as a function of product, month,
and region
•Dimensions: Product, Location, Time
•Hierarchical summarization paths
Industry Region
Year
Product
Category Country Quarter
Product
City
Office
13
Month Week
Day
Month
Octobe
r 18,
A Sample Data Cube
2Qtr
3Qtr
4Qtr
sum
U.S.A
Canada
Mexico
sum
Country
TV
PC
VCR
sum
1Qtr
Date
Total annual sales
of TV in U.S.A.
Browsing a Data Cube
• Visualization
• OLAP
capabilities
• Interactive
manipulation
15
Exercises
• Install http://www.olapcube.com/ to your virtual
machine and play with the dimensions, create a
cube and examine the result (dashboard)
• http://www.olapcube.com/help/writer/panorama/
• http://www.pentaho.com/testdrive
• http://www.databaseanswers.org/downloads/Data_Ware
housing_by_Example.pdf
• Example (real) data: http://www.gapminder.org/data/
18.10.2015
© Jouni Huotari & Ari Hovi
16
What is Business Intelligence (BI)?
• BI refers to skills, technologies, applications and practices
used to help a business acquire a better understanding of
its commercial context (Wikipedia)
• The goal is to gain insight into the business by bringing
together data, formatting it in a way that enables better
analysis, and then providing tools that give users power—
not just to examine and explore the data, but to quickly
understand it. (Business Intelligence with Microsoft Office
PerformancePoint Server)
18.10.2015
17
More information from wikipedia:
• http://en.wikipedia.org/wiki/Data_warehouse
• http://en.wikipedia.org/wiki/Data_mart
• http://en.wikipedia.org/wiki/Star_schema
• http://en.wikipedia.org/wiki/Snowflake_schema
• http://en.wikipedia.org/wiki/OLAP
18.10.2015
© Jouni Huotari & Ari Hovi
18