Data Mining and Data Warehousing
Henryk Maciejewski
Data Warehousing and OLAP
Part II Data Warehousing – Contents
• OLAP Approach to Data Analysis
• Database for OLAP = Data Warehouse
– Logical model
– Physical models (ROLAP, MOLAP, HOLAP)
• Querying multidimensional data
• DW project methodologies
Further Reading
• J. Han, M. Kamber: Data Mining: Concepts and Techniques, 2nd ed., Elsevier, 2006.
• W. Inmon: Building the Data Warehouse, Wiley, 2005.
• F. Silvers: Building and Maintaining a Data Warehouse, CRC Press, 2008.
• www.information-management.com
From DBMS to Analytical Systems...
• The 1960s: first IT systems
• The 1970s:
– DBMS systems
– On-line transactional processing systems (OLTP)
• The 1990s:
– On-line analytical processing (OLAP), data warehousing, data mining – Business Intelligence (BI), DSS
IT Systems Generate Data Deluge
• IT Systems in:
– Retail trade – bar codes, credit cards, …
– Banking, insurance, telecoms, healthcare, etc.
– Science (biology, weather/Earth monitoring, sky surveys, ...)
• Data Deluge
– WalMart: 20 million transactions per day
– Mobil: ca. 100 TB of data (exploration of oil reserves)
– Human Genome Project: ~GB of data
– NASA Earth Observing System: 50 GB per hour (!)
– DISS solar energy plant monitoring: ~800 numbers / 5 secs
How to Get Information out of Data
• Efficient technologies available to gather and store data
• Simple approaches to data analysis prove inefficient
• Spreadsheet based, SQL query based, ...
• Technologies + tools needed for efficient data analysis / knowledge extraction from data
• Hence OLAP, KDD (Knowledge Discovery in Databases) and DM (data mining) emerged
• Information – data in context; data that have meaning, relevance and purpose
Various Approaches to Data Analysis
• SQL: SQL queries to „raw” data
• Data Warehouse / OLAP: database for OLAP with integrated data (ETL – Extract-Transform-Load); multidimensional data model: y(w1, w2, ..., wn)
• Data Mining: discovering relationships in data, e.g., customer profiles, models to assess credit risk, etc.
Data Analysis Techniques – SQL Queries
[Figure: a „cross-sectional” question is passed to a programmer / DB admin, who writes SQL programs against several data sources to produce a report]
Drawbacks:
• Considerable coding effort
• Heavy load on OLTP servers
• Multiple versions of the truth…
Data Warehouse (W. Inmon 1992)
[Figure: source data (several systems) → ETL: data access, data integration (cleaning, transformation) → Data Warehouse → Data Mart → OLAP / DSS. The warehouse has a specific database structure optimized for OLAP (MDDB, „snowflake”, „star schema”, ROLAP, MOLAP, HOLAP)]
Why OLAP Technology is Becoming Indispensable
• Getting information out of historical data
• Integration of data sources in the enterprise
• „Cross-sectional” analyses of enterprise data
→ discovering relationships / patterns in large amounts of data
→ trend analysis
→ data mining
OLAP/Data Warehouse – Key Design Issues
• Data organization
– Multidimensional data model (facts seen as a function of dimensions)
– Physical data storage that allows for fast (online) analysis of vast data volumes
• Data integration
– Ensure high quality of analytical data
– „Taming the data chaos”
– Single version of the truth
OLAP vs. OLTP – Different Applications and Data Model
• OLTP
– operational data
– automation of day-to-day operations of the organization:
→ phone-call billing, order / invoice processing, banking / credit card transactions, etc.
• OLAP
– analytical data
– getting information for decision support
→ Who are our best customers (characteristics)?
→ Churn analysis
→ How does an increase in sales correlate with quality of service?
OLAP vs. OLTP – Summary
• Main applications
– OLTP: automation of the organization's operations: entering data on routine day-to-day transactions; fixed-structure reports / summaries created on a regular basis (daily, monthly, etc.)
– OLAP: decision support: multidimensional statistical analyses, forecasting, ad hoc queries, advanced reporting
• Time horizon for data retention
– OLTP: usually short term (90 days, 1 year)
– OLAP: long-term data retention, to support historic data analyses, comparative reports, trend analysis over time
• Data updates
– OLTP: ‘on the fly’, during individual transactions
– OLAP: static data, updated on a regular basis (e.g., monthly); data collected over time (time-stamped)
• Data access
– OLTP: frequent access to small portions of data (a few or tens of records); simple, well-structured queries
– OLAP: rare access involving large amounts of data; complex, ad hoc queries
Schedule
• OLAP Approach to data analysis
– OLAP vs OLTP
– OLAP – data integration
• Database for OLAP = Data Warehouse
– Logical data model – multidimensionality
– Physical data models (ROLAP, MOLAP, HOLAP)
„Data chaos” – Why it is Hard to Run Analytics Based on OLTP
• Main obstacles for building successful OLAP ‘on top’ of transactional data:
– Data awareness
– Data understanding
– Data variability
– Data redundancy (and hence consistency)
• „Data islands” in disparate transactional systems
Data Chaos – Example
[Figure: data scattered across separate databases of the Faculty of EE and the Faculty of Architecture (Teachers DB, Tutors DB, Notes DB, Exam results DB, two separate Courses DBs, Recruitment DB), all to be integrated into a data warehouse]
Problems / difficulties:
→ how to find data
→ how to extract data
→ understand the meaning
→ clean the data
Business Intelligence Based on OLTP?
• How to get to the data in the DB?
• How to locate the right table / column?
• How to understand the meaning of the data?
• How to clean the data?
Dedicated System for BI (OLAP)
• ETL (Extract-Transform-Load)
– Connect to source DB
– Integrate / clean
– Transform to the multidimensional model
• Multidimensional model of data (facts vs. dimensions)
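The three ETL steps above can be sketched in a few lines. This is a minimal illustration, not the lecture's tooling: the two source tables are hypothetical in-memory lists (standing in for a connection to the source DBs), and all column names are invented to show two sources with different conventions being mapped onto one fact table keyed by dimensions.

```python
from collections import defaultdict

# Hypothetical source rows (stand-ins for two faculty databases with different conventions)
faculty_ee_notes = [
    {"student": "s1", "course": "DB", "semester": "2001S", "grade": "4.5"},
    {"student": "s2", "course": "DB", "semester": "2001S", "grade": "3.0"},
]
faculty_arch_notes = [
    {"StudentId": "a7", "Course": "CAD", "Sem": "2001S", "Note": 5.0},
    {"StudentId": "a8", "Course": "CAD", "Sem": "2001S", "Note": None},  # missing grade
]


def extract():
    """Extract: read both sources and map their column names onto one layout."""
    for r in faculty_ee_notes:
        yield {"faculty": "EE", "course": r["course"],
               "semester": r["semester"], "grade": r["grade"]}
    for r in faculty_arch_notes:
        yield {"faculty": "Architecture", "course": r["Course"],
               "semester": r["Sem"], "grade": r["Note"]}


def clean(rows):
    """Integrate / clean: drop rows without a grade, store grades as floats."""
    for r in rows:
        if r["grade"] is None:
            continue
        yield {**r, "grade": float(r["grade"])}


def transform(rows):
    """Transform to the multidimensional model: (faculty, course, semester) -> (N, SUM)."""
    facts = defaultdict(lambda: [0, 0.0])
    for r in rows:
        cell = facts[(r["faculty"], r["course"], r["semester"])]
        cell[0] += 1
        cell[1] += r["grade"]
    return {key: tuple(value) for key, value in facts.items()}


fact_table = transform(clean(extract()))  # "load" = keep the resulting fact table
print(fact_table)  # {('EE', 'DB', '2001S'): (2, 7.5), ('Architecture', 'CAD', '2001S'): (1, 5.0)}
```

Storing N and SUM rather than a ready-made average keeps the cells combinable later, which is why the sketch transforms to those two statistics.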
Example: Multidimensional Model
Cubes: Over-hours, Availability, Fuel consumption
Example: ETL Process
ETL for the cube Availability
Data Warehouse – Definition
Data Warehouse – subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making.
Subject-oriented – data is organized around subjects of interest to the data analyst (e.g., customer, product, supplier); transactional systems are process-oriented (e.g., order processing).
Integrated – the data warehouse integrates data from several data sources; data characteristics (attributes) must be coded in a consistent way (e.g., consistent coding of SEX: ‘male’-‘female’, ‘m’-‘f’, 0-1).
Non-volatile – data loaded into the data warehouse is a ‘snapshot’ of operational data at a specific point in time; once loaded, data in the warehouse cannot be changed.
Time-varying – data elements in the warehouse are time-stamped to facilitate analysis of changes / trends over time.
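The non-volatile and time-varying properties can be illustrated with an append-only store of frozen, time-stamped records. This is only a sketch under simplifying assumptions: the `CustomerSnapshot` record and the Python list standing in for a warehouse table are hypothetical, and a real warehouse enforces this through its load process rather than through language features.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a loaded record cannot be modified afterwards
class CustomerSnapshot:
    load_date: date   # time stamp of the load (time-varying)
    customer_id: int
    city: str


warehouse = []  # append-only list standing in for a warehouse table


def load_snapshot(load_date, oltp_rows):
    """Append a time-stamped snapshot of the OLTP rows; older snapshots are left untouched."""
    warehouse.extend(CustomerSnapshot(load_date, cid, city) for cid, city in oltp_rows)


load_snapshot(date(2001, 1, 31), [(1, "Wroclaw"), (2, "Opole")])
# Customer 1 moves: the OLTP row is overwritten, the warehouse just receives a new snapshot
load_snapshot(date(2001, 2, 28), [(1, "Krakow"), (2, "Opole")])

history = [(s.load_date, s.city) for s in warehouse if s.customer_id == 1]
print(history)
```

The OLTP system only knows the current city; the warehouse keeps both values with their load dates, which is what makes trend analysis over time possible.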
Summary of This Part
• Concept of OLTP and OLAP
– Different use, different requirements for
• Data organization (data model)
• Database design
• Need for data integration
– Overcoming „data chaos”
– Ensuring high quality of analytical data in warehouse
Example: OLAP for Student Notes
Example: IBM Tivoli Monitoring Data Warehouse
• Monitoring agents – keep 24 h of detailed data
• Data Warehouse – aggregated, time-stamped data drawn from agents
Default attribute groups collected by the monitoring agents:
• Monitoring Agent for Windows OS: Network_Interface, NT_Processor, NT_Logical_Disk, NT_Memory, NT_Physical_Disk, NT_Server, NT_System
• Monitoring Agent for UNIX: Disk, System
• Monitoring Agent for Linux: Linux_CPU, Linux_CPU_Averages, Linux_CPU_Config, Linux_Disk, Linux_Disk_IO, Linux_Disk_Usage_Trends, Linux_IO_Ext, Linux_Network, Linux_NFS_Statistics
• Monitoring Agent for DB2: KUDDBASEGROUP00, KUDDBASEGROUP01, KUDBUFFERPOOL00, KUDINFO00, KUDTABSPACE
Schedule
• Multidimensional Model of OLAP Data
• Why OLAP Doesn’t Like Normalized DB
• Relational OLAP (ROLAP)
• Multidimensional OLAP (MOLAP)
• Hybrid OLAP (HOLAP)
OLAP: Multidimensional Model of Data
• OLAP = multidimensional analysis of data
• Multidimensional model of data:
– Measure as a value in a multidimensional space of dimensions
– Numeric measures – objects of analysis, also referred to as facts
– Dimensions – variables on which the measure depends / that uniquely determine the measure
• E.g., measure: sales [$]; dimensions: product, shop, date
• Dimension hierarchies, e.g.,
– Geographical hierarchy: shop – city – region – country
– Time hierarchy: day of week – week – month – year
– Product hierarchy: item – type – group
Example – Model Built in Lab
• Multidimensional model for analysis of students’ notes:
– Measure: Student’s grade (note)
– Dimensions:
• Characteristics of students
• Characteristics of teachers
• Characteristics of courses (group of courses, type of courses, etc.)
• Time hierarchy: calendar semester – year
• Workload of students / teachers, etc.
– Various statistics will be of interest, e.g., average grade, number of grades, std deviation, distribution, ...
Useful Concepts
– Aggregation: e.g., computing total sales by year based on more detailed data
– Drill-down: create a more detailed view (i.e., decrease the level of aggregation)
– Roll-up: increase the level of aggregation
– Slice-and-dice: reduce the dimensionality of data: fix the values of some dimensions and observe how the data depend on the remaining dimensions
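The concepts above can be made concrete by treating the cube as a mapping from dimension coordinates to a measure. A minimal sketch, assuming a hypothetical sales cube with dimensions product, shop and month and a hypothetical shop → city level of the geographical hierarchy; SUM is used as the aggregate function.

```python
from collections import defaultdict

# Hypothetical facts: (product, shop, month) -> sales [$]
sales = {
    ("milk", "shop1", "2001-01"): 100.0,
    ("milk", "shop2", "2001-01"): 80.0,
    ("bread", "shop1", "2001-01"): 50.0,
    ("milk", "shop1", "2001-02"): 120.0,
    ("bread", "shop2", "2001-02"): 40.0,
}
# Hypothetical geographical hierarchy level: shop -> city
shop_city = {"shop1": "Wroclaw", "shop2": "Opole"}


def roll_up(cube, coarser_key):
    """Aggregate (SUM) every cell onto a coarser key; the original cube is the drill-down view."""
    result = defaultdict(float)
    for coords, value in cube.items():
        result[coarser_key(coords)] += value
    return dict(result)


# Aggregation: total sales by year
total_by_year = roll_up(sales, lambda c: c[2][:4])
# Roll-up along two hierarchies at once: shop -> city, month -> year
by_city_year = roll_up(sales, lambda c: (c[0], shop_city[c[1]], c[2][:4]))
# Slice: fix month = 2001-01 and keep the remaining dimensions (product, shop)
slice_jan = {(p, s): v for (p, s, m), v in sales.items() if m == "2001-01"}

print(total_by_year)  # {'2001': 390.0}
print(by_city_year)
print(slice_jan)
```

Drill-down is simply the reverse direction: going from `by_city_year` back to the more detailed `sales` cells.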
Normalized DB (a Reminder)
• Database design for OLTP uses Entity-Relationship diagrams and normalization techniques
• Normalized DB:
– No data redundancy
– Many tables with many-to-one relationships
– Optimized for easy / fast updates of data
– Efficient for constantly changing data
– Efficient for OLTP
Normalized DB - Example
Tables (columns):
• Order: Order ID, Customer ID, Order date, Sales rep ID
• Order item: Order ID, Order item ID, Product ID, Quantity
• Product: Product ID, Product name, Product type, ...
• Shipment: Shipment ID, Status, Order ID, Order item ID, Customer ID
• Customer: Customer ID, Customer name, Address, City, ...
• Contact: Contact ID, Customer ID, Contact name, Contact type
• Sales rep: Sales rep ID, Sales rep name, District ID
• District: District ID, District name, manager
Task – answer the following OLAP query:
Which products were sold to a particular group of customers within a specified time frame?
Normalized DB – Problems with OLAP Queries
• Many ‘join’ operations on tables → low efficiency of SQL queries
• ‘Circular join paths’ – a query can be answered in two different ways → different results possible
• Complicated database schema → SQL code difficult to build / maintain
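The ‘circular join path’ problem can be reproduced with the normalized example: Customer ID appears both in Order and in Shipment, so "sales by customer city" can be joined through either table. A minimal sketch using Python's built-in sqlite3; the table names are adapted from the example (`orders` because ORDER is an SQL keyword) and the data are invented so that one item is shipped to a different customer than the one who ordered it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
CREATE TABLE order_item (order_id INTEGER, order_item_id INTEGER, product_id INTEGER, quantity INTEGER);
CREATE TABLE shipment (shipment_id INTEGER PRIMARY KEY, order_id INTEGER,
                       order_item_id INTEGER, customer_id INTEGER);
INSERT INTO customer VALUES (1, 'Wroclaw'), (2, 'Opole');
INSERT INTO orders VALUES (10, 1, '2001-01-21');
INSERT INTO order_item VALUES (10, 1, 500, 3);
-- the item is shipped to a different customer than the one who ordered it
INSERT INTO shipment VALUES (100, 10, 1, 2);
""")

# Path 1: order_item -> orders -> customer
via_order = con.execute("""
    SELECT c.city, SUM(oi.quantity) FROM order_item oi
    JOIN orders o ON o.order_id = oi.order_id
    JOIN customer c ON c.customer_id = o.customer_id
    GROUP BY c.city""").fetchall()

# Path 2: order_item -> shipment -> customer
via_shipment = con.execute("""
    SELECT c.city, SUM(oi.quantity) FROM order_item oi
    JOIN shipment s ON s.order_id = oi.order_id AND s.order_item_id = oi.order_item_id
    JOIN customer c ON c.customer_id = s.customer_id
    GROUP BY c.city""").fetchall()

print(via_order)     # [('Wroclaw', 3)]
print(via_shipment)  # [('Opole', 3)]
```

Both queries are valid SQL over the same schema, yet they attribute the same sale to different cities; nothing in the normalized schema tells the query author which path the analyst meant.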
OLAP: Requirements for Database Design
• Simplicity of database schema
• Efficiency of multidimensional queries
• Consistency and accuracy of data
• Database schemas that meet these requirements:
– Relational OLAP (ROLAP)
– Multidimensional OLAP (MOLAP)
– Hybrid OLAP (HOLAP)
Relational OLAP
• Warehouse data stored using a relational database server
• Multidimensional data model represented by a star-schema
database or snowflake-schema database
• Star schema:
– Single fact table
– Single table for each dimension
– A fact table entry consists of:
• Aggregate value of the measure
• Foreign keys to dimension tables (composite key of the fact table)
• Snowflake schema:
– Variant of star schema with (some) dimension tables normalized (for
easier maintenance of dimension data)
Example – Star Schema
• Sales (fact table): Sales person ID, Product ID, Date ID, Customer ID, Number sold, Amount
• Product: Product ID, Prod code, Prod name, Prod type, Prod category
• Sales person: Sales person ID, Name, Region, Division, Office
• Date: Date ID, Date, Year, Month, Day
• Customer: Customer ID, Name, Sex, Age, Job name
Example – Snowflake Schema
• Sales (fact table): Sales person ID, Product ID, Date ID, Customer ID, Number sold, Amount
• Product: Product ID, Prod code, Prod name, Prod type, Prod category
• Sales person: Sales person ID, Name, Region, Division, Office
• Date: Date ID, Date, Year, Month, Day
• Customer: Customer ID, Name, Sex, Age, Job ID
• Job Code (job data normalized out of Customer): Job ID, Job name, Job category, …
ROLAP – Example of OLAP Query
• OLAP query:
How many products were sold to a specific group of customers in a given time frame?
Translates into the following SQL query ('21jan2001'd is a SAS date literal):
select sum(a.number_sold) as number_sold
from fact_sales a,
     dimension_date b,
     dimension_customer c
where b.date = '21jan2001'd
  and c.sex = 'F'
  and a.dateID = b.dateID
  and a.customerID = c.customerID;
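The star-schema query above can be run end to end against a toy database. This sketch uses Python's built-in sqlite3 with invented rows; since SQLite has no SAS date literal, the date is stored as ISO text, and the joins are written with explicit JOIN syntax, which is equivalent to the comma joins in the WHERE clause.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dimension_date (dateID INTEGER PRIMARY KEY, date TEXT, year INTEGER, month INTEGER, day INTEGER);
CREATE TABLE dimension_customer (customerID INTEGER PRIMARY KEY, name TEXT, sex TEXT, age INTEGER);
CREATE TABLE fact_sales (salespersonID INTEGER, productID INTEGER, dateID INTEGER,
                         customerID INTEGER, number_sold INTEGER, amount REAL);
INSERT INTO dimension_date VALUES (1, '2001-01-21', 2001, 1, 21), (2, '2001-01-22', 2001, 1, 22);
INSERT INTO dimension_customer VALUES (1, 'Anna', 'F', 30), (2, 'Jan', 'M', 40);
INSERT INTO fact_sales VALUES
  (1, 7, 1, 1, 3, 30.0),  -- female customer, 21 Jan: counted
  (1, 8, 1, 1, 2, 50.0),  -- female customer, 21 Jan: counted
  (1, 7, 1, 2, 5, 50.0),  -- male customer: filtered out by sex
  (1, 7, 2, 1, 4, 40.0);  -- 22 Jan: filtered out by date
""")

query = """
SELECT SUM(a.number_sold) AS number_sold
FROM fact_sales a
JOIN dimension_date b ON a.dateID = b.dateID
JOIN dimension_customer c ON a.customerID = c.customerID
WHERE b.date = '2001-01-21' AND c.sex = 'F'
"""
result = con.execute(query).fetchone()[0]
print(result)  # 5
```

Each dimension is reached by exactly one join from the fact table, so there is no alternative join path to choose between.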
Multidimensional OLAP
• Warehouse data stored in a multidimensional database (MDDB)
• MDDB
– Specialized storage facility that directly reflects the multidimensional model of data
– MDDB can be viewed as an N-dimensional (hyper)cube in which values of the numerical measure (object of analysis) are stored
– Data stored in MDDB is presummarized, i.e., values stored in cross-sections of dimensions have been aggregated at MDDB build time (thus the performance of multidimensional (OLAP) queries is high)
MDDB – Idea
• Sample base table:
– Analysis variable (fact): note
– Classification variables (dimensions): attributes of students, attributes of teachers, semester, year, faculty, etc.
select sum(note) as SUM, count(*) as N, spec, semester, year
from base_table
where spec='INF' and semester=8 and year=2001
group by spec, semester, year
MDDB – Data Aggregation
• Each crossing of the cube contains specified statistics for the analysis variable(s)
• Distributive measures can be stored in the cube, such as N, SUM, SUMWGT, UWSUM, NMISS, USS, MIN, MAX
• Algebraic measures can be computed from stored measures, such as AVG = SUM/N
• Problem with holistic measures, i.e., measures for which no algebraic aggregate function exists, e.g., MEDIAN
• In large cube applications, approximate values of holistic measures are computed using algebraic measures
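The difference between distributive, algebraic and holistic measures shows up as soon as two cube cells are rolled up into one. A minimal sketch with invented notes in two hypothetical cells; only N, SUM, MIN and MAX are modeled (the other SAS statistics listed above behave the same way).

```python
import statistics

# Hypothetical notes falling into two cube cells (e.g., two semesters of one year)
cell_a = [3.0, 4.0, 5.0]
cell_b = [2.0, 3.0]


def distributive(values):
    """Distributive measures: what the MDDB stores for each crossing of the cube."""
    return {"N": len(values), "SUM": sum(values), "MIN": min(values), "MAX": max(values)}


a, b = distributive(cell_a), distributive(cell_b)

# Roll-up of the two cells needs only their stored measures, not the raw notes
merged = {"N": a["N"] + b["N"], "SUM": a["SUM"] + b["SUM"],
          "MIN": min(a["MIN"], b["MIN"]), "MAX": max(a["MAX"], b["MAX"])}

# Algebraic measure: AVG derived from stored distributive measures
avg = merged["SUM"] / merged["N"]                        # 17 / 5 = 3.4
naive_avg = (a["SUM"] / a["N"] + b["SUM"] / b["N"]) / 2  # averaging averages: 3.25, wrong

# Holistic measure: MEDIAN of the union cannot be derived from the cells' medians
true_median = statistics.median(cell_a + cell_b)         # 3.0
median_of_medians = statistics.median([statistics.median(cell_a),
                                       statistics.median(cell_b)])  # 3.25
print(avg, naive_avg, true_median, median_of_medians)
```

AVG stays correct as long as it is recomputed from the merged SUM and N; MEDIAN would require the raw notes, which the cube does not keep.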
Cubes and Subcubes
• OLAP queries related to a subset of dimensions
– Result is aggregated at query time from the NWAY cube
– E.g., for a report on sales of all products over subsequent years, the sum over all products and all months needs to be computed at run time
– If there are many dimensions with high cardinality, this can be lengthy
• Subcubes are used to speed up queries (related to subsets of dimensions) that users are likely to ask most frequently
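Answering a query on a subset of dimensions means summing NWAY cells over the dimensions that are left out; a stored subcube is simply that result computed in advance. A minimal sketch with a hypothetical NWAY cube (product, year, month) → (N, SUM); because N and SUM are distributive, the subcube can be built from the NWAY cells without the base table.

```python
from collections import defaultdict

# Hypothetical NWAY cube: (product, year, month) -> (N, SUM of sales)
nway = {
    ("milk", 2000, 1): (2, 180.0),
    ("milk", 2000, 2): (1, 120.0),
    ("bread", 2000, 1): (1, 50.0),
    ("milk", 2001, 1): (3, 210.0),
    ("bread", 2001, 2): (2, 90.0),
}


def subcube(cube, keep):
    """Aggregate NWAY cells onto the dimensions at positions `keep` (N and SUM are distributive)."""
    out = defaultdict(lambda: [0, 0.0])
    for coords, (n, s) in cube.items():
        key = tuple(coords[i] for i in keep)
        out[key][0] += n
        out[key][1] += s
    return {k: tuple(v) for k, v in out.items()}


# Sales of all products over subsequent years: scans every NWAY cell at query time ...
year_cube = subcube(nway, keep=[1])
print(year_cube)  # {(2000,): (4, 350.0), (2001,): (5, 300.0)}
# ... unless year_cube is stored as a subcube when the MDDB is built
product_cube = subcube(nway, keep=[0])
```

With five cells the scan is trivial; with many high-cardinality dimensions the NWAY cube can hold millions of cells, which is the cost a stored subcube avoids.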
Which Subcubes to Store?
• Idea: find the categories which will be used most frequently and have the smallest cardinality
• Starnet (spiral) model: put categories in ascending order of cardinality
• Draw the spiral starting with YEAR (most frequent use anticipated, lowest cardinality) ⇒ lists of categories = subcubes:
YEAR SECTOR REGION GRP_SUPP MONTH GRP SHOP SUPPLIER FAMILY DAY ARTICLE
YEAR SECTOR REGION GRP_SUPP MONTH GRP SHOP SUPPLIER FAMILY DAY
...
YEAR SECTOR
YEAR
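The spiral rule can be written as a tiny procedure: sort categories by ascending cardinality, then every prefix of that order is a candidate subcube, from the full list down to YEAR alone. The cardinalities below are invented so that the order matches the list above; the sketch ignores the anticipated frequency of use, which in practice also influences the order.

```python
# Hypothetical cardinalities of the classification variables (categories)
cardinality = {"ARTICLE": 20000, "DAY": 1461, "FAMILY": 400, "SUPPLIER": 300,
               "SHOP": 120, "GRP": 60, "MONTH": 48, "GRP_SUPP": 20,
               "REGION": 8, "SECTOR": 5, "YEAR": 4}


def starnet_subcubes(card):
    """Order categories by ascending cardinality; each prefix of that order is a subcube."""
    ordered = sorted(card, key=card.get)
    return [ordered[:k] for k in range(len(ordered), 0, -1)]


subcubes = starnet_subcubes(cardinality)
for sc in subcubes:
    print(" ".join(sc))  # first line: the full list, last line: YEAR
```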
Example: Building MDDB (SAS)
proc mddb data=grades out=grades_mddb
     label='MDDB for analysis of grade data';
   class year sem sex faculty institute exam type id_title;
   var note /n sum min max;
   hierarchy year sem /name="Time Hierarchy";
   hierarchy faculty institute /name="Affiliation Hierarchy";
run;

NOTE: SAS/MDDB(R) Server Software has been initialized.
NOTE: N-way complete cells=1455.
NOTE: "Time Hierarchy" computed from "NWAY" cells=10.
NOTE: "Affiliation Hierarchy" computed from "NWAY" cells=26.
NOTE: PROCEDURE MDDB used:
      real time           1:26.54
      cpu time            1:19.82
• DATA= – specify the base table for the MDDB
• CLASS statement – specify classification variables (i.e., NWAY cube dimensions)
• VAR statement – specify analysis variables (with the statistics to be stored in the MDDB – distributive aggregate functions)
• HIERARCHY statements – specify subcubes to include in the MDDB
• Subcubes can be added / removed (ADDHIER, REMOVEHIER statements)
ROLAP vs. MOLAP
• MOLAP
– Very high query performance
– Less scalable (fixed max size of a cube)
– Design and maintenance more difficult
– Problem with dimensions with very high cardinality
– Problem with constantly growing database
• ROLAP
– Very scalable
– Easy maintenance
– Lower query performance
„Rule of thumb”: use MOLAP as long as possible, then switch to ... HOLAP
HOLAP Data Model
[Figure: the viewer queries a multidimensional data provider (MDP) with a cache; behind the MDP sit an MDDB and a relational DB (star schema)]
• Viewer (OLAP applications) sees a logical MDDB (a proxy or virtual MDDB) which is presented by the MDP
HOLAP Techniques
• „Racking” – individual MDDBs for different values of one dimension (e.g., separate MDDBs for subsequent years)
• „Stacking” – different subcubes stored in separate MDDBs or tables (e.g., YEAR*COUNTRY*PRODUCT – local MDDB, YEAR*COUNTRY*PRODUCT*MONTH – on a remote server)
[Figure: racking – separate MDDBs for year=2003, 2004, 2005, 2006, presented together by the multidimensional data provider (MDP)]
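The role of the MDP in racking can be sketched as a small router: the viewer asks for cells of one logical cube, and the provider picks the per-year MDDB (or combines several of them). Everything here is hypothetical: each rack is a dictionary of presummarized (country, product) → SUM values, and a real MDP would also manage metadata and caching.

```python
# Hypothetical racked MDDBs: one presummarized cube per year, (country, product) -> SUM of sales
racks = {
    2003: {("PL", "milk"): 100.0, ("DE", "milk"): 150.0},
    2004: {("PL", "milk"): 110.0, ("PL", "bread"): 40.0},
    2005: {("DE", "bread"): 70.0},
}


class MultidimensionalDataProvider:
    """Presents the racked MDDBs to the viewer as one logical cube (year, country, product)."""

    def __init__(self, racks):
        self.racks = racks

    def cell(self, year, country, product):
        # Route the request to the MDDB holding that year; absent cells count as empty (0.0)
        return self.racks.get(year, {}).get((country, product), 0.0)

    def total(self, country=None, product=None, years=None):
        """Aggregate across racks; None plays the role of the 'All' member."""
        result = 0.0
        for year, cube in self.racks.items():
            if years is not None and year not in years:
                continue
            for (c, p), value in cube.items():
                if (country is None or c == country) and (product is None or p == product):
                    result += value
        return result


mdp = MultidimensionalDataProvider(racks)
print(mdp.cell(2004, "PL", "bread"))                # 40.0
print(mdp.total(product="milk"))                    # 360.0
print(mdp.total(country="PL", years={2003, 2004}))  # 250.0
```

Adding a new year means adding one more rack rather than rebuilding one large cube, which is the scalability argument for HOLAP.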
When to Use HOLAP?
• Too much data for one MDDB
• Access to existing ROLAP solutions
• Ensuring scalability with growing data volume
• Flexible integration of distributed data sources
• Improved performance – distributed processing of queries
• Price: HOLAP metadata must be maintained
DW Architectures – MOLAP
[Figure: three-layer architecture.
Data layer: OLTP data sources (RDB, ERP, flat files) → ETL → DW (ODS) on an RDBMS server.
OLAP application layer: MOLAP engine on an MDDBS server creates / stores cubes (MDDBs).
Presentation layer: accesses the cubes via MDX, XML/A.]
DW Architectures – ROLAP
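[Figure: three-layer architecture.
Data layer: OLTP data sources (RDB, ERP, flat files) → ETL → DW (ODS) on an RDBMS server.
OLAP application layer: analytical server that issues complex SQL queries to the RDBMS server.
Presentation layer: accesses the analytical server via MDX, XML/A.]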
MS SQL Storage Settings
• Proactive caching
– MOLAP – best performance; possible data latency (recent data changes not seen)
– ROLAP – recent changes in data seen immediately; price – poor performance
– Proactive caching: build a MOLAP cache to boost performance
• How frequently should the MOLAP cube be rebuilt?
• Should the outdated MOLAP cube be queried while the cube is rebuilt?
• Should cubes be rebuilt on a schedule or based on changes in data?
• Minimize latency vs. maximize performance
• Partitions
– Vertical: cubes based on subsets of rows in the fact table
– Horizontal: cubes based on separate fact tables (e.g., for subsequent years)
[Figure: MS SQL Server Analysis Services – Storage Settings]
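The latency vs. performance trade-off behind proactive caching can be illustrated with a toy model. This is not how Analysis Services implements it: the `ProactiveCache` class, its `max_latency` parameter and the integer clock are hypothetical. Queries are answered from a presummarized total (MOLAP-like) until a source change has been pending longer than the allowed latency, at which point the cache is rebuilt from the source (ROLAP-like).

```python
class ProactiveCache:
    """Toy model: a presummarized (MOLAP-like) total kept in front of a relational source."""

    def __init__(self, source_rows, max_latency):
        self.source = source_rows        # stands in for the relational fact table (ROLAP)
        self.max_latency = max_latency   # how long a source change may stay invisible
        self.cached_total = None
        self.dirty_since = None          # time of the oldest change not yet in the cache

    def rebuild(self):
        """'Process' the cube: recompute the aggregate from the source."""
        self.cached_total = sum(self.source)
        self.dirty_since = None

    def insert(self, value, now):
        """A new transaction arrives in the source; the cache is not updated yet."""
        self.source.append(value)
        if self.dirty_since is None:
            self.dirty_since = now

    def query_total(self, now):
        """Answer from the cache unless the allowed latency has been exceeded."""
        too_stale = self.dirty_since is not None and now - self.dirty_since > self.max_latency
        if self.cached_total is None or too_stale:
            self.rebuild()               # rebuild driven by changes in the data
        return self.cached_total


cache = ProactiveCache([10, 20], max_latency=5)
cache.rebuild()
cache.insert(5, now=1)
print(cache.query_total(now=3))  # 30: new row not visible yet (within allowed latency)
print(cache.query_total(now=7))  # 35: latency exceeded, cache rebuilt
```

Setting `max_latency` to zero gives ROLAP-like freshness at the cost of a rebuild per change; a large value gives MOLAP-like speed with staler answers.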
Standardizing Access to OLAP Data Sources – XML/A
• XML for Analysis (XML/A)
• Standard API between OLAP client and OLAP data provider
• Design goals:
– Open standards based, not bound to any language or technology
– Optimized for the Web: minimize round-trip transactions; stateless
• Client and server communicate using XML, HTTP, SOAP
• XML/A Methods:
– Discover – retrieve information (metadata) from the provider, such as the list of available cubes and their properties
– Execute – request execution of a command by the server (an MDX command, e.g., an MDX SELECT)
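To show what an Execute call carries, the sketch below builds the SOAP envelope with Python's standard `xml.etree.ElementTree`. The element names (`Execute`, `Command`, `Statement`, `Properties`, `PropertyList`, `Catalog`) and the `urn:schemas-microsoft-com:xml-analysis` namespace follow the XML/A specification as we understand it and should be checked against the provider's documentation; the cube `[Grades]` and catalog `GradesDB` are hypothetical. A Discover request would be built the same way, with a `RequestType` such as `MDSCHEMA_CUBES` instead of a command.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"


def xmla_execute_envelope(mdx, catalog):
    """Build a SOAP envelope for an XML/A Execute request carrying an MDX statement."""
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    execute = ET.SubElement(body, f"{{{XMLA_NS}}}Execute")
    command = ET.SubElement(execute, f"{{{XMLA_NS}}}Command")
    ET.SubElement(command, f"{{{XMLA_NS}}}Statement").text = mdx
    properties = ET.SubElement(execute, f"{{{XMLA_NS}}}Properties")
    property_list = ET.SubElement(properties, f"{{{XMLA_NS}}}PropertyList")
    ET.SubElement(property_list, f"{{{XMLA_NS}}}Catalog").text = catalog  # target database
    return ET.tostring(env, encoding="unicode")


# Hypothetical cube and catalog names
request = xmla_execute_envelope("SELECT [Measures].[Note Count] ON COLUMNS FROM [Grades]",
                                "GradesDB")
print(request)  # this XML would be sent to the provider in an HTTP POST
```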
Multidimensional Expressions Language (MDX)
• Introduced by Microsoft in OLE DB for OLAP
• Now considered the de facto standard for querying multidimensional data in OLAP cubes
• Simple form of an MDX query expression:
SELECT
   axis_specs ON COLUMNS,
   axis_specs ON ROWS
FROM cube
WHERE slicer_specs
MDX – By Examples
• Examples based on the cube built in the lab
• A tuple
– uniquely identifies a cell in a cube
– is defined by a combination of attribute members for different attributes
– if some attribute is not specified, its All (default) member is used
– if the measure is not specified, the first (default) measure defined in the cube is used
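The defaulting rules for tuples can be expressed as a small resolution function. A minimal sketch, assuming hypothetical cube metadata modeled loosely on the lab cube (attributes Sex, Student Group, Semester; measures Note Count and Note Avg); it mimics the rules stated above rather than any MDX engine.

```python
# Hypothetical cube metadata: attribute -> members ("All" is the default member)
ATTRIBUTES = {
    "Sex": ["All", "M", "F"],
    "Student Group": ["All", "A", "B"],
    "Semester": ["All", "7", "8"],
}
MEASURES = ["Note Count", "Note Avg"]  # the first one plays the role of the default measure


def complete_tuple(partial):
    """Return the full coordinates of the cell a (possibly partial) tuple identifies."""
    full = {}
    for attribute, members in ATTRIBUTES.items():
        member = partial.get(attribute, "All")  # unspecified attribute -> All member
        if member not in members:
            raise KeyError(f"{member!r} is not a member of {attribute}")
        full[attribute] = member
    measure = partial.get("Measures", MEASURES[0])  # unspecified measure -> default measure
    if measure not in MEASURES:
        raise KeyError(f"unknown measure {measure!r}")
    full["Measures"] = measure
    return full


# Roughly what ([Sex].[M], [Student Group].[A]) resolves to
print(complete_tuple({"Sex": "M", "Student Group": "A"}))
```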
MDX – Tuples
• [Measures].[Note Count] is a tuple
• To identify a cell, the All members of the other attributes are used
• Tuple points to male (M) students in
Student Group (Studiengang) A
• Use ( ) to identify a tuple
MDX – Sets of Tuples
• Two tuples (Note Avg and Note Count)
form a set
• Use { } to identify a set of tuples
MDX – Cartesian Products
More axes → Cartesian product
• .Members MDX function lists the members of an attribute
• ON COLUMNS – axis 0, ON ROWS – axis 1 (up to 128 axes)
• Now a set of tuples is used in the axis 0 (columns) specification
• Each cell is produced as the intersection of its attribute members
MDX – Slicer Axis (WHERE)
• WHERE clause – used to specify a set, tuple or member that restricts the members returned for rows and columns
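How axes and the slicer combine into result cells can be sketched directly: the grid is the Cartesian product of the axis members, and the slicer supplies the coordinate of the dimension that appears on no axis. The fact rows and member lists below are hypothetical, and `note_count` stands in for the cube's Note Count measure.

```python
from itertools import product

# Hypothetical fact rows: (sex, student group, semester, note)
FACTS = [("M", "A", "7", 4.0), ("M", "A", "8", 5.0), ("F", "A", "7", 3.0),
         ("F", "B", "8", 4.5), ("M", "B", "7", 3.5), ("M", "A", "7", 3.0)]


def note_count(sex, group, semester):
    """Value of one cell: number of notes at the given coordinates."""
    return sum(1 for s, g, sem, _ in FACTS if (s, g, sem) == (sex, group, semester))


columns_axis = ["M", "F"]  # axis 0, like [Sex] members (without All)
rows_axis = ["A", "B"]     # axis 1, like [Student Group] members (without All)
slicer = "7"               # WHERE ([Semester].[7]): fixes the remaining dimension

# The result grid is the Cartesian product of the axes; the slicer completes each cell's coordinates
grid = {(group, sex): note_count(sex, group, slicer)
        for group, sex in product(rows_axis, columns_axis)}
print(grid)  # {('A', 'M'): 2, ('A', 'F'): 1, ('B', 'M'): 1, ('B', 'F'): 0}
```

Putting a set of tuples on one axis (e.g., Sex crossed with Semester on columns) would simply replace `columns_axis` with `list(product(...))` of those members.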
Data Warehouse Project Methodology(-ies)
• SAS Rapid Data Warehouse Methodology
• IBM DW / BI Project Methodology
• …
• Purpose:
– Ensure a disciplined, iterative approach to the management and implementation of data warehousing projects
– Enable successful business and technical implementation of the data warehouse
DW Project Methodology - Phases
• Assessment
– Determine whether there exists a realistic need and opportunity to develop a successful DW
– Project definition stage (team, sponsor, criteria for success, expectations)
– Initial assessment of IT infrastructure (is the project feasible?)
– Outcome: formal document
• Requirements
– Requirements gathering (in-depth interviews with business people)
– Reconciliation stage (analyze the gap between expectations and IT capabilities)
– Outcome: Requirements Definition Document (logical and physical data model; data extraction paths from source OLTP systems; transformations required; DW update schedule)
• Design / Implementation / Deployment
– Implement logical data model
– Build ETL processes (validate, clean, integrate)
– Load data to DW
– Design, implement data analysis interfaces
• Train users
• Review
DW-Specific Requirements – Remarks
• Analytical needs in the company
– Types of reports, time schedules (daily / weekly, etc.)
– Hierarchies of data / hierarchies of reports
– Identification of data sources
• Updates of data in DW
– Data integration rules; handling missing / wrong data
– Time schedule for DW updates
• Data latency / performance
– Should recent changes in OLTP be seen immediately in OLAP?
– What latency is acceptable?
– OLAP query performance
Data Integration
• Analyze source OLTP systems
– Determine DBMS systems / data formats
– Select most appropriate sources / columns (cleanest)
• Analyze required integration
– Ensure the same coding conventions (‘m-w’, ‘male-female’, ‘0-1’)
– Identify synonyms, homonyms, analogies
– Ensure data quality (integrity, accuracy, completeness)
• data value integrity
• data structure integrity
– Define exception handling rules / missing data handling / default values
– Finally, define data integration rule/algorithm for each variable
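A per-variable integration rule combines a coding map per source with exception handling and a default value. A minimal sketch for SEX, using the three conventions named above; the source names, the target codes ‘M’/‘F’, the default ‘U’, and the readings ‘w’ = female and 0 = male are all assumptions for illustration.

```python
# Hypothetical per-source coding rules for SEX, mapped onto the warehouse coding 'M' / 'F'
SEX_RULES = {
    "source_a": {"m": "M", "w": "F"},          # 'm-w' convention (assumed: w = woman)
    "source_b": {"male": "M", "female": "F"},  # 'male-female' convention
    "source_c": {"0": "M", "1": "F"},          # '0-1' convention (assumed: 0 = male)
}
UNKNOWN = "U"  # hypothetical default value for missing or illegal codes


def integrate_sex(source, value, exceptions):
    """Apply the integration rule for one source; record values the rule cannot map."""
    key = None if value is None else str(value).strip().lower()
    mapped = SEX_RULES[source].get(key)
    if mapped is None:
        exceptions.append((source, value))  # exception handling: keep for review
        return UNKNOWN
    return mapped


log = []
coded = [integrate_sex("source_a", "W", log),
         integrate_sex("source_b", "male", log),
         integrate_sex("source_c", 1, log),
         integrate_sex("source_b", None, log),
         integrate_sex("source_a", "x", log)]
print(coded)  # ['F', 'M', 'F', 'U', 'U']
print(log)    # [('source_b', None), ('source_a', 'x')]
```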
Example – Synonyms, Homonyms, Analogies
• Define how to resolve name conflicts between data sources / columns:
– Homonyms: same name but different meaning, e.g., Type in one source refers to the model of a car („AURIS”, „CLIO”, etc.), and in another source to the category („pickup”, „truck”, „passenger”, etc.)
– Synonyms: different names but the same meaning, e.g., PersonID in one source, EmployeeCode in another
– Analogies: attributes describe the same object, but differently, e.g., PaymentMethod in one source refers to „cash”, „check”, „credit card”, and in another to „VISA”, „MasterCard”, „USD”, etc.
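The three kinds of name conflicts translate into three kinds of mapping rules: homonyms are resolved per source, synonyms per column name, and analogies per value. A minimal sketch with hypothetical sources `source_1` and `source_2` and invented target names (`employee_id`, `car_model`, `car_category`); mapping „USD” to cash is an assumption.

```python
# Hypothetical integration rules for two sources (names follow the examples above)
SYNONYMS = {"PersonID": "employee_id", "EmployeeCode": "employee_id"}
HOMONYMS = {("source_1", "Type"): "car_model", ("source_2", "Type"): "car_category"}
PAYMENT_ANALOGY = {
    "cash": "cash", "check": "check", "credit card": "credit card",
    "VISA": "credit card", "MasterCard": "credit card",
    "USD": "cash",  # assumption: 'USD' denotes a cash payment in source 2
}


def harmonize(source, record):
    """Rename columns (homonyms first, then synonyms) and recode analogous values."""
    out = {}
    for column, value in record.items():
        name = HOMONYMS.get((source, column)) or SYNONYMS.get(column, column)
        if name == "PaymentMethod":
            value = PAYMENT_ANALOGY[value]
        out[name] = value
    return out


print(harmonize("source_1", {"PersonID": 7, "Type": "AURIS", "PaymentMethod": "cash"}))
print(harmonize("source_2", {"EmployeeCode": 7, "Type": "truck", "PaymentMethod": "VISA"}))
```

Mapping analogies onto the coarser common category („credit card”) loses the card brand; if that detail matters, it can be kept in a separate column.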
Example – Data Integrity
Specify legal relationships between data values (+ required; -- not allowed; o optional):
• Temporary employee: Name +, Date of birth +, Contract final date +, Anniversary date o
• Permanent employee: Name +, Date of birth +, Contract final date --, Anniversary date +
Number of values in a relationship – a student can have 0, 1 or n diplomas:
• ‘Undergraduate’: 0
• ‘Graduate’: 1 or n
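Integrity rules of this kind can be checked mechanically once they are written down as a rule matrix. A minimal sketch, reading the example as: temporary employees require a contract final date and may have an anniversary date; permanent employees must not have a contract final date and require an anniversary date; undergraduates have 0 diplomas, graduates 1 or more. The field names are hypothetical.

```python
REQUIRED, NOT_ALLOWED, OPTIONAL = "+", "--", "o"

# Legal relationships between data values, per employee type (hypothetical field names)
RULES = {
    "Temporary": {"name": REQUIRED, "date_of_birth": REQUIRED,
                  "contract_final_date": REQUIRED, "anniversary_date": OPTIONAL},
    "Permanent": {"name": REQUIRED, "date_of_birth": REQUIRED,
                  "contract_final_date": NOT_ALLOWED, "anniversary_date": REQUIRED},
}


def integrity_violations(employee_type, record):
    """List the fields of a record that break the rules for its employee type."""
    problems = []
    for field, rule in RULES[employee_type].items():
        present = record.get(field) is not None
        if rule == REQUIRED and not present:
            problems.append(f"{field} is required")
        elif rule == NOT_ALLOWED and present:
            problems.append(f"{field} is not allowed")
    return problems


def diploma_count_ok(status, n_diplomas):
    """Number of values in a relationship: 0 for undergraduates, 1 or n for graduates."""
    return n_diplomas == 0 if status == "Undergraduate" else n_diplomas >= 1


record = {"name": "X", "date_of_birth": "1970-01-01", "contract_final_date": "2002-12-31"}
print(integrity_violations("Temporary", record))  # []
print(integrity_violations("Permanent", record))
```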
Summary
• Build a dedicated database for OLAP – data mart / warehouse
– Data integration
– Data quality assurance
• Database organization
– Multidimensional model of data
– Physical data organization
• Denormalization
• Aggregation
• Benefits from the user’s perspective
– Integrated overall picture of the enterprise
– Easy access to historical data
– Trustworthy information returned (single version of the truth)
– DSS queries with no impact on transactional systems
• DW Methodology to ensure successful implementation