Chapter 34
OLAP
Transparencies
Chapter 34 - Objectives
• The purpose of Online Analytical Processing (OLAP).
• The relationship between OLAP and data warehousing.
• The key features of OLAP applications.
Chapter 34 - Objectives
• How to represent multi-dimensional data.
• The rules for OLAP tools.
• The main categories of OLAP tools.
• OLAP extensions to the SQL standard.
• How Oracle supports OLAP.
Business Intelligence Technologies
• Accompanying the growth in data warehousing is an ever-increasing demand by users for more powerful access tools that provide advanced analytical capabilities.
• There are two main types of access tools available to meet this demand, namely Online Analytical Processing (OLAP) and data mining.
Business Intelligence Technologies
• OLAP and data mining differ in what they offer the user and, because of this, they are complementary technologies.
• An environment that includes a data warehouse (or, more commonly, one or more data marts) together with tools such as OLAP and/or data mining is collectively referred to as Business Intelligence (BI) technologies.
Online Analytical Processing (OLAP)
• Original definition: the dynamic synthesis, analysis, and consolidation of large volumes of multi-dimensional data (Codd, 1993).
• Describes a technology that is designed to optimize the storing and querying of large volumes of multi-dimensional data that is aggregated (summarized) to various levels of detail to support the analysis of this data.
Online Analytical Processing (OLAP)
• Enables users to gain a deeper understanding and knowledge of various aspects of their corporate data through fast, consistent, interactive access to a wide variety of possible views of the data.
• Allows users to view corporate data in a way that better models the true dimensionality of the enterprise.
Online Analytical Processing (OLAP)
• Can easily answer ‘who?’ and ‘what?’ questions; however, the ability to answer ‘why?’ type questions is what distinguishes OLAP from general-purpose query tools.
• Types of analysis range from basic navigation and browsing (slicing and dicing), to calculations, to more complex analyses such as time series and complex modeling.
OLAP Benchmarks
• The OLAP Council published an analytical processing benchmark referred to as APB-1 (OLAP Council, 1998).
• Its aim is to measure a server’s overall OLAP performance rather than the performance of individual tasks.
OLAP Benchmarks
• APB-1 assesses the most common business operations, including:
– bulk loading of data from internal or external data sources;
– incremental loading of data from operational systems;
– aggregation of input-level data along hierarchies;
OLAP Benchmarks
• APB-1 assesses the most common business operations, including (continued):
– calculation of new data based on business models;
– time series analysis;
– queries with a high degree of complexity;
– drill-down through hierarchies;
– ad hoc queries;
– multiple online sessions.
OLAP Benchmarks
• OLAP applications are judged on their ability to provide just-in-time (JIT) information, a core requirement for supporting effective decision-making.
• This requirement involves more than processing performance: it also includes the ability to model complex business relationships and to respond to changing business requirements.
OLAP Benchmarks
• APB-1 uses a standard benchmark metric called AQM (Analytical Queries per Minute).
• AQM represents the number of analytical queries processed per minute, including data loading and computation time. Thus, the AQM incorporates data loading performance, calculation performance, and query performance into a single metric.
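As a rough illustration of how one throughput figure can fold loading, calculation, and querying together, the sketch below divides a query count by the total elapsed minutes. It is a simplified reading of the AQM description above, not the official APB-1 scoring procedure; the function name and the three-way split of elapsed time are assumptions.

```python
def analytical_queries_per_minute(num_queries, load_seconds, calc_seconds, query_seconds):
    """Simplified AQM (assumption): queries processed per minute of total
    elapsed time, where elapsed time includes data loading and computation."""
    total_minutes = (load_seconds + calc_seconds + query_seconds) / 60.0
    if total_minutes <= 0:
        raise ValueError("total elapsed time must be positive")
    return num_queries / total_minutes


# Hypothetical run: 600 queries, 120 s loading, 60 s calculation, 120 s querying.
print(analytical_queries_per_minute(600, 120, 60, 120))
```

Because loading time sits in the denominator, a faster load raises the score even if query speed is unchanged, which is the point of a single combined metric.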
OLAP Benchmarks
• Publication of APB-1 benchmark results must include both the database schema and all code required for executing the benchmark.
• An essential requirement of all OLAP applications is the ability to provide users with JIT information, which is necessary to make effective decisions about an organization's strategic directions.
OLAP Applications
• JIT information is computed data that usually reflects complex relationships and is often calculated on the fly. Also, as data relationships may not be known in advance, the data model must be flexible.
Examples of OLAP applications in various functional areas
OLAP Applications
• Although OLAP applications are found in widely divergent functional areas, they all have the following key features:
– multi-dimensional views of data;
– support for complex calculations;
– time intelligence.
OLAP Applications - multi-dimensional views of data
• Core requirement of building a ‘realistic’ business model.
• Provides the basis for analytical processing through flexible access to corporate data.
• The underlying database design that provides the multi-dimensional view of data should treat all dimensions equally.
OLAP Applications - support for complex calculations
• Must provide a range of powerful computational methods, such as those required by sales forecasting, which uses trend algorithms such as moving averages and percentage growth.
• Mechanisms for implementing computational methods should be clear and non-procedural.
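A minimal sketch of one of the trend measures named above, percentage growth between consecutive periods, assuming the input is a list of period totals already in time order. The function name and sample figures are hypothetical.

```python
def percentage_growth(values):
    """Period-over-period growth in percent for a time-ordered list of totals.

    The first period has no predecessor, and growth from a zero base is
    undefined, so both are reported as None."""
    if not values:
        return []
    growth = [None]
    for previous, current in zip(values, values[1:]):
        growth.append(None if previous == 0 else (current - previous) * 100.0 / previous)
    return growth


# Hypothetical monthly sales totals.
print(percentage_growth([100, 120, 90]))
```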
OLAP Applications – time intelligence
• Key feature of almost any analytical application, as performance is almost always judged over time.
• The time hierarchy is not always used in the same manner as other hierarchies.
• Concepts such as year-to-date and period-over-period comparisons should be easily defined.
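The sketch below illustrates the two time-intelligence concepts mentioned above: a year-to-date running total that restarts at each new year, and a comparison with the same month of the previous year. It assumes monthly totals keyed by hypothetical (year, month) pairs in time order; the data and function names are made up for illustration.

```python
def year_to_date(monthly):
    """monthly: list of ((year, month), amount) in time order.
    Returns ((year, month), running total) with the total reset each year."""
    result = []
    running = 0
    current_year = None
    for (year, month), amount in monthly:
        if year != current_year:  # a new year starts a new year-to-date total
            current_year, running = year, 0
        running += amount
        result.append(((year, month), running))
    return result


def change_vs_same_month_last_year(monthly):
    """Difference from the same month one year earlier, or None if absent."""
    lookup = dict(monthly)
    return [
        ((y, m), amount - lookup[(y - 1, m)] if (y - 1, m) in lookup else None)
        for (y, m), amount in monthly
    ]


data = [((2007, 12), 50), ((2008, 1), 10), ((2008, 2), 20), ((2009, 1), 15)]
print(year_to_date(data))
print(change_vs_same_month_last_year(data))
```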
Multi-dimensional Data and OLAP cubes
• Multi-dimensional data consists of facts (numeric measurements), such as property sales revenue data, and the association of these facts with dimensions such as location (of the property) and time (of the property sale).
• Which is the best representation of multi-dimensional data: a relational table, a matrix, or a data cube?
Multi-dimensional Data as 3-field Table versus 2-D Matrix
Multi-dimensional Data as 4-field Table versus 3-D Cube
Multi-dimensional Data as Series of 3-D Cubes
Multi-dimensional data and OLAP cubes
• We usually think of cubes as solid 3-D structures with equal sides. However, the OLAP cube is an n-dimensional structure (with sides that need not be equal).
• An alternative representation for n-dimensional data is to consider a data cube as a lattice of cuboids. Each cuboid represents a subset of the given dimensions.
Multi-dimensional data and OLAP cubes
0-D cuboid (highest level): all
1-D cuboids: {time}, {location}, {type}, {office}
2-D cuboids: {time, location}, {time, type}, {time, office}, {location, type}, {location, office}, {type, office}
3-D cuboids: {time, location, type}, {time, location, office}, {time, type, office}, {location, type, office}
4-D cuboid (lowest level): {time, location, type, office}
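The lattice above can be generated mechanically: every subset of the dimensions is one cuboid, so n dimensions give 2^n cuboids (16 for the four dimensions here). A short Python sketch, assuming only the four dimension names listed:

```python
from itertools import combinations


def cuboid_lattice(dimensions):
    """Map k -> list of k-D cuboids, from the 0-D apex ('all') to the n-D base."""
    return {k: list(combinations(dimensions, k)) for k in range(len(dimensions) + 1)}


lattice = cuboid_lattice(["time", "location", "type", "office"])
for k, cuboids in lattice.items():
    # The single 0-D cuboid is the empty combination, shown as 'all'.
    print(f"{k}-D:", ["all"] if k == 0 else [", ".join(c) for c in cuboids])
```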
Dimensionality Hierarchy
• The lattice of cuboids does not show the hierarchies that are commonly associated with dimensions.
• A dimensional hierarchy defines mappings from a set of lower-level concepts to higher-level concepts.
Dimensionality Hierarchy
[Figure: dimensional hierarchies for the location dimension (zipCode, area, city, region, country) and the time dimension (day, week, month, quarter, season, year)]
Dimensional Operations
• The analytical operations that can be performed on data cubes include:
– Roll-up
– Drill-down
– Slice and Dice
– Pivot
Dimensional Operations
• Roll-up performs aggregations on the data by moving up the dimensional hierarchy or by dimensional reduction, e.g. 4-D sales data to 3-D sales data.
• Drill-down is the reverse of roll-up and involves revealing the detailed data that forms the aggregated data. Drill-down can be performed by moving down the dimensional hierarchy or by dimensional introduction, e.g. 3-D sales data to 4-D sales data.
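A minimal roll-up and drill-down sketch, assuming hypothetical fact rows of (city, month, propertyType, saleAmount) and a time hierarchy in which each 'YYYY-MM' month maps to a 'YYYY-Qn' quarter. The roll-up both climbs the time hierarchy and drops the type dimension; drilling back down is only possible because the detailed rows are still available.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, month, propertyType, saleAmount).
FACTS = [
    ("Glasgow", "2008-01", "Flat", 100),
    ("Glasgow", "2008-02", "House", 250),
    ("Glasgow", "2008-04", "Flat", 120),
    ("Aberdeen", "2008-01", "House", 300),
]


def month_to_quarter(month):
    """One step up the time hierarchy: 'YYYY-MM' -> 'YYYY-Qn'."""
    year, mm = month.split("-")
    return f"{year}-Q{(int(mm) - 1) // 3 + 1}"


def roll_up(facts, key):
    """Sum saleAmount after mapping each row to a (usually coarser) key."""
    totals = defaultdict(int)
    for row in facts:
        totals[key(row)] += row[3]
    return dict(totals)


# Roll-up: month -> quarter, and the type dimension is dropped (3-D to 2-D).
by_city_quarter = roll_up(FACTS, lambda r: (r[0], month_to_quarter(r[1])))
# Further dimensional reduction: keep only city.
by_city = roll_up(FACTS, lambda r: r[0])
# Drill-down: go back to the finer month level from the same detail rows.
by_city_month = roll_up(FACTS, lambda r: (r[0], r[1]))
print(by_city_quarter)
```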
Dimensional Operations
• Slice and dice: the ability to look at data from different viewpoints. The slice operation performs a selection on one dimension of the data, whereas dice uses two or more dimensions. For example, a slice of sales revenue (type = ‘Flat’) and a dice (type = ‘Flat’ and time = ‘Q1’).
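Following the definitions above (a slice selects on one dimension, a dice on two or more), here is a minimal Python sketch over hypothetical sales rows. The helper name select and the figures are illustrative only, not part of any OLAP product.

```python
# Hypothetical sales facts, one dict per cell of the cube.
SALES = [
    {"type": "Flat", "time": "Q1", "city": "Glasgow", "revenue": 100},
    {"type": "Flat", "time": "Q2", "city": "Glasgow", "revenue": 120},
    {"type": "House", "time": "Q1", "city": "Aberdeen", "revenue": 300},
    {"type": "Flat", "time": "Q1", "city": "Aberdeen", "revenue": 80},
]


def select(rows, **criteria):
    """Keep rows whose value matches every dimension=value pair given."""
    return [r for r in rows if all(r[dim] == value for dim, value in criteria.items())]


flat_slice = select(SALES, type="Flat")               # slice: one dimension fixed
flat_q1_dice = select(SALES, type="Flat", time="Q1")  # dice: two dimensions fixed
print(sum(r["revenue"] for r in flat_q1_dice))
```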
Dimensional Operations
• Pivot: the ability to rotate the data to provide an alternative view of the same data. For example, sales revenue data displayed with location (city) as the x-axis against time (quarter) as the y-axis can be rotated so that time (quarter) is the x-axis against location (city) as the y-axis.
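Pivoting changes only the layout, not the values. A minimal sketch, assuming a cross-tab held as a nested dictionary {row member: {column member: value}} with hypothetical figures:

```python
def pivot(crosstab):
    """Swap the row and column axes of a nested-dict cross-tab."""
    rotated = {}
    for row_key, columns in crosstab.items():
        for col_key, value in columns.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated


# Rows are cities, columns are quarters (hypothetical revenue figures).
by_city = {"Glasgow": {"Q1": 100, "Q2": 120}, "Aberdeen": {"Q1": 380}}
by_quarter = pivot(by_city)  # rows are now quarters, columns are cities
print(by_quarter)
```

Pivoting twice returns the original layout, which is a quick way to confirm that no data was changed.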
OLAP Tools
• There are many varieties of OLAP tools available in the marketplace.
• This choice has resulted in some confusion, with much debate regarding what OLAP actually means to a potential buyer and, in particular, what architectures are available for OLAP tools.
Codd’s Rules for OLAP Systems
• In 1993, E.F. Codd formulated twelve rules as the basis for selecting OLAP tools.
Codd’s Rules for OLAP Systems
• Multi-dimensional conceptual view
• Transparency
• Accessibility
• Consistent reporting performance
• Client-server architecture
• Generic dimensionality
Codd’s Rules for OLAP Systems
• Dynamic sparse matrix handling
• Multi-user support
• Unrestricted cross-dimensional operations
• Intuitive data manipulation
• Flexible reporting
• Unlimited dimensions and aggregation levels
Codd’s Rules for OLAP Systems
• There are proposals to redefine or extend the rules, for example to also include:
– Comprehensive database management tools
– Ability to drill down to detail (source record) level
– Incremental database refresh
– SQL interface to the existing enterprise environment
Categories of OLAP Tools
• OLAP tools are categorized according to the architecture used to store and process multi-dimensional data.
• There are three main categories:
– Multi-dimensional OLAP (MOLAP)
– Relational OLAP (ROLAP)
– Hybrid OLAP (HOLAP)
Multi-dimensional OLAP (MOLAP)
• Use specialized data structures and multi-dimensional Database Management Systems (MDDBMSs) to organize, navigate, and analyze data.
• Data is typically aggregated and stored according to predicted usage to enhance query performance.
Multi-dimensional OLAP (MOLAP)
• Use array technology and efficient storage techniques that minimize disk space requirements through sparse data management.
• Provide excellent performance when data is used as designed and the focus is on data for a specific decision-support application.
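To show why sparse data management matters, the sketch below stores only the non-empty cells of a cube in a dictionary keyed by member tuples and compares that with the number of cells a dense array would allocate. The class, dimension members, and figures are hypothetical, and a real MOLAP server would use its own compressed array structures rather than a Python dict.

```python
from math import prod


class SparseCube:
    """Illustrative sparse cube: only non-empty cells are stored, keyed by a
    tuple with one member per dimension (a dict stands in for the compressed
    array structures a MOLAP server would actually use)."""

    def __init__(self, dimensions):
        # dimensions: mapping of dimension name -> list of its members
        self.dimensions = dict(dimensions)
        self.cells = {}

    def set(self, coords, value):
        if len(coords) != len(self.dimensions):
            raise ValueError("one member per dimension is required")
        self.cells[tuple(coords)] = value

    def get(self, coords, default=None):
        return self.cells.get(tuple(coords), default)

    def dense_size(self):
        """Number of cells a dense array over all members would allocate."""
        return prod(len(members) for members in self.dimensions.values())


cube = SparseCube({
    "month": [f"2008-{m:02d}" for m in range(1, 13)],
    "branch": [f"B{n:03d}" for n in range(1, 51)],
    "type": ["Flat", "House", "Bungalow", "Maisonette"],
})
cube.set(("2008-01", "B003", "Flat"), 100)
cube.set(("2008-02", "B003", "House"), 250)
print(cube.dense_size(), len(cube.cells))
```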
Multi-dimensional OLAP (MOLAP)
• Traditionally, MOLAP tools require a tight coupling with the application layer and presentation layer.
• Recent trends separate the OLAP engine from the data structures through the use of published application programming interfaces (APIs).
Typical Architecture for MOLAP Tools
MOLAP Tools - Development Issues
• Underlying data structures are limited in their ability to support multiple subject areas and to provide access to detailed data.
• Navigation and analysis of data is limited because the data is designed according to previously determined requirements.
MOLAP Tools - Development Issues
• MOLAP products require a different set of skills and tools to build and maintain the database, thus increasing the cost and complexity of support.
Relational OLAP (ROLAP)
• Fastest-growing style of OLAP technology, due to requirements to analyze ever-increasing amounts of data and the realization that users cannot store all the data they require in MOLAP databases.
Relational OLAP (ROLAP)
• Supports RDBMS products through the use of a metadata layer, which avoids the need to create a static multi-dimensional data structure and facilitates the creation of multiple multi-dimensional views of the two-dimensional relation.
Relational OLAP (ROLAP)
• To improve performance, some products use SQL engines to support the complexity of multi-dimensional analysis, while others recommend, or require, the use of highly denormalized database designs such as the star schema.
Typical Architecture for ROLAP Tools
ROLAP Tools - Development Issues
• Performance problems associated with the processing of complex queries that require multiple passes through the relational data.
• Development of middleware to facilitate the development of multi-dimensional applications (that is, software that converts the two-dimensional relation into a multi-dimensional structure).
ROLAP Tools - Development Issues
• Development of an option to create persistent, multi-dimensional structures with facilities to assist in the administration of these structures.
Hybrid OLAP (HOLAP)
• Provide limited analysis capability, either directly against RDBMS products, or by using an intermediate MOLAP server.
• Deliver selected data directly from the DBMS or via a MOLAP server to the desktop (or local server) in the form of a datacube, where it is stored, analyzed, and maintained locally.
Hybrid OLAP (HOLAP)
• Promoted as being relatively simple to install and administer, with reduced cost and maintenance.
Typical Architecture for HOLAP Tools
HOLAP Tools - Development Issues
• The architecture results in significant data redundancy and may cause problems for networks that support many users.
• The ability of each user to build a custom datacube may cause a lack of data consistency among users.
• Only a limited amount of data can be efficiently maintained.
Desktop OLAP (DOLAP)
• Store the OLAP data in client-based files and support multi-dimensional processing using a client multi-dimensional engine.
• Requires that relatively small extracts of data are held on client machines. They may be distributed in advance, or created on demand (possibly through the Web).
OLAP Extensions to SQL
• Advantages of SQL include that it is easy to learn, non-procedural, free-format, DBMS-independent, and a recognized international standard.
• However, a major limitation of SQL is its inability to answer routinely asked business queries, such as computing the percentage change in values between this month and a year ago, or computing moving averages, cumulative sums, and other statistical functions.
OLAP Extensions to SQL
• The answer was that ANSI adopted a set of OLAP functions as an extension to SQL to enable these calculations, as well as many others that used to be impractical or even impossible within SQL.
• IBM and Oracle jointly proposed these extensions early in 1999, and they now form part of the current SQL standard, namely SQL:2008.
OLAP Extensions to SQL - RISQL
• The extensions are collectively referred to as the ‘OLAP package’ and are described as follows:
– Feature T431, ‘Extended grouping capabilities’
– Feature T611, ‘Elementary OLAP operations’
Extended Grouping Capabilities
• Aggregation is a fundamental part of OLAP. To improve aggregation capabilities, the SQL standard provides extensions to the GROUP BY clause, such as the ROLLUP and CUBE functions.
Extended Grouping Capabilities
• ROLLUP supports calculations using aggregations such as SUM, COUNT, MAX, MIN, and AVG at increasing levels of aggregation, from the most detailed up to a grand total.
• CUBE is similar to ROLLUP, enabling a single statement to calculate all possible combinations of aggregations. CUBE can generate the information needed in cross-tabulation reports with a single query.
Extended Grouping Capabilities
• The ROLLUP and CUBE extensions specify exactly the groupings of interest in the GROUP BY clause and produce a single result set that is equivalent to a UNION ALL of differently grouped rows.
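The UNION ALL equivalence can be made concrete by listing the grouping sets each extension implies: ROLLUP over n columns yields the n + 1 prefixes of the column list, while CUBE yields all 2^n subsets. The Python sketch below computes one GROUP BY per grouping set and concatenates the results, using None for the NULLs that appear in subtotal rows. Column names follow the ROLLUP and CUBE examples in this chapter, but the rows are hypothetical; SQL also provides a GROUPING function to tell subtotal NULLs from genuine NULLs, which this sketch does not model.

```python
from collections import defaultdict
from itertools import combinations


def rollup_sets(columns):
    """ROLLUP(c1, ..., cn): the n + 1 prefixes, from (c1, ..., cn) down to ()."""
    return [tuple(columns[:i]) for i in range(len(columns), -1, -1)]


def cube_sets(columns):
    """CUBE(c1, ..., cn): all 2**n subsets of the grouping columns."""
    return [c for k in range(len(columns), -1, -1) for c in combinations(columns, k)]


def union_all_of_groupings(rows, columns, measure, grouping_sets):
    """One GROUP BY per grouping set, concatenated as with UNION ALL.
    Columns outside the current grouping set are reported as None (SQL NULL)."""
    result = []
    for gset in grouping_sets:
        totals = defaultdict(int)
        for row in rows:
            key = tuple(row[c] if c in gset else None for c in columns)
            totals[key] += row[measure]
        result.extend(key + (total,) for key, total in totals.items())
    return result


COLS = ["propertyType", "yearMonth", "city"]
ROWS = [  # hypothetical sales
    {"propertyType": "Flat", "yearMonth": "2008-08", "city": "Glasgow", "saleAmount": 100},
    {"propertyType": "Flat", "yearMonth": "2008-09", "city": "Glasgow", "saleAmount": 150},
    {"propertyType": "House", "yearMonth": "2008-08", "city": "Aberdeen", "saleAmount": 300},
]
rollup_rows = union_all_of_groupings(ROWS, COLS, "saleAmount", rollup_sets(COLS))
cube_rows = union_all_of_groupings(ROWS, COLS, "saleAmount", cube_sets(COLS))
```

With three grouping columns, ROLLUP produces four aggregation levels (n + 1) while CUBE produces eight grouping sets; a per-city subtotal such as (None, None, 'Glasgow', 250) appears only in the CUBE output.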
Extended Grouping Capabilities
• ROLLUP Extension to GROUP BY
– Enables a SELECT statement to calculate multiple levels of subtotals across a specified group of dimensions. ROLLUP appears in the GROUP BY clause in a SELECT statement using the following format:
SELECT ... GROUP BY ROLLUP(columnList)
Extended Grouping Capabilities
– ROLLUP creates subtotals that roll up from the most detailed level to a grand total, following a column list specified in the ROLLUP clause.
– ROLLUP first calculates the standard aggregate values specified in the GROUP BY clause and then creates progressively higher-level subtotals, moving from right to left through the column list until finally completing with a grand total.
Extended Grouping Capabilities
– ROLLUP creates subtotals at n + 1 levels, where n is the number of grouping columns. For instance, if a query specifies ROLLUP on grouping columns of propertyType, yearMonth, and city (n = 3), the result set will include rows at 4 aggregation levels.
Example - Using the ROLLUP Group Function
• Show the totals for sales of flats or houses by branch offices located in Aberdeen, Edinburgh, or Glasgow for the months of August and September of 2008.
Example - Using the ROLLUP Group Function
SELECT propertyType, yearMonth, city, SUM(saleAmount) AS sales
FROM Branch, PropertyForSale, PropertySale
WHERE Branch.branchNo = PropertySale.branchNo
AND PropertyForSale.propertyNo = PropertySale.propertyNo
AND PropertySale.yearMonth IN ('2008-08', '2008-09')
AND Branch.city IN ('Aberdeen', 'Edinburgh', 'Glasgow')
GROUP BY ROLLUP(propertyType, yearMonth, city);
Example - Using the ROLLUP Group Function
Extended Grouping Capabilities
• CUBE Extension to GROUP BY
– CUBE takes a specified set of grouping columns and creates subtotals for all of the possible combinations. CUBE appears in the GROUP BY clause in a SELECT statement using the following format:
SELECT ... GROUP BY CUBE(columnList)
Extended Grouping Capabilities
– CUBE generates all the subtotals that could be calculated for a data cube with the specified dimensions.
– CUBE can be used in any situation requiring cross-tabular reports. The data needed for cross-tabular reports can be generated with a single SELECT using CUBE. Like ROLLUP, CUBE can be helpful in generating summary tables.
Extended Grouping Capabilities
• CUBE is typically most suitable in queries that use columns from multiple dimensions rather than columns representing different levels of a single dimension.
Example - Using the CUBE Group Function
• Show all possible subtotals for sales of properties by branch offices in Aberdeen, Edinburgh, and Glasgow for the months of August and September of 2008.
Example - Using the CUBE Group Function
SELECT propertyType, yearMonth, city, SUM(saleAmount) AS sales
FROM Branch, PropertyForSale, PropertySale
WHERE Branch.branchNo = PropertySale.branchNo
AND PropertyForSale.propertyNo = PropertySale.propertyNo
AND PropertySale.yearMonth IN ('2008-08', '2008-09')
AND Branch.city IN ('Aberdeen', 'Edinburgh', 'Glasgow')
GROUP BY CUBE(propertyType, yearMonth, city);
Example - Using the CUBE Group Function
Elementary OLAP Operators
• Supports a variety of operations such as rankings and window calculations.
• Ranking functions include cumulative distributions, percent rank, and N-tiles.
• Windowing allows the calculation of cumulative and moving aggregations using functions such as SUM, AVG, MIN, and COUNT.
Elementary OLAP Operators
• Ranking Functions
– Compute the rank of a record compared to other records in the dataset based on the values of a set of measures. There are various types of ranking functions, including RANK and DENSE_RANK. The syntax for each ranking function is:
RANK( ) OVER (ORDER BY columnList)
DENSE_RANK( ) OVER (ORDER BY columnList)
Elementary OLAP Operators
• The difference between RANK and DENSE_RANK is that DENSE_RANK leaves no gaps in the sequential ranking sequence when there are ties for a ranking.
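A small Python sketch of the tie-handling difference, assuming a list of hypothetical sales totals ranked in descending order, as ORDER BY ... DESC would. The function name is illustrative.

```python
def rank_and_dense_rank(values, descending=True):
    """Return (value, rank, dense_rank) tuples in ORDER BY value [DESC] order."""
    ordered = sorted(values, reverse=descending)
    result = []
    rank = dense = 0
    previous = object()  # sentinel that never equals a real value
    for position, value in enumerate(ordered, start=1):
        if value != previous:
            rank = position  # RANK skips past the positions used by ties
            dense += 1       # DENSE_RANK simply moves to the next number
            previous = value
        result.append((value, rank, dense))
    return result


print(rank_and_dense_rank([250, 300, 100, 250]))
```

With two branches tied on 250, the next total gets rank 4 under RANK but 3 under DENSE_RANK.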
Example - Using the RANK and DENSE_RANK Functions
• Rank the total sales of properties for branch offices in Edinburgh.
SELECT Branch.branchNo, SUM(saleAmount) AS sales,
RANK() OVER (ORDER BY SUM(saleAmount) DESC) AS ranking,
DENSE_RANK() OVER (ORDER BY SUM(saleAmount) DESC) AS dense_ranking
FROM Branch, PropertySale
WHERE Branch.branchNo = PropertySale.branchNo
AND Branch.city = 'Edinburgh'
GROUP BY Branch.branchNo;
Example - Using the RANK and DENSE_RANK Functions
Elementary OLAP Operators
• Windowing Calculations
– Can be used to compute cumulative, moving, and centered aggregates. They return a value for each row in the table, which depends on other rows in the corresponding window.
– These aggregate functions provide access to more than one row of a table without a self-join and can be used only in the SELECT and ORDER BY clauses of the query.
Example - Using Windowing Calculations
• Show the monthly figures and three-month moving averages and sums for property sales at branch office B003 for the first six months of 2008.
Example - Using Windowing Calculations
SELECT yearMonth, SUM(saleAmount) AS monthlySales,
AVG(SUM(saleAmount)) OVER (ORDER BY yearMonth ROWS 2 PRECEDING) AS "3-month moving avg",
SUM(SUM(saleAmount)) OVER (ORDER BY yearMonth ROWS 2 PRECEDING) AS "3-month moving sum"
FROM PropertySale
WHERE branchNo = 'B003'
AND yearMonth BETWEEN '2008-01' AND '2008-06'
GROUP BY yearMonth
ORDER BY yearMonth;
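The frame ROWS 2 PRECEDING covers the current row and up to two rows before it, so the first two months are averaged over one and two values respectively. A Python sketch of that frame logic, assuming monthly totals already sorted by yearMonth (the function name and figures are hypothetical):

```python
def rows_preceding_window(values, preceding=2):
    """Emulate AVG(x) and SUM(x) OVER (ORDER BY ... ROWS <preceding> PRECEDING).

    The frame for row i spans rows max(0, i - preceding) .. i, so the first
    rows are aggregated over fewer than preceding + 1 values."""
    results = []
    for i in range(len(values)):
        frame = values[max(0, i - preceding): i + 1]
        results.append((sum(frame) / len(frame), sum(frame)))
    return results


# Hypothetical monthly totals for January to April, in yearMonth order.
for moving_avg, moving_sum in rows_preceding_window([10, 20, 30, 40]):
    print(moving_avg, moving_sum)
```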
Example - Using Windowing Calculations