Relational DBMS support for OLAP
Lecture 3
By
Dr DVLN Somayajulu
Professor
Dept of CSE
National Institute of Technology
Warangal
Outline
• Relational DBMS support for OLAP
• Data Cube Demonstration in SQL
• Categories of OLAP tools
Relational DBMS Support for OLAP
• GROUP BY GROUPING SETS (<column list>)
– Shorthand notation in SQL:99 for a series of UNIONed GROUP BY queries that are common in reports
• Rollup extension
• Cube extension
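To make the "series of UNIONed queries" concrete, here is a minimal Python sketch that rewrites a GROUPING SETS list into the equivalent UNION ALL of plain GROUP BY queries. The function `grouping_sets_to_union` and the table and column names in the usage are hypothetical. Columns missing from a grouping set are selected as NULL so that every branch has the same shape, which mirrors how the grouping extensions mark subtotal rows.

```python
def grouping_sets_to_union(table: str, columns: list[str], measure: str,
                           grouping_sets: list[tuple[str, ...]]) -> str:
    """Rewrite GROUP BY GROUPING SETS (...) as a UNION ALL of plain GROUP BY queries.

    Columns not in a grouping set are selected as NULL so every branch
    of the UNION ALL returns the same columns.
    """
    branches = []
    for gset in grouping_sets:
        select_list = [col if col in gset else f"NULL AS {col}" for col in columns]
        select_list.append(f"SUM({measure}) AS total")
        query = f"SELECT {', '.join(select_list)} FROM {table}"
        if gset:  # the empty grouping set () is the grand total: no GROUP BY
            query += f" GROUP BY {', '.join(gset)}"
        branches.append(query)
    return "\nUNION ALL\n".join(branches)


# Hypothetical: GROUP BY GROUPING SETS ((channel, country), (channel), ())
sql = grouping_sets_to_union(
    "sales", ["channel", "country"], "amount_sold",
    [("channel", "country"), ("channel",), ()],
)
print(sql)
```

Each grouping set becomes one branch, so the database has to evaluate one GROUP BY per set unless it understands the shorthand directly.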
Cross-Tab Report with Subtotals
                 Country
Channel          India      USA        Total
Internet         9,597      124,224    133,821
Direct Sales     61,202     638,201    699,403
Total            70,799     762,425    833,224
QUERY in SQL:
SELECT channels.channel_desc, countries.country_iso_code,
TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES
FROM sales, customers, times, channels, countries
WHERE sales.time_id=times.time_id AND sales.cust_id=customers.cust_id AND
sales.channel_id= channels.channel_id AND
channels.channel_desc IN ('Direct Sales', 'Internet') AND
times.calendar_month_desc='2000-09' AND
customers.country_id=countries.country_id AND
countries.country_iso_code IN ('IN', 'US')
GROUP BY CUBE(channels.channel_desc, countries.country_iso_code);
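The Total row and Total column of this report are exactly what CUBE(channel, country) adds to the four detail cells. A minimal Python sketch that recomputes them from the report's detail values makes the grouping sets explicit. It illustrates CUBE's semantics only, not how an RDBMS evaluates it, and `None` stands in for the NULL that marks a subtotal column.

```python
from itertools import combinations

# Detail cells of the cross-tab report: (channel, country) -> sales.
detail = {
    ("Internet", "India"): 9_597,
    ("Internet", "USA"): 124_224,
    ("Direct Sales", "India"): 61_202,
    ("Direct Sales", "USA"): 638_201,
}

def cube(rows: dict[tuple[str, ...], int]) -> dict[tuple, int]:
    """Sum the measure for every subset of the grouping columns, as CUBE does.

    A column left out of a grouping set is replaced by None, the Python
    stand-in for the NULL that marks a subtotal in the SQL result.
    """
    n = len(next(iter(rows)))  # number of grouping columns
    result: dict[tuple, int] = {}
    for k in range(n, -1, -1):
        for kept in combinations(range(n), k):
            for key, amount in rows.items():
                out_key = tuple(key[i] if i in kept else None for i in range(n))
                result[out_key] = result.get(out_key, 0) + amount
    return result

totals = cube(detail)
print(totals[("Internet", None)])  # 133821
print(totals[(None, "USA")])       # 762425
print(totals[(None, None)])        # 833224
```

With two grouping columns the result has 4 + 2 + 2 + 1 = 9 rows, which matches the nine non-empty cells of the report.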
ROLLUP Extension to GROUP BY
• Extension of Group by clause
• Easy to use and Efficient
• Syntax: SELECT … GROUP BY ROLLUP (grouping_column_reference_list)
When to Use ROLLUP?
– Use the ROLLUP extension in tasks involving subtotals.
– Well suited to aggregating data along hierarchical dimensions such as time or geography.
– For data warehouse administrators using summary tables,
ROLLUP can simplify and speed up the maintenance of
summary tables.
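ROLLUP(c1, …, cn) is itself shorthand for n + 1 grouping sets, obtained by dropping one column at a time from the right. A small Python sketch of that expansion, using a hypothetical year/quarter/month time hierarchy as the columns:

```python
def rollup_grouping_sets(columns: list[str]) -> list[tuple[str, ...]]:
    """Grouping sets that ROLLUP(columns) stands for.

    Columns are dropped from the right one at a time, so n columns yield
    n + 1 grouping sets, ending with () for the grand total.
    """
    return [tuple(columns[:i]) for i in range(len(columns), -1, -1)]

for gset in rollup_grouping_sets(["year", "quarter", "month"]):
    print(gset)
# ('year', 'quarter', 'month')
# ('year', 'quarter')
# ('year',)
# ()
```

Because the sets follow the column order, ROLLUP fits hierarchies (year > quarter > month) where only the "upward" subtotals are meaningful.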
Partial ROLLUP Extension to GROUP BY
• Partial ROLLUP
Syntax: GROUP BY expr1, ROLLUP(expr2, expr3);
- Creates subtotals at 3 aggregate levels: (expr1, expr2, expr3), (expr1, expr2) and (expr1)
QUERY in SQL:
SELECT channel_desc, calendar_month_desc, countries.country_iso_code,
TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$
FROM sales, customers, times, channels, countries
WHERE sales.time_id=times.time_id AND sales.cust_id=customers.cust_id
AND customers.country_id = countries.country_id
AND sales.channel_id= channels.channel_id
AND channels.channel_desc IN ('Direct Sales', 'Internet')
AND times.calendar_month_desc IN ('2000-09', '2000-10')
AND countries.country_iso_code IN ('GB', 'US')
GROUP BY channel_desc, ROLLUP(calendar_month_desc, countries.country_iso_code);
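For a partial ROLLUP, the leading column appears in every grouping set and only the ROLLUP columns are dropped, so the result has no grand-total row. A minimal Python sketch (the helper name `partial_rollup` is hypothetical) listing the grouping sets of the example query:

```python
def partial_rollup(fixed: list[str], rolled: list[str]) -> list[tuple[str, ...]]:
    """Grouping sets for GROUP BY <fixed>, ROLLUP(<rolled>).

    The fixed columns appear in every set; only the ROLLUP columns are
    dropped from the right, giving len(rolled) + 1 sets and no grand total.
    """
    return [tuple(fixed + rolled[:i]) for i in range(len(rolled), -1, -1)]

for gset in partial_rollup(["channel_desc"], ["calendar_month_desc", "country_iso_code"]):
    print(gset)
# ('channel_desc', 'calendar_month_desc', 'country_iso_code')
# ('channel_desc', 'calendar_month_desc')
# ('channel_desc',)
```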
CUBE Extension to Group By
• CUBE takes a specified set of grouping columns and
creates subtotals for all of their possible combinations.
• CUBE generates all the subtotals that could be calculated
for a data cube with the specified dimensions.
• For CUBE(time, region, department), the result set includes all the rows that an equivalent ROLLUP(time, region, department) would return, plus additional combinations.
• If n columns are specified for a CUBE, 2^n combinations of subtotals are returned (counting the detail level and the grand total).
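The 2^n count follows because CUBE takes every subset of its columns. A minimal Python sketch that enumerates those subsets with `itertools.combinations` and checks that the ROLLUP grouping sets for the same column order are among them:

```python
from itertools import combinations

def cube_grouping_sets(columns: list[str]) -> list[tuple[str, ...]]:
    """Grouping sets that CUBE(columns) stands for: every subset of the columns."""
    return [
        gset
        for k in range(len(columns), -1, -1)
        for gset in combinations(columns, k)
    ]

sets = cube_grouping_sets(["time", "region", "department"])
print(len(sets))  # prints 8 (= 2 ** 3)

# The ROLLUP(time, region, department) sets are a subset of the CUBE sets.
rollup_sets = [("time", "region", "department"), ("time", "region"), ("time",), ()]
print(all(s in sets for s in rollup_sets))  # True
```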
CUBE Syntax
SELECT … GROUP BY CUBE (grouping_column_reference_list)
QUERY in SQL:
SELECT channel_desc, calendar_month_desc, countries.country_iso_code,
TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$
FROM sales, customers, times, channels, countries
WHERE sales.time_id=times.time_id AND sales.cust_id=customers.cust_id AND
sales.channel_id= channels.channel_id AND
customers.country_id = countries.country_id AND
channels.channel_desc IN ('Direct Sales', 'Internet') AND
times.calendar_month_desc IN ('2000-09', '2000-10') AND
countries.country_iso_code IN ('GB', 'US')
GROUP BY CUBE(channel_desc, calendar_month_desc, countries.country_iso_code);
Partial CUBE Extension to GROUP BY
• Partial CUBE
Syntax: GROUP BY expr1, CUBE(expr2, expr3);
- Creates subtotals at 4 aggregate levels: (expr1, expr2, expr3), (expr1, expr2), (expr1, expr3) and (expr1)
QUERY in SQL:
SELECT channel_desc, calendar_month_desc, countries.country_iso_code,
TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$
FROM sales, customers, times, channels, countries
WHERE sales.time_id = times.time_id
AND sales.cust_id = customers.cust_id
AND customers.country_id=countries.country_id
AND sales.channel_id = channels.channel_id
AND channels.channel_desc IN ('Direct Sales', 'Internet')
AND times.calendar_month_desc IN ('2000-09', '2000-10')
AND countries.country_iso_code IN ('GB', 'US')
GROUP BY channel_desc, CUBE(calendar_month_desc, countries.country_iso_code);
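For the partial CUBE, the fixed column is prefixed to each of the 2^2 subsets of the CUBE columns, which gives the four aggregate levels. A Python sketch with a hypothetical helper name:

```python
from itertools import combinations

def partial_cube(fixed: list[str], cubed: list[str]) -> list[tuple[str, ...]]:
    """Grouping sets for GROUP BY <fixed>, CUBE(<cubed>): 2 ** len(cubed) sets."""
    return [
        tuple(fixed) + subset
        for k in range(len(cubed), -1, -1)
        for subset in combinations(cubed, k)
    ]

for gset in partial_cube(["channel_desc"], ["calendar_month_desc", "country_iso_code"]):
    print(gset)
# ('channel_desc', 'calendar_month_desc', 'country_iso_code')
# ('channel_desc', 'calendar_month_desc')
# ('channel_desc', 'country_iso_code')
# ('channel_desc',)
```

Compared with the partial ROLLUP on the same columns, the only extra level is (channel_desc, country_iso_code).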
Discussion on CUBE
Can we compute subtotals without using CUBE?
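Yes: each subtotal level can be computed with an ordinary GROUP BY and the results stacked with UNION ALL, padding the rolled-up columns with NULL. The sketch below runs that rewrite through Python's built-in `sqlite3` module (SQLite has no CUBE or ROLLUP) on a hypothetical single-table `sales` layout loaded with the four detail values from the cross-tab report. The cost is one pass over the data per grouping set, repeated work that a single CUBE clause gives the optimizer a chance to avoid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (channel TEXT, country TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Internet", "India", 9597), ("Internet", "USA", 124224),
     ("Direct Sales", "India", 61202), ("Direct Sales", "USA", 638201)],
)

# Equivalent of GROUP BY CUBE(channel, country): one branch per grouping set.
cube_sql = """
SELECT channel, country, SUM(amount) FROM sales GROUP BY channel, country
UNION ALL
SELECT channel, NULL, SUM(amount) FROM sales GROUP BY channel
UNION ALL
SELECT NULL, country, SUM(amount) FROM sales GROUP BY country
UNION ALL
SELECT NULL, NULL, SUM(amount) FROM sales
"""
rows = conn.execute(cube_sql).fetchall()
for row in rows:
    print(row)
```

The nine rows returned carry the same values as the cross-tab report, with NULL (`None` in Python) in the columns that were rolled up.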
GROUPING SETS Expression
• A GROUPING SETS expression lets you explicitly specify the set of groups to create within a GROUP BY clause.
• This allows precise specification across multiple dimensions without computing the whole CUBE.
Variations of the Grouping Operators
• Partial cube
• Partial rollup
• Composite columns
• CUBE and ROLLUP inside a GROUPING SETS operation
Partial cube – Example
SELECT channel_desc, calendar_month_desc, countries.country_iso_code,
TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$
FROM sales, customers, times, channels, countries
WHERE sales.time_id = times.time_id AND
sales.cust_id = customers.cust_id AND
customers.country_id=countries.country_id AND
sales.channel_id = channels.channel_id AND
channels.channel_desc IN ('Direct Sales', 'Internet') AND
times.calendar_month_desc IN ('2000-09', '2000-10') AND
countries.country_iso_code IN ('GB', 'US')
GROUP BY channel_desc, CUBE(calendar_month_desc, countries.country_iso_code);
Challenges with the Use of ROLLUP and CUBE
• How can you programmatically determine which result set rows are subtotals?
• How do you find the exact level of aggregation for a given subtotal?
• Is there an easy way to determine which rows are subtotals?
• What happens if query results contain both stored NULL values and "NULL" values created by a ROLLUP or CUBE?
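In Oracle these questions are answered with the GROUPING function, which returns 1 for a NULL produced by ROLLUP or CUBE and 0 otherwise. The Python/`sqlite3` sketch below does not use that function (SQLite lacks it); it emulates the idea by adding an explicit flag column to each UNION ALL branch, so that a stored NULL region and the grand-total NULL can be told apart. The table and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
# Hypothetical data: one row has a genuinely unknown (stored NULL) region.
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("East", 10), ("West", 20), (None, 5)])

# ROLLUP(region) emulated with UNION ALL; the second column plays the role of
# GROUPING(region): 1 marks a NULL created by the rollup, 0 a stored value.
rows = conn.execute("""
    SELECT region, 0 AS grp_region, SUM(amount) FROM sales GROUP BY region
    UNION ALL
    SELECT NULL, 1, SUM(amount) FROM sales
    ORDER BY 2, 1
""").fetchall()

for region, grp, total in rows:
    if grp == 1:
        label = "GRAND TOTAL"
    else:
        label = region if region is not None else "(unknown region)"
    print(label, total)
```

Both the stored-NULL group and the grand total have `region` set to NULL; only the flag column distinguishes them.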
Categories of OLAP Tools
• OLAP tools are categorized according to the
architecture of the underlying database.
• Three main categories of OLAP tools include
– Multi-dimensional OLAP (MOLAP or MD-OLAP)
– Relational OLAP (ROLAP), also called multi-relational OLAP
– Hybrid OLAP (HOLAP)
Multi-Dimensional OLAP (MOLAP)
• Use array technology and efficient storage
techniques that minimize the disk space
requirements through sparse data management.
• Provides excellent performance when data is used
as designed, and the focus is on data for a specific
decision-support application.
• Traditionally, MOLAP tools require tight coupling with the application and presentation layers.
• Recent trends separate OLAP tools from the underlying data structures through published application programming interfaces (APIs).
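To make "array technology" and "sparse data management" concrete: in the hypothetical cube below, a dense array would allocate 1,000 × 365 × 50 = 18,250,000 cells, while a sparse layout stores only the cells that hold data. The dictionary keyed by coordinates is a simplification chosen for illustration, not the storage format of any particular MOLAP product.

```python
from math import prod

# Hypothetical dimension sizes: 1,000 products x 365 days x 50 stores.
shape = (1_000, 365, 50)
dense_cells = prod(shape)  # cells a dense multi-dimensional array would allocate

# Sparse storage: keep only non-empty cells, keyed by (product, day, store).
sparse_cube: dict[tuple[int, int, int], float] = {
    (0, 0, 0): 120.0,
    (0, 1, 3): 75.5,
    (42, 200, 7): 310.0,
}

def cell(cube: dict[tuple[int, int, int], float], coords: tuple[int, int, int]) -> float:
    """Read one cell; coordinates that are not stored are empty (0.0)."""
    return cube.get(coords, 0.0)

# Slice: total sales of product 0 across all days and stores.
product_0_total = sum(v for (p, _, _), v in sparse_cube.items() if p == 0)

print(dense_cells)                   # 18250000
print(len(sparse_cube))              # 3
print(cell(sparse_cube, (5, 5, 5)))  # 0.0
print(product_0_total)               # 195.5
```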
Typical Architecture for MOLAP Tools
MOLAP Tools - Development Issues
• Underlying data structures are limited in their ability
to support multiple subject areas and to provide
access to detailed data.
• Navigation and analysis of data are limited because the data structures are designed according to previously determined requirements.
• MOLAP products require a different set of skills and
tools to build and maintain the database, thus
increasing the cost and complexity of support.
Relational OLAP (ROLAP)
• Fastest growing style of OLAP technology.
• Supports RDBMS products through a metadata layer, which avoids the need to create a static multi-dimensional data structure and facilitates the creation of multiple multi-dimensional views of two-dimensional relations.
• To improve performance, some products use SQL engines to support the complexity of multi-dimensional analysis, while others recommend, or require, the use of highly de-normalized database designs such as the star schema.
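A rough sketch of what the ROLAP metadata layer does: it maps logical dimension levels and measures onto relational tables and columns, then generates SQL for each multi-dimensional request. The `METADATA` layout and the `build_query` helper are hypothetical; the table and column names are taken from the queries earlier in this lecture.

```python
# Hypothetical metadata layer: maps logical OLAP names to relational tables/columns.
METADATA = {
    "fact_table": "sales",
    "measures": {"revenue": "SUM(sales.amount_sold)"},
    "levels": {
        # level name: (dimension table, column, join condition to the fact table)
        "channel": ("channels", "channels.channel_desc",
                    "sales.channel_id = channels.channel_id"),
        "month": ("times", "times.calendar_month_desc",
                  "sales.time_id = times.time_id"),
    },
}

def build_query(levels: list[str], measure: str) -> str:
    """Translate a multi-dimensional request (levels + measure) into relational SQL."""
    tables = [METADATA["fact_table"]]
    columns: list[str] = []
    joins: list[str] = []
    for level in levels:
        table, column, join_cond = METADATA["levels"][level]
        if table not in tables:
            tables.append(table)
            joins.append(join_cond)
        columns.append(column)
    select_list = columns + [f"{METADATA['measures'][measure]} AS {measure}"]
    sql = f"SELECT {', '.join(select_list)} FROM {', '.join(tables)}"
    if joins:
        sql += " WHERE " + " AND ".join(joins)
    if columns:
        sql += " GROUP BY " + ", ".join(columns)
    return sql

print(build_query(["channel", "month"], "revenue"))
```

Because the views live in metadata rather than in a stored cube, adding a new view means adding entries, not rebuilding a multi-dimensional structure.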
Typical Architecture for ROLAP Tools
ROLAP
• A multi-dimensional user view of relational data stored in star or snowflake database schemas.
[Figure: star schema and snowflake schema around a central Sales fact table. Labels include Customer, Region, Product and Time dimensions; the snowflake variant adds normalized tables such as Customer Characteristics, Country, Product Kind, Month and Year.]
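A minimal star schema in Python's `sqlite3`, with one fact table and two dimension tables, answering a multi-dimensional question (sales by product kind and month) with an ordinary join and GROUP BY. All names and data are hypothetical; in a snowflake schema, `kind` would move into its own table referenced from `product_dim`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables (hypothetical, heavily simplified).
    CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, kind TEXT);
    CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, month TEXT, year INTEGER);
    -- Central fact table: a foreign key to each dimension plus a measure.
    CREATE TABLE sales_fact (product_id INTEGER, time_id INTEGER, amount INTEGER);

    INSERT INTO product_dim VALUES (1, 'Hardware'), (2, 'Software');
    INSERT INTO time_dim VALUES (1, '2000-09', 2000), (2, '2000-10', 2000);
    INSERT INTO sales_fact VALUES (1, 1, 100), (2, 1, 40), (1, 2, 60), (2, 2, 80);
""")

# Star join: the fact table joins directly to each dimension table.
rows = conn.execute("""
    SELECT p.kind, t.month, SUM(f.amount)
    FROM sales_fact f
    JOIN product_dim p ON f.product_id = p.product_id
    JOIN time_dim t ON f.time_id = t.time_id
    GROUP BY p.kind, t.month
    ORDER BY p.kind, t.month
""").fetchall()
for row in rows:
    print(row)
```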
OLAP TOOLS
• Organize facts according to multiple dimensions and use powerful rules for combining those facts to form aggregate facts
• Characteristics:
– Drill down into the data
– Swap the dimensions
– Change the appearance of the data
• Provide flexibility
• Vendors who sell these tools:
– Oracle with the Oracle Express suite, Cognos with PowerPlay, MicroStrategy with DSS Agent, and many others.
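Drilling down and swapping dimensions both amount to re-aggregating the same facts over a different list of dimensions. A small Python sketch over hypothetical sales facts keyed by (year, month, region):

```python
from collections import defaultdict

# Hypothetical detail facts: (year, month, region) -> sales.
facts = {
    ("2000", "2000-09", "East"): 10,
    ("2000", "2000-10", "East"): 15,
    ("2000", "2000-09", "West"): 20,
    ("2000", "2000-10", "West"): 8,
}

def aggregate(facts: dict[tuple[str, ...], int], dims: list[int]) -> dict[tuple, int]:
    """Sum the measure over the chosen dimension positions (0=year, 1=month, 2=region)."""
    out: defaultdict[tuple, int] = defaultdict(int)
    for key, value in facts.items():
        out[tuple(key[d] for d in dims)] += value
    return dict(out)

# Summary view: sales by year and region.
print(aggregate(facts, [0, 2]))     # {('2000', 'East'): 25, ('2000', 'West'): 28}
# Drill down: add the month level beneath year.
print(aggregate(facts, [0, 1, 2]))
# Swap (rotate) the dimensions: region first, then year.
print(aggregate(facts, [2, 0]))     # {('East', '2000'): 25, ('West', '2000'): 28}
```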
Conclusion
• OLAP is a user interface concept, not a data storage concept
• OLAP systems provide four basic functions: a multi-dimensional view of data, drill down, rotation and multiple view modes
• Relational OLAP (ROLAP) accesses data for OLAP from a relational database; MOLAP accesses data for OLAP from a multi-dimensional database
• No single approach to OLAP is always best; the best approach is defined by the requirements of the application.