Relational DBMS Support for OLAP
Lecture 3
By Dr DVLN Somayajulu, Professor, Dept of CSE, National Institute of Technology Warangal
E-Mail: [email protected]

Outline
• Relational DBMS support for OLAP
• Data Cube Demonstration in SQL
• Categories of OLAP tools

Relational DBMS Support for OLAP
• GROUP BY GROUPING SETS (<column list>)
  – Short notation in SQL:99 for a series of UNIONed queries that are common in reports
• ROLLUP extension
• CUBE extension

Cross-Tab Report with Subtotals

    Channel / Country    India      USA        Total
    Internet             9,597      124,224    133,821
    Direct Sales         61,202     638,201    699,403
    Total                70,799     762,425    833,224

Query in SQL:

    SELECT channels.channel_desc, countries.country_iso_code,
           TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES
    FROM sales, customers, times, channels, countries
    WHERE sales.time_id = times.time_id
      AND sales.cust_id = customers.cust_id
      AND sales.channel_id = channels.channel_id
      AND channels.channel_desc IN ('Direct Sales', 'Internet')
      AND times.calendar_month_desc = '2000-09'
      AND customers.country_id = countries.country_id
      AND countries.country_iso_code IN ('INDIA', 'USA')
    GROUP BY CUBE(channels.channel_desc, countries.country_iso_code);

ROLLUP Extension to GROUP BY
• Extension of the GROUP BY clause
• Easy to use and efficient
• Syntax: SELECT … GROUP BY ROLLUP (grouping_column_reference_list)

When to Use ROLLUP?
– Use the ROLLUP extension in tasks involving subtotals.
– It is suited to aggregating data across hierarchical categories such as time or geography.
– For data warehouse administrators using summary tables, ROLLUP can simplify and speed up the maintenance of summary tables.
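The grouping levels behind ROLLUP can be made concrete with a small experiment. The following is a minimal sketch, not the lecture's own example. It assumes a hypothetical toy_sales table with made-up amounts and uses Python's sqlite3 module. SQLite does not offer ROLLUP, so the three levels that ROLLUP(channel, month) would produce (detail rows, per-channel subtotals, and the grand total) are written as separate GROUP BY queries combined with UNION ALL.

```python
import sqlite3

# Hypothetical toy data: table name, column names, and amounts are
# illustrative assumptions, not taken from the lecture's sales schema.
rows = [
    ("Internet", "2000-09", 100),
    ("Internet", "2000-10", 150),
    ("Direct Sales", "2000-09", 300),
    ("Direct Sales", "2000-10", 250),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_sales (channel TEXT, month TEXT, amount INTEGER)")
conn.executemany("INSERT INTO toy_sales VALUES (?, ?, ?)", rows)

# ROLLUP over n = 2 columns yields n + 1 = 3 grouping levels:
#   (channel, month)  -> detail rows
#   (channel)         -> subtotal per channel, month shown as NULL
#   ()                -> grand total, both columns shown as NULL
rollup_sql = """
SELECT channel, month, SUM(amount) FROM toy_sales GROUP BY channel, month
UNION ALL
SELECT channel, NULL, SUM(amount) FROM toy_sales GROUP BY channel
UNION ALL
SELECT NULL, NULL, SUM(amount) FROM toy_sales
"""
result = conn.execute(rollup_sql).fetchall()

for row in result:
    print(row)
```

Each added level costs one more pass over the data in this hand-written form, which is one reason a single ROLLUP clause is both shorter to write and easier for the DBMS to optimize.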
Partial ROLLUP Extension to GROUP BY
• Partial rollup syntax: GROUP BY expr1, ROLLUP(expr2, expr3);
  – Creates subtotals at 3 aggregate levels: (expr1, expr2, expr3), (expr1, expr2), and (expr1)

Example query in SQL:

    SELECT channel_desc, calendar_month_desc, countries.country_iso_code,
           TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$
    FROM sales, customers, times, channels, countries
    WHERE sales.time_id = times.time_id
      AND sales.cust_id = customers.cust_id
      AND customers.country_id = countries.country_id
      AND sales.channel_id = channels.channel_id
      AND channels.channel_desc IN ('Direct Sales', 'Internet')
      AND times.calendar_month_desc IN ('2000-09', '2000-10')
      AND countries.country_iso_code IN ('GB', 'US')
    GROUP BY channel_desc, ROLLUP(calendar_month_desc, countries.country_iso_code);

CUBE Extension to GROUP BY
• CUBE takes a specified set of grouping columns and creates subtotals for all of their possible combinations.
• CUBE generates all the subtotals that could be calculated for a data cube with the specified dimensions.
• For CUBE (time, region, department), the result set includes all the values that would be included in an equivalent ROLLUP statement, plus additional combinations.
• If n columns are specified for a CUBE, 2^n combinations of subtotals are returned.

CUBE function syntax: SELECT … GROUP BY CUBE (grouping_column_reference_list)

Query:

    SELECT channel_desc, calendar_month_desc, countries.country_iso_code,
           TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$
    FROM sales, customers, times, channels, countries
    WHERE sales.time_id = times.time_id
      AND sales.cust_id = customers.cust_id
      AND sales.channel_id = channels.channel_id
      AND customers.country_id = countries.country_id
      AND channels.channel_desc IN ('Direct Sales', 'Internet')
      AND times.calendar_month_desc IN ('2000-09', '2000-10')
      AND countries.country_iso_code IN ('GB', 'US')
    GROUP BY CUBE(channel_desc, calendar_month_desc, countries.country_iso_code);

Partial CUBE Extension to GROUP BY
• Partial cube syntax: GROUP BY expr1, CUBE(expr2, expr3);
  – Creates subtotals at 4 aggregate levels: (expr1, expr2, expr3), (expr1, expr2), (expr1, expr3), and (expr1)

Example query in SQL:

    SELECT channel_desc, calendar_month_desc, countries.country_iso_code,
           TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$
    FROM sales, customers, times, channels, countries
    WHERE sales.time_id = times.time_id
      AND sales.cust_id = customers.cust_id
      AND customers.country_id = countries.country_id
      AND sales.channel_id = channels.channel_id
      AND channels.channel_desc IN ('Direct Sales', 'Internet')
      AND times.calendar_month_desc IN ('2000-09', '2000-10')
      AND countries.country_iso_code IN ('GB', 'US')
    GROUP BY channel_desc, CUBE(calendar_month_desc, countries.country_iso_code);

Discussion on CUBE
Can we compute subtotals without using CUBE?

GROUPING SETS Expression
• You can explicitly specify the set of groups that you want to create within a GROUP BY clause by using a GROUPING SETS expression.
• This allows precise specification across multiple dimensions without computing the whole CUBE.

Variations of the Grouping Operators
• Partial cube
• Partial rollup
• Composite columns
• CUBE and ROLLUP inside a GROUPING SETS operation

Partial cube – Example

    SELECT channel_desc, calendar_month_desc, countries.country_iso_code,
           TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$
    FROM sales, customers, times, channels, countries
    WHERE sales.time_id = times.time_id
      AND sales.cust_id = customers.cust_id
      AND customers.country_id = countries.country_id
      AND sales.channel_id = channels.channel_id
      AND channels.channel_desc IN ('Direct Sales', 'Internet')
      AND times.calendar_month_desc IN ('2000-09', '2000-10')
      AND countries.country_iso_code IN ('GB', 'US')
    GROUP BY channel_desc, CUBE(calendar_month_desc, countries.country_iso_code);

Challenges with the Use of ROLLUP and CUBE
• How can you programmatically determine which result set rows are subtotals?
• How do you find the exact level of aggregation for a given subtotal?
• Is there an easy way to determine which rows are the subtotals?
• What happens if query results contain both stored NULL values and "NULL" values created by a ROLLUP or CUBE?

Categories of OLAP Tools
• OLAP tools are categorized according to the architecture of the underlying database.
• The main categories of OLAP tools include
  – Multi-dimensional OLAP (MOLAP or MD-OLAP)
  – Relational OLAP (ROLAP), also called multi-relational OLAP

Multi-Dimensional OLAP (MOLAP)
• Uses array technology and efficient storage techniques that minimize disk space requirements through sparse data management.
• Provides excellent performance when data is used as designed and the focus is on data for a specific decision-support application.
• Traditionally requires tight coupling with the application layer and presentation layer.
• Recent trends segregate the OLAP from the data structures through the use of published application programming interfaces (APIs).

[Figure: Typical Architecture for MOLAP Tools]

MOLAP Tools - Development Issues
• Underlying data structures are limited in their ability to support multiple subject areas and to provide access to detailed data.
• Navigation and analysis of data are limited because the data is designed according to previously determined requirements.
• MOLAP products require a different set of skills and tools to build and maintain the database, thus increasing the cost and complexity of support.

Relational OLAP (ROLAP)
• Fastest-growing style of OLAP technology.
• Supports RDBMS products through a metadata layer, which avoids the need to create a static multi-dimensional data structure and facilitates the creation of multiple multi-dimensional views of the two-dimensional relation.
• To improve performance, some products use SQL engines to support the complexity of multi-dimensional analysis, while others recommend, or require, the use of highly de-normalized database designs such as the star schema.

[Figure: Typical Architecture for ROLAP Tools]

ROLAP
• A multi-dimensional user view on relational data storage using Star or Snowflake database schemata.

[Figure: Star Schema and Snowflake Schema diagrams centered on Sales, with labels including Customer, Customer Characteristics, Region, Country, Product, Product Kind, Time, Month, and Year]

OLAP Tools
• Organize facts according to multiple dimensions and use powerful rules for combining those facts to form aggregate facts
• Characteristics:
  – Drill down into the data
  – Swap the dimensions
  – Allow changes in the appearance of the data
• Provide flexibility
• Vendors who sell these tools?
  – Oracle with Oracle Express suite, Cognos with PowerPlay, MicroStrategy with DSS Agent, and many others.

Conclusion
• OLAP is a user interface concept, not a data storage concept.
• OLAP systems provide four basic functions: multi-dimensional view of data, drill down, rotation, and multiple view modes.
• Relational OLAP accesses data for OLAP from a relational database. MOLAP accesses data for OLAP from a multi-dimensional database.
• No single approach to OLAP is always best. The best approach is defined by the requirements of the application.