On-Line Analytical Processing
Week 8
MIE253-Consens

Schedule

Week  Date    Lecture Topic
1     Jan 9   Introduction to Data Management
2     Jan 16  The Relational Model
3     Jan 23  Constraints and SQL DDL
4     Jan 30  SQL DML, DB Applications, JDBC
5     Feb 6   JDBC, DDL (Views, Access Control)
6     Feb 13  Relational Algebra, Advanced SQL
-     Feb 20  [Reading Week]
7     Feb 27  Review and Midterm (Mar 1)
8     Mar 5   OLAP
9     Mar 12  ER Conceptual Modelling
10    Mar 19  Normalization
11    Mar 26  XML and Data Integration
12    Apr 2   Transactions and the Internet, Query Processing
13    Apr 9   Final Review

This week's reading: Chapter 5, Chapter 17

Aggregates

* Functions that operate on sets: COUNT, SUM, AVG, MAX, MIN
* Produce numbers (not tables)
* Not part of relational algebra (but not hard to add)
* Note: COUNT(*) counts every row, including rows that contain NULLs; COUNT(attr) and the other aggregates ignore NULLs

    SELECT COUNT(*)
    FROM Professor P

    SELECT MAX(Salary)
    FROM Employee E

Duplicate elimination in Aggregates

Count the number of courses taught in S2010:

    SELECT COUNT(T.CrsCode)
    FROM Teaching T
    WHERE T.Semester = 'S2010'

But if multiple sections of the same course are taught, use:

    SELECT COUNT(DISTINCT T.CrsCode)
    FROM Teaching T
    WHERE T.Semester = 'S2010'

Grouping

But how do we compute the number of courses taught in S2010 per professor?
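Before answering that, the aggregate rules covered so far can be checked on a toy table. The sketch below is only an illustration: it uses Python's built-in sqlite3 module, and the Teaching rows (professor ids, course codes, the NULL entry) are invented for this example. In standard SQL, COUNT(*) counts rows regardless of NULLs, COUNT(attr) skips NULL values, and COUNT(DISTINCT attr) additionally collapses duplicates.

```python
import sqlite3


def aggregate_demo():
    """Return (COUNT(*), COUNT(CrsCode), COUNT(DISTINCT CrsCode)) for S2010."""
    con = sqlite3.connect(":memory:")
    # Hypothetical data: two sections of MIE253, one row with no course yet.
    con.executescript("""
        CREATE TABLE Teaching (ProfId INTEGER, CrsCode TEXT, Semester TEXT);
        INSERT INTO Teaching VALUES
            (1, 'MIE253', 'S2010'),
            (2, 'MIE253', 'S2010'),
            (1, 'MIE250', 'S2010'),
            (3, NULL,     'S2010'),
            (2, 'MIE350', 'F2009');
    """)
    row = con.execute("""
        SELECT COUNT(*), COUNT(T.CrsCode), COUNT(DISTINCT T.CrsCode)
        FROM Teaching T
        WHERE T.Semester = 'S2010'
    """).fetchone()
    con.close()
    return row


if __name__ == "__main__":
    # Four S2010 rows, three non-NULL course codes, two distinct courses.
    print(aggregate_demo())
```

With this data the three counts differ, which makes it easy to see which form of COUNT a given question calls for.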
One option is a separate query for each professor:

    SELECT COUNT(T.CrsCode)
    FROM Teaching T
    WHERE T.Semester = 'S2010' AND T.ProfId = 123456789

SQL defines a special grouping operator that does this in a single query:

    SELECT T.ProfId, COUNT(T.CrsCode)
    FROM Teaching T
    WHERE T.Semester = 'S2010'
    GROUP BY T.ProfId

GROUP BY - Example

Groups the rows of Transcript by student and computes, for each student, the average grade and the number of courses:

    SELECT T.StudId, AVG(T.Grade), COUNT(*)
    FROM Transcript T
    GROUP BY T.StudId

[Figure: the rows of Transcript (StudId, Grade) for student 1234 form one group, giving the result row (StudId, AVG(Grade), COUNT(*)) = (1234, 78, 4).]

GROUP BY Evaluation

[Figure: evaluation of a GROUP BY query]

Aggregate View - Example

    CREATE VIEW StAvg (StudId, AvgGrade) AS
        SELECT T.StudId, AVG(T.Grade)
        FROM Transcript T
        GROUP BY T.StudId

    SELECT S.Name, C.AvgGrade
    FROM StAvg C, Student S
    WHERE C.StudId = S.StudId AND C.AvgGrade > 80

HAVING Clause

* Eliminates unwanted groups (analogous to the WHERE clause, but works on groups instead of individual tuples)
* The HAVING condition is constructed from attributes in the GROUP BY list and aggregates on attributes not in that list

    SELECT T.StudId, AVG(T.Grade) AS AvgGrade, COUNT(*) AS NumCrs
    FROM Transcript T
    WHERE T.CrsCode LIKE 'MIE%'
    GROUP BY T.StudId
    HAVING AVG(T.Grade) > 80

Evaluation of GROUP BY HAVING

[Figure: evaluation of a GROUP BY ... HAVING query]

ORDER BY Clause

Causes rows to be output in a specified order:

    SELECT T.StudId, COUNT(*) AS NumCrs, AVG(T.Grade) AS AvgGrade
    FROM Transcript T
    WHERE T.CrsCode LIKE 'MIE%'
    GROUP BY T.StudId
    HAVING AVG(T.Grade) > 80
    ORDER BY AvgGrade DESC, StudId ASC

DESC sorts in descending order; ASC sorts in ascending order.

Summary: Aggregation in SQL

* Aggregate functions: COUNT, SUM, AVG, MAX, MIN
* Grouping: GROUP BY
* Filtering groups: HAVING
* Displaying ordered results: ORDER BY

OLTP Compared With OLAP

On-Line Transaction Processing (OLTP)
* Maintains a database that is an accurate model of some real-world enterprise.
* Supports day-to-day operations.
* Characteristics:
    * Short simple transactions
    * Relatively frequent updates
    * Transactions access only a small fraction of the database

On-Line Analytic Processing (OLAP)
* Uses information in the database to guide strategic decisions.
* Characteristics:
    * Complex queries
    * Infrequent updates
    * Transactions access a large fraction of the database
    * Data need not be up-to-date

The Internet Grocer

* OLTP-style transaction: Mary Smith, from Etobicoke, just bought a box of tomatoes; charge her account; deliver the tomatoes from our Mississauga warehouse; decrease our inventory of tomatoes from that warehouse.
* OLAP-style report: What are the average monthly sales of tomatoes in the GTA for the years 2007 and 2008?

Evolution of OLAP Applications

Traditional OLAP (Decision Support, MIS)
* Uses data the enterprise gathers in its usual activities, perhaps in its OLTP system
* Queries are ad hoc, designed and carried out by a variety of users (e.g., unit managers)
* Example: write a query to report the average monthly sales of tomatoes in the GTA for the years 2002 and 2003.

Newer Applications (e.g., e-commerce)
* Enterprise actively gathers data it wants, perhaps purchasing it
* Queries are sophisticated, designed by professionals, and used in more sophisticated ways
* Example of a newer application: prepare a profile of the grocery purchases of Mary Smith for the years 2007 and 2008 (personalized marketing, e.g., recommending a specific cart to Mary).

Data Warehouses

OLAP and data mining databases are frequently stored on special servers called data warehouses (and data marts), which:
* Can accommodate the huge amount of data generated by OLTP systems
* Allow OLAP queries and data mining to be run off-line so as not to impact the performance of OLTP

Fact Tables

* Many OLAP applications are based on a fact table
* For example, a supermarket application might be based on a table
      Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)
* The table can be viewed as multidimensional
    * Market_Id, Product_Id, Time_Id are the dimensions that represent specific supermarkets, products, and time intervals
    * Sales_Amt is a function of the other three

A Data Cube

* Fact tables can be viewed as an N-dimensional data cube (3-dimensional in our example)
* The entries in the cube are the values of Sales_Amt

Dimension Tables

The dimensions of the fact table are further described with dimension tables.

Fact table:
    Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)

Dimension tables:
    Market (Market_Id, City, Province, Region)
    Product (Product_Id, Name, Category, Price)
    Time (Time_Id, Week, Month, Quarter)

Aggregation

Many OLAP queries involve aggregation of the data in the fact table. For example, to find the total sales (over time) of each product in each market, we might use

    SELECT S.Market_Id, S.Product_Id, SUM(S.Sales_Amt)
    FROM Sales S
    GROUP BY S.Market_Id, S.Product_Id

The aggregation is over the entire time dimension and thus produces a two-dimensional view of the data.

Aggregation over Time

The 2-D output of the previous query:

    SUM(Sales_Amt)          Market_Id
    Product_Id      M1      M2      M3
    P1              3003    1503    …
    P2              6003    2402    …
    P3              4503    3       …
    P4              7503    7000    …
    P5              …       …       …

Drilling Down
and Rolling Up

* Some dimension tables form an aggregation hierarchy:
      Market_Id -> City -> Province -> Region -> ALL
* Executing a series of queries that moves down a hierarchy (e.g., from aggregation over regions to that over provinces) is called drilling down
    * Requires the use of the fact table or information more specific than the requested aggregation (e.g., cities)
* Executing a series of queries that moves up the hierarchy (e.g., from provinces to regions) is called rolling up
    * Note: in a rollup, coarser aggregations can be computed using prior queries for finer aggregations

Drilling Down

Drilling down on market: from Region to Province

    Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)
    Market (Market_Id, City, Province, Region)

1.  SELECT S.Product_Id, M.Region, SUM(S.Sales_Amt)
    FROM Sales S, Market M
    WHERE M.Market_Id = S.Market_Id
    GROUP BY S.Product_Id, M.Region

2.  SELECT S.Product_Id, M.Province, SUM(S.Sales_Amt)
    FROM Sales S, Market M
    WHERE M.Market_Id = S.Market_Id
    GROUP BY S.Product_Id, M.Province

Rolling Up

Rolling up on market: from Province to Region

If we have already created a table Province_Sales (Product_Id, Province, Sales_Amt) using

    SELECT S.Product_Id, M.Province, SUM(S.Sales_Amt) AS Sales_Amt
    FROM Sales S, Market M
    WHERE M.Market_Id = S.Market_Id
    GROUP BY S.Product_Id, M.Province

then we can roll up from there to

    SELECT T.Product_Id, M.Region, SUM(T.Sales_Amt)
    FROM Province_Sales T, (SELECT DISTINCT Province, Region FROM Market) M
    WHERE M.Province = T.Province
    GROUP BY T.Product_Id, M.Region

Market is reduced to its distinct (Province, Region) pairs so that each Province_Sales row is counted once, not once per market in that province.
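
The drill-down and roll-up queries can be tried end to end on a small star schema. The following is a minimal sketch, not a reference implementation: it assumes Python's built-in sqlite3 module, and all rows (market ids, cities, provinces, the regions 'East' and 'West', the amounts) are invented for illustration. The helper names build_demo_db, sales_by_region, sales_by_province and rollup_from_provinces are likewise hypothetical. In this sketch the roll-up joins Province_Sales against the distinct (Province, Region) pairs of Market, since a direct join to Market would match each province row once per market in that province.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory fact table and Market dimension with made-up rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Market (Market_Id TEXT PRIMARY KEY, City TEXT,
                             Province TEXT, Region TEXT);
        CREATE TABLE Sales (Market_Id TEXT, Product_Id TEXT,
                            Time_Id TEXT, Sales_Amt INTEGER);
        INSERT INTO Market VALUES
            ('M1', 'Toronto',   'ON', 'East'),
            ('M2', 'Ottawa',    'ON', 'East'),
            ('M3', 'Montreal',  'QC', 'East'),
            ('M4', 'Vancouver', 'BC', 'West');
        INSERT INTO Sales VALUES
            ('M1', 'P1', 'T1', 100), ('M1', 'P1', 'T2', 50),
            ('M2', 'P1', 'T1', 30),  ('M3', 'P1', 'T1', 20),
            ('M4', 'P1', 'T1', 70),  ('M4', 'P2', 'T2', 10);
    """)
    return con


def sales_by_region(con):
    """Coarse view: aggregate the fact table directly to the Region level."""
    return con.execute("""
        SELECT S.Product_Id, M.Region, SUM(S.Sales_Amt)
        FROM Sales S, Market M
        WHERE M.Market_Id = S.Market_Id
        GROUP BY S.Product_Id, M.Region
        ORDER BY S.Product_Id, M.Region
    """).fetchall()


def sales_by_province(con):
    """Drill down: the same query grouped by the finer Province level."""
    return con.execute("""
        SELECT S.Product_Id, M.Province, SUM(S.Sales_Amt)
        FROM Sales S, Market M
        WHERE M.Market_Id = S.Market_Id
        GROUP BY S.Product_Id, M.Province
        ORDER BY S.Product_Id, M.Province
    """).fetchall()


def rollup_from_provinces(con):
    """Roll up: derive Region totals from a stored Province-level table."""
    con.executescript("""
        DROP TABLE IF EXISTS Province_Sales;
        CREATE TABLE Province_Sales AS
            SELECT S.Product_Id, M.Province, SUM(S.Sales_Amt) AS Sales_Amt
            FROM Sales S, Market M
            WHERE M.Market_Id = S.Market_Id
            GROUP BY S.Product_Id, M.Province;
    """)
    # Join against distinct (Province, Region) pairs to avoid double counting.
    return con.execute("""
        SELECT T.Product_Id, PR.Region, SUM(T.Sales_Amt)
        FROM Province_Sales T,
             (SELECT DISTINCT Province, Region FROM Market) PR
        WHERE PR.Province = T.Province
        GROUP BY T.Product_Id, PR.Region
        ORDER BY T.Product_Id, PR.Region
    """).fetchall()


if __name__ == "__main__":
    con = build_demo_db()
    print(sales_by_region(con))
    print(sales_by_province(con))
    print(rollup_from_provinces(con))
```

On this data the roll-up from Province_Sales returns the same rows as aggregating the fact table directly by Region, which is the point of the note that coarser aggregations can be computed from finer ones.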