On-Line Analytical Processing
Week 8
MIE253-Consens
Schedule
Week  Date    Lecture Topic
1     Jan 9   Introduction to Data Management
2     Jan 16  The Relational Model
3     Jan 23  Constraints and SQL DDL
4     Jan 30  SQL DML, DB Applications, JDBC
5     Feb 6   JDBC, DDL (Views, Access Control)
6     Feb 13  Relational Algebra, Advanced SQL
-     Feb 20  [Reading Week]
7     Feb 27  Review and Midterm (Mar 1)
8     Mar 5   OLAP
9     Mar 12  ER Conceptual Modelling
10    Mar 19  Normalization
11    Mar 26  XML and Data Integration
12    Apr 2   Transactions and the Internet, Query Processing
13    Apr 9   Final Review
This week's reading: Chapter 5, Chapter 17
Aggregates
Functions that operate on sets:
  COUNT, SUM, AVG, MAX, MIN
  Produce numbers (not tables)
  Not part of relational algebra (but not hard to add)
  Note: COUNT(*) counts every row, NULLs included; COUNT(attr) and the other aggregates ignore NULLs

SELECT COUNT(*)
FROM Professor P

SELECT MAX(Salary)
FROM Employee E
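The NULL behaviour of the aggregates can be checked with a small experiment. The sketch below assumes SQLite through Python's sqlite3 module (the slides do not name a DBMS), and the Employee rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Salary INTEGER)")
# Hypothetical rows; employee 2 has an unknown (NULL) salary.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [(1, 50000), (2, None), (3, 70000)],
)

counts = conn.execute(
    "SELECT COUNT(*), COUNT(E.Salary), SUM(E.Salary), AVG(E.Salary), "
    "MAX(E.Salary), MIN(E.Salary) FROM Employee E"
).fetchone()
count_all, count_salary, total, average, highest, lowest = counts

# COUNT(*) sees all 3 rows; COUNT(Salary), SUM and AVG skip the NULL.
print(count_all, count_salary, total, average, highest, lowest)
```

Here COUNT(*) reports 3 rows while COUNT(E.Salary) reports 2, and AVG divides by the 2 non-NULL salaries rather than by 3.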
Duplicate elimination in Aggregates
Count the number of courses taught in S2010
SELECT COUNT(T.CrsCode)
FROM Teaching T
WHERE T.Semester = 'S2010'
But if multiple sections of same course are taught, use:
SELECT COUNT(DISTINCT T.CrsCode)
FROM Teaching T
WHERE T.Semester = 'S2010'
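To see the difference between the two counts, here is a minimal sketch using SQLite via Python's sqlite3 module (an assumption; any SQL DBMS should behave the same way). The Teaching rows and professor ids are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Teaching (ProfId INTEGER, CrsCode TEXT, Semester TEXT)")
# Hypothetical data: MIE253 has two sections in S2010.
conn.executemany(
    "INSERT INTO Teaching VALUES (?, ?, ?)",
    [
        (101, "MIE253", "S2010"),
        (102, "MIE253", "S2010"),
        (101, "MIE350", "S2010"),
        (103, "MIE253", "F2009"),
    ],
)

# Counts one per Teaching row, so each section is counted.
sections = conn.execute(
    "SELECT COUNT(T.CrsCode) FROM Teaching T WHERE T.Semester = 'S2010'"
).fetchone()[0]

# Counts each course code once, however many sections it has.
courses = conn.execute(
    "SELECT COUNT(DISTINCT T.CrsCode) FROM Teaching T WHERE T.Semester = 'S2010'"
).fetchone()[0]

print(sections, courses)
```

With this data the first query returns 3 (sections) and the second returns 2 (courses).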
Grouping
But how do we compute the number of courses taught in S2010 per professor?
A separate query for each professor:
SELECT COUNT(T.CrsCode)
FROM Teaching T
WHERE T.Semester = 'S2010' AND T.ProfId = 123456789
SQL defines a special grouping operator:
SELECT T.ProfId, COUNT(T.CrsCode)
FROM Teaching T
WHERE T.Semester = 'S2010'
GROUP BY T.ProfId
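The two approaches can be compared directly. This sketch assumes SQLite via Python's sqlite3 module; the Teaching rows are hypothetical, and the second professor id is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Teaching (ProfId INTEGER, CrsCode TEXT, Semester TEXT)")
conn.executemany(
    "INSERT INTO Teaching VALUES (?, ?, ?)",
    [
        (123456789, "MIE253", "S2010"),
        (123456789, "MIE350", "S2010"),
        (987654321, "MIE253", "S2010"),
        (987654321, "MIE360", "F2009"),
    ],
)

# One query per professor: has to be repeated for every ProfId.
single = conn.execute(
    "SELECT COUNT(T.CrsCode) FROM Teaching T "
    "WHERE T.Semester = 'S2010' AND T.ProfId = ?",
    (123456789,),
).fetchone()[0]

# One grouped query: a result row for every professor at once.
per_prof = dict(
    conn.execute(
        "SELECT T.ProfId, COUNT(T.CrsCode) FROM Teaching T "
        "WHERE T.Semester = 'S2010' GROUP BY T.ProfId"
    ).fetchall()
)

print(single, per_prof)
```

The grouped query returns the same count for professor 123456789 as the single-professor query, plus a row for every other professor who taught in S2010.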
GROUP BY - Example
For each student, compute the average grade and the number of courses:
SELECT T.StudId, AVG(T.Grade), COUNT(*)
FROM Transcript T
GROUP BY T.StudId
[Figure: the four Transcript rows (StudId, Grade) for student 1234 form one group, which yields the single result row (StudId, AVG(Grade), COUNT(*)) = (1234, 78, 4)]
GROUP BY Evaluation
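Conceptually, a GROUP BY query can be read in two steps: partition the rows by the grouping attributes, then produce one output row per group. The sketch below does this by hand in Python and compares it with what SQLite returns for the Transcript query. The individual grades are invented, chosen so that student 1234 averages 78 over 4 courses; a real DBMS is free to evaluate the query differently as long as the result is the same.

```python
import sqlite3
from collections import defaultdict

# Hypothetical (StudId, Grade) rows.
rows = [(1234, 70), (1234, 80), (1234, 75), (1234, 87), (5678, 90), (5678, 60)]

# Step 1: partition the rows into groups, one per StudId.
groups = defaultdict(list)
for stud_id, grade in rows:
    groups[stud_id].append(grade)

# Step 2: compute the aggregates once per group.
by_hand = {s: (sum(g) / len(g), len(g)) for s, g in groups.items()}

# The same computation expressed as the SQL query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Transcript (StudId INTEGER, Grade INTEGER)")
conn.executemany("INSERT INTO Transcript VALUES (?, ?)", rows)
by_sql = {
    s: (avg, n)
    for s, avg, n in conn.execute(
        "SELECT T.StudId, AVG(T.Grade), COUNT(*) FROM Transcript T GROUP BY T.StudId"
    )
}

print(by_hand)
print(by_sql)
```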
Aggregate View - Example
CREATE VIEW StAvg (StudId, AvgGrade) AS
SELECT T.StudId, AVG (T.Grade)
FROM Transcript T
GROUP BY T.StudId
SELECT S.Name, C.AvgGrade
FROM StAvg C, Student S
WHERE C.StudId = S.StudId
AND C.AvgGrade > 80
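The view and the query over it can be tried out as follows. This is a sketch assuming SQLite via Python's sqlite3 module; the student names and grades are hypothetical, and the CrsCode column is included only to make the Transcript rows look realistic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Student (StudId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Transcript (StudId INTEGER, CrsCode TEXT, Grade INTEGER);
    INSERT INTO Student VALUES (1234, 'Ann'), (5678, 'Bob');
    INSERT INTO Transcript VALUES
        (1234, 'MIE253', 85), (1234, 'MIE350', 91),
        (5678, 'MIE253', 70), (5678, 'MIE350', 80);

    -- The aggregate view from the slide.
    CREATE VIEW StAvg (StudId, AvgGrade) AS
        SELECT T.StudId, AVG(T.Grade)
        FROM Transcript T
        GROUP BY T.StudId;
    """
)

# The view is queried like an ordinary table, so the aggregate can
# appear in a WHERE clause and be joined with Student.
honours = conn.execute(
    "SELECT S.Name, C.AvgGrade FROM StAvg C, Student S "
    "WHERE C.StudId = S.StudId AND C.AvgGrade > 80"
).fetchall()

print(honours)
```

Only Ann (average 88) passes the filter; Bob's average of 75 does not.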
HAVING Clause
Eliminates unwanted groups (analogous to the WHERE clause, but works on groups instead of individual tuples)
The HAVING condition is constructed from attributes in the GROUP BY list and aggregates on attributes not in that list
SELECT T.StudId,
       AVG(T.Grade) AS AvgGrade,
       COUNT(*) AS NumCrs
FROM Transcript T
WHERE T.CrsCode LIKE 'MIE%'
GROUP BY T.StudId
HAVING AVG(T.Grade) > 80
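The division of labour between WHERE (filters tuples) and HAVING (filters groups) shows up when the same HAVING condition is run with and without the WHERE clause. A sketch assuming SQLite via Python's sqlite3 module, with hypothetical Transcript rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Transcript (StudId INTEGER, CrsCode TEXT, Grade INTEGER)")
conn.executemany(
    "INSERT INTO Transcript VALUES (?, ?, ?)",
    [
        (1234, "MIE253", 85), (1234, "MIE350", 91), (1234, "CSC108", 40),
        (5678, "MIE253", 70), (5678, "MIE350", 80), (5678, "CSC108", 100),
    ],
)

# WHERE drops non-MIE tuples first; HAVING then drops low-average groups.
mie_only = conn.execute(
    "SELECT T.StudId, AVG(T.Grade) AS AvgGrade, COUNT(*) AS NumCrs "
    "FROM Transcript T WHERE T.CrsCode LIKE 'MIE%' "
    "GROUP BY T.StudId HAVING AVG(T.Grade) > 80"
).fetchall()

# Without WHERE, the averages are taken over all courses.
all_courses = conn.execute(
    "SELECT T.StudId, AVG(T.Grade) AS AvgGrade, COUNT(*) AS NumCrs "
    "FROM Transcript T GROUP BY T.StudId HAVING AVG(T.Grade) > 80"
).fetchall()

print(mie_only)
print(all_courses)
```

With these rows, student 1234 qualifies on MIE courses only (average 88 over 2 courses), while student 5678 qualifies only when all three courses are averaged.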
Evaluation of GROUP BY HAVING
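A query with WHERE, GROUP BY and HAVING can be read as a pipeline: filter tuples, form groups, filter groups, then compute the output row for each surviving group. The plain-Python sketch below walks through those steps for the MIE query. The data is hypothetical, and this is the conceptual order only, not a claim about how any particular DBMS executes the query.

```python
from collections import defaultdict

# Hypothetical (StudId, CrsCode, Grade) tuples.
transcript = [
    (1234, "MIE253", 85), (1234, "MIE350", 91), (1234, "CSC108", 40),
    (5678, "MIE253", 70), (5678, "MIE350", 80), (5678, "CSC108", 100),
]

# 1. WHERE T.CrsCode LIKE 'MIE%': keep only qualifying tuples.
kept = [(s, c, g) for s, c, g in transcript if c.startswith("MIE")]

# 2. GROUP BY T.StudId: partition the remaining tuples.
groups = defaultdict(list)
for stud_id, _crs_code, grade in kept:
    groups[stud_id].append(grade)

# 3. HAVING AVG(T.Grade) > 80: keep only qualifying groups.
surviving = {s: gs for s, gs in groups.items() if sum(gs) / len(gs) > 80}

# 4. SELECT T.StudId, AVG(T.Grade), COUNT(*): one row per group.
result = [(s, sum(gs) / len(gs), len(gs)) for s, gs in sorted(surviving.items())]

print(result)
```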
ORDER BY Clause
Causes rows to be output in a specified order
SELECT T.StudId, COUNT(*) AS NumCrs,
       AVG(T.Grade) AS AvgGrade
FROM Transcript T
WHERE T.CrsCode LIKE 'MIE%'
GROUP BY T.StudId
HAVING AVG(T.Grade) > 3.5
ORDER BY AvgGrade DESC, StudId ASC
DESC sorts in descending order; ASC sorts in ascending order
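The effect of the two sort keys can be seen on a small example. The sketch assumes SQLite via Python's sqlite3 module and a hypothetical Transcript with grades on a 4.0 scale; students 1 and 3 tie on average grade, so the second key (StudId, ascending) decides their order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Transcript (StudId INTEGER, CrsCode TEXT, Grade REAL)")
conn.executemany(
    "INSERT INTO Transcript VALUES (?, ?, ?)",
    [
        (3, "MIE253", 4.0), (3, "MIE350", 3.5),  # average 3.75
        (1, "MIE253", 4.0), (1, "MIE350", 3.5),  # average 3.75
        (2, "MIE253", 4.0), (2, "MIE350", 4.0),  # average 4.0
        (4, "MIE253", 3.0),                      # average 3.0, removed by HAVING
    ],
)

ranked = conn.execute(
    "SELECT T.StudId, COUNT(*) AS NumCrs, AVG(T.Grade) AS AvgGrade "
    "FROM Transcript T WHERE T.CrsCode LIKE 'MIE%' "
    "GROUP BY T.StudId HAVING AVG(T.Grade) > 3.5 "
    "ORDER BY AvgGrade DESC, StudId ASC"
).fetchall()

print(ranked)
```

Student 2 comes first with the highest average, followed by students 1 and 3 in ascending StudId order.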
Summary: Aggregation in SQL
Aggregate functions: COUNT, SUM, AVG, MAX, MIN
Grouping: GROUP BY
Filtering groups: HAVING
Displaying ordered results: ORDER BY
OLTP Compared With OLAP
On Line Transaction Processing – OLTP
  Maintains a database that is an accurate model of some real-world enterprise. Supports day-to-day operations. Characteristics:
    Short simple transactions
    Relatively frequent updates
    Transactions access only a small fraction of the database
On Line Analytic Processing – OLAP
  Uses information in the database to guide strategic decisions. Characteristics:
    Complex queries
    Infrequent updates
    Transactions access a large fraction of the database
    Data need not be up-to-date
The Internet Grocer
OLTP-style transaction:
  Mary Smith, from Etobicoke, just bought a box of tomatoes; charge her account; deliver the tomatoes from our Mississauga warehouse; decrease our inventory of tomatoes from that warehouse
OLAP-style report:
  What are the average monthly sales of tomatoes in the GTA for the years 2007 and 2008?
Evolution of OLAP Applications
Traditional OLAP (Decision Support, MIS)
  Uses data the enterprise gathers in its usual activities, perhaps in its OLTP system
  Queries are ad hoc, designed and carried out by a variety of users (e.g., unit managers)
    Write a query to report the average monthly sales of tomatoes in the GTA for the years 2002 and 2003.
Newer Applications (e.g., e-commerce)
  Enterprise actively gathers data it wants, perhaps purchasing it
  Queries are sophisticated, designed by professionals, and used in more sophisticated ways
    Prepare a profile of the grocery purchases of Mary Smith for the years 2007 and 2008 (personalized marketing, recommend a specific cart to Mary)
Data Warehouses
OLAP and data mining databases are frequently stored on special servers called data warehouses (and data marts):
  Can accommodate the huge amount of data generated by OLTP systems
  Allow OLAP queries and data mining to be run off-line so as not to impact the performance of OLTP
Fact Tables
Many OLAP applications are based on a fact table
For example, a supermarket application might be based on a table
Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)
The table can be viewed as multidimensional
  Market_Id, Product_Id, Time_Id are the dimensions that represent specific supermarkets, products, and time intervals
  Sales_Amt is a function of the other three
A Data Cube
Fact tables can be viewed as an N-dimensional data cube (3-dimensional in our example)
The entries in the cube are the values of Sales_Amt
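One way to picture the cube is as a mapping from a (Market_Id, Product_Id, Time_Id) coordinate to a Sales_Amt value. The sketch below builds such a mapping in plain Python from a handful of hypothetical fact rows and then takes a 2-D slice by fixing the time coordinate. The dictionary representation is chosen for illustration only and says nothing about how a warehouse stores its data.

```python
# Hypothetical fact rows: (Market_Id, Product_Id, Time_Id, Sales_Amt).
sales = [
    ("M1", "P1", "T1", 1000),
    ("M1", "P1", "T2", 2003),
    ("M2", "P1", "T1", 1503),
    ("M1", "P2", "T1", 6003),
]

# The cube: each cell is addressed by its three dimension values.
cube = {(m, p, t): amt for m, p, t, amt in sales}

# Looking up one cell of the cube.
cell = cube[("M1", "P1", "T2")]

# Fixing Time_Id = 'T1' leaves a 2-D slice over markets and products.
slice_t1 = {(m, p): amt for (m, p, t), amt in cube.items() if t == "T1"}

print(cell)
print(slice_t1)
```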
Dimension Tables
The dimensions of the fact table are further described with dimension tables
Fact table:
Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)
Dimension Tables:
Market (Market_Id, City, Province, Region)
Product (Product_Id, Name, Category, Price)
Time (Time_Id, Week, Month, Quarter)
Aggregation
Many OLAP queries involve aggregation of the data in the fact table
For example, to find the total sales (over time) of each product in each market, we might use
SELECT S.Market_Id, S.Product_Id, SUM(S.Sales_Amt)
FROM Sales S
GROUP BY S.Market_Id, S.Product_Id
The aggregation is over the entire time dimension and thus produces a two-dimensional view of the data
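The collapse of the time dimension can be tried on a tiny fact table. This is a sketch assuming SQLite via Python's sqlite3 module; the individual Sales rows are hypothetical and use integer Time_Id values for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (Market_Id TEXT, Product_Id TEXT, "
    "Time_Id INTEGER, Sales_Amt INTEGER)"
)
# Hypothetical fact rows; (M1, P1) has sales in two time intervals.
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [
        ("M1", "P1", 1, 1000),
        ("M1", "P1", 2, 2003),
        ("M2", "P1", 1, 1503),
        ("M1", "P2", 1, 6003),
    ],
)

# Time_Id is absent from GROUP BY, so each (market, product) cell
# sums over every time interval.
totals = {
    (m, p): amt
    for m, p, amt in conn.execute(
        "SELECT S.Market_Id, S.Product_Id, SUM(S.Sales_Amt) "
        "FROM Sales S GROUP BY S.Market_Id, S.Product_Id"
    )
}

print(totals)
```

Each key of the result is a (Market_Id, Product_Id) pair, which is the two-dimensional view: the two rows for (M1, P1) have been summed into a single total of 3003.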
Aggregation over Time
The 2-D output of the previous query:
SUM(Sales_Amt)   Market_Id
Product_Id       M1      M2      M3
P1               3003    1503    …
P2               6003    2402    …
P3               4503    3       …
P4               7503    7000    …
P5               …       …       …
Drilling Down and Rolling Up
Some dimension tables form an aggregation hierarchy
Market_Id → City → Province → Region → ALL
Executing a series of queries that moves down a hierarchy (e.g., from aggregation over regions to that over provinces) is called drilling down
  Requires the use of the fact table or information more specific than the requested aggregation (e.g., cities)
Executing a series of queries that moves up the hierarchy (e.g., from provinces to regions) is called rolling up
  Note: In a rollup, coarser aggregations can be computed using prior queries for finer aggregations
Drilling Down
Drilling down on market: from Region to Province
Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)
Market (Market_Id, City, Province, Region)
1.
SELECT S.Product_Id, M.Region, SUM(S.Sales_Amt)
FROM Sales S, Market M
WHERE M.Market_Id = S.Market_Id
GROUP BY S.Product_Id, M.Region
2.
SELECT S.Product_Id, M.Province, SUM(S.Sales_Amt)
FROM Sales S, Market M
WHERE M.Market_Id = S.Market_Id
GROUP BY S.Product_Id, M.Province
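The two drill-down queries differ only in which Market attribute they group on, which the sketch below makes explicit. It assumes SQLite via Python's sqlite3 module; the markets, region names, and sales amounts are hypothetical, and the helper totals() is introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Market (Market_Id TEXT PRIMARY KEY, City TEXT,
                         Province TEXT, Region TEXT);
    CREATE TABLE Sales (Market_Id TEXT, Product_Id TEXT,
                        Time_Id INTEGER, Sales_Amt INTEGER);
    INSERT INTO Market VALUES
        ('M1', 'Toronto', 'Ontario', 'East'),
        ('M2', 'Montreal', 'Quebec', 'East'),
        ('M3', 'Vancouver', 'British Columbia', 'West');
    INSERT INTO Sales VALUES
        ('M1', 'P1', 1, 100), ('M2', 'P1', 1, 50),
        ('M3', 'P1', 1, 30), ('M1', 'P2', 1, 20);
    """
)


def totals(level):
    """Total sales per product at one level of the market hierarchy.

    `level` is a Market column name fixed by this script, not user input.
    """
    query = (
        f"SELECT S.Product_Id, M.{level}, SUM(S.Sales_Amt) "
        "FROM Sales S, Market M WHERE M.Market_Id = S.Market_Id "
        f"GROUP BY S.Product_Id, M.{level}"
    )
    return {(prod, place): amt for prod, place, amt in conn.execute(query)}


by_region = totals("Region")      # query 1: coarser level
by_province = totals("Province")  # query 2: drill down one level

print(by_region)
print(by_province)
```

Drilling down splits the East total for P1 (150) into its Ontario (100) and Quebec (50) parts; both queries go back to the fact table.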
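Rolling Up
Rolling up on market, from Province to Region
If we have already created a table Province_Sales (Product_Id, Province, Sales_Amt) from
SELECT S.Product_Id, M.Province, SUM(S.Sales_Amt)
FROM Sales S, Market M
WHERE M.Market_Id = S.Market_Id
GROUP BY S.Product_Id, M.Province
then we can roll up from there to
SELECT T.Product_Id, M.Region, SUM(T.Sales_Amt)
FROM Province_Sales T, (SELECT DISTINCT Province, Region FROM Market) M
WHERE M.Province = T.Province
GROUP BY T.Product_Id, M.Region
Note: Market has one row per market, so the roll-up joins against the distinct (Province, Region) pairs; joining Market directly would count a province's sales once for each market in that province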
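One thing to watch when rolling up through the Market table: it holds one row per market, so a province with several markets matches several Market rows in a join on Province. The sketch below (SQLite via Python's sqlite3 module, hypothetical data, helper as_dict() introduced for the example) compares three ways of computing regional totals: directly from the fact table, rolled up from Province_Sales through the distinct (Province, Region) pairs, and rolled up through a plain join with Market.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Market (Market_Id TEXT PRIMARY KEY, City TEXT,
                         Province TEXT, Region TEXT);
    CREATE TABLE Sales (Market_Id TEXT, Product_Id TEXT,
                        Time_Id INTEGER, Sales_Amt INTEGER);
    INSERT INTO Market VALUES
        ('M1', 'Toronto', 'Ontario', 'East'),
        ('M2', 'Ottawa', 'Ontario', 'East'),
        ('M3', 'Montreal', 'Quebec', 'East'),
        ('M4', 'Vancouver', 'British Columbia', 'West');
    INSERT INTO Sales VALUES
        ('M1', 'P1', 1, 100), ('M2', 'P1', 1, 50),
        ('M3', 'P1', 1, 25), ('M4', 'P1', 1, 30);

    -- The finer aggregation, materialized once.
    CREATE TABLE Province_Sales AS
        SELECT S.Product_Id AS Product_Id, M.Province AS Province,
               SUM(S.Sales_Amt) AS Sales_Amt
        FROM Sales S, Market M
        WHERE M.Market_Id = S.Market_Id
        GROUP BY S.Product_Id, M.Province;
    """
)


def as_dict(query):
    return {(prod, region): amt for prod, region, amt in conn.execute(query)}


from_fact = as_dict(
    "SELECT S.Product_Id, M.Region, SUM(S.Sales_Amt) "
    "FROM Sales S, Market M WHERE M.Market_Id = S.Market_Id "
    "GROUP BY S.Product_Id, M.Region"
)
rolled_up = as_dict(
    "SELECT T.Product_Id, M.Region, SUM(T.Sales_Amt) "
    "FROM Province_Sales T, (SELECT DISTINCT Province, Region FROM Market) M "
    "WHERE M.Province = T.Province GROUP BY T.Product_Id, M.Region"
)
# Plain join: Ontario's row in Province_Sales matches both M1 and M2.
plain_join = as_dict(
    "SELECT T.Product_Id, M.Region, SUM(T.Sales_Amt) "
    "FROM Province_Sales T, Market M "
    "WHERE M.Province = T.Province GROUP BY T.Product_Id, M.Region"
)

print(from_fact, rolled_up, plain_join)
```

With this data the roll-up through distinct pairs agrees with the fact-table result (East 175, West 30), while the plain join reports 325 for East because Ontario's 150 is added twice.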