MIT Database Group

6.830/6.814 Lecture 5
Database Internals Continued
September 21, 2016
Database Internals Outline
Front End
  Admission Control
  Connection Management
Query System
  (sql)
  [This time]
  Parser
  (parse tree)
  Rewriter
  [Last time]
  (parse tree)
  Planner & Optimizer
  (query plan)
  Executor
Storage System
  Access Methods
  Lock Manager
  Buffer Manager
  Log Manager
Flattening Example
Flatten this query (departments with at least as many machines as employees):
SELECT dept.name
FROM dept
WHERE dept.num-of-machines ≥
    (SELECT COUNT(emp.*) FROM emp
     WHERE dept.name = emp.dept_name)
What happens if there is a department with no employees?
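Answer
Flattening with a plain (inner) join drops every department that has no employees, because such a department has no matching emp row:
SELECT dept.name
FROM dept, emp
WHERE dept.name = emp.dept_name
GROUP BY dept.name
HAVING dept.num-of-machines ≥ COUNT(emp.*)
A left outer join keeps those departments, with an employee count of 0:
SELECT dept.name
FROM dept LEFT OUTER JOIN emp ON (dept.name = emp.dept_name)
GROUP BY dept.name
HAVING dept.num-of-machines ≥ COUNT(emp.*)
References:
“Query rewrite optimization rules in IBM DB2 universal database”
“Rule Engine for Query Transformation in Starburst and IBM DB2 C/S DBMS”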
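The difference between the two flattenings can be checked on a tiny dataset. The following is a minimal sketch using Python's built-in sqlite3 module rather than the systems named on the slide. The table contents are made up for illustration, the column is renamed to num_machines so that it is a valid identifier, and COUNT(emp.dept_name) stands in for COUNT(emp.*), which SQLite does not accept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: dept 'b' has no employees at all.
conn.executescript("""
    CREATE TABLE dept (name TEXT PRIMARY KEY, num_machines INTEGER);
    CREATE TABLE emp  (ename TEXT, dept_name TEXT);
    INSERT INTO dept VALUES ('a', 2), ('b', 1), ('c', 1);
    INSERT INTO emp  VALUES ('e1', 'a'), ('e2', 'c'), ('e3', 'c'), ('e4', 'c');
""")

def names(sql):
    """Run a query and return the first column of each row as a list."""
    return [row[0] for row in conn.execute(sql)]

# Original nested query.
nested = names("""
    SELECT dept.name FROM dept
    WHERE dept.num_machines >=
        (SELECT COUNT(*) FROM emp WHERE dept.name = emp.dept_name)
    ORDER BY dept.name""")

# Flattened with an inner join: dept 'b' never reaches GROUP BY.
inner_join = names("""
    SELECT dept.name FROM dept, emp
    WHERE dept.name = emp.dept_name
    GROUP BY dept.name
    HAVING dept.num_machines >= COUNT(emp.dept_name)
    ORDER BY dept.name""")

# Flattened with a left outer join: dept 'b' survives with a count of 0.
left_join = names("""
    SELECT dept.name FROM dept
    LEFT OUTER JOIN emp ON dept.name = emp.dept_name
    GROUP BY dept.name
    HAVING dept.num_machines >= COUNT(emp.dept_name)
    ORDER BY dept.name""")

print("nested:    ", nested)
print("inner join:", inner_join)
print("left join: ", left_join)
```

On this data the inner-join version loses department 'b', while the left-outer-join version returns the same departments as the nested query.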
Plan Formulation
emp (eno, ename, sal, dno)
dept (dno, dname, bldg)
kids (kno, eno, kname, bday)

SELECT ename, count(*)
FROM emp, dept, kids
WHERE emp.dno = dept.dno
  AND kids.eno = emp.eno
  AND emp.sal > 50000
  AND dept.dname = 'eecs'
GROUP BY ename
HAVING count(*) > 7

Π ename,count
  𝛔 count > 7
    𝛂 agg:count(*), group by ename
      ⨝ eno=eno
        kids
        ⨝ dno=dno
          𝛔 dname='eecs'
            dept
          𝛔 sal>50k
            emp
Plan questions
Logical planning: operator ordering (exponential search space)
Physical planning: operator implementation & access methods (indexes vs heap files)

Π ename,count
  𝛔 count > 7
    𝛂 agg:count(*), group by ename
      ⨝ eno=eno          (Order?)
        kids
        ⨝ dno=dno
          𝛔 dname='eecs'
            dept
          𝛔 sal>50k      (Implementation?)
            emp          (Storage model & access methods?)
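To give a feel for how quickly the space of join orderings grows, here is a small sketch that counts candidate join trees. It assumes every pair of relations may be joined (cross products allowed) and counts only shapes and orderings, not operator implementations. The function names are illustrative and do not come from the lecture.

```python
from math import factorial

def left_deep_orders(n):
    """Left-deep join orders over n relations: one per permutation, n!."""
    return factorial(n)

def bushy_orders(n):
    """All binary join trees over n relations, where swapping the left and
    right input counts as a different tree: (2n-2)! / (n-1)!."""
    return factorial(2 * n - 2) // factorial(n - 1)

for n in (3, 5, 10):
    print(n, left_deep_orders(n), bushy_orders(n))
```

For the three-relation query above there are already 6 left-deep orders and 12 join trees overall, before any choice of join algorithm or access method is made.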
Query Plans Example
create table dept (dno int primary key, bldg int);
insert into dept (dno, bldg)
  select x.id, (random() * 10)::int
  from generate_series(0,100000) as x(id);
create table emp (eno int primary key, dno int references dept(dno), sal int, ename varchar);
insert into emp (eno, dno, sal, ename)
  select x.id, (random() * 100000)::int, (random() * 55000)::int, 'emp' || x.id
  from generate_series(0,10000000) as x(id);
create table kids (kno int primary key, eno int references emp(eno), kname varchar);
insert into kids (kno, eno, kname)
  select x.id, (random() * 1000000)::int, 'kid' || x.id
  from generate_series(0,3000000) as x(id);
Iterator Interface
void open ();
Tuple next ();
void close ();
Scan
Scan(tableName):
  this.tableName = tableName
open():
  this.f = fopen(this.tableName)
next():
  tuple = readTuple(this.f)
  return tuple    // null at end of file
close():
  fclose(this.f)
Filter
Filter(pred, child):
  this.pred = pred
  this.child = child
open():
  this.child.open()
next():
  while true:
    tuple = this.child.next()
    if (tuple == null)
      return null
    if (this.pred(tuple))
      return tuple
close():
  this.child.close()
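The Scan and Filter operators above can be sketched as runnable Python. This is a minimal illustration, not the course's implementation: Scan reads from an in-memory list instead of a file, end of input is signalled by returning None, and the run helper and sample data are hypothetical.

```python
class Scan:
    """Leaf operator. Assumption: rows live in a Python list, not a heap file."""
    def __init__(self, rows):
        self.rows = rows
        self.pos = None

    def open(self):
        self.pos = 0

    def next(self):
        if self.pos >= len(self.rows):
            return None  # end of input
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self):
        self.pos = None


class Filter:
    """Passes through only the child tuples that satisfy pred."""
    def __init__(self, pred, child):
        self.pred = pred
        self.child = child

    def open(self):
        self.child.open()

    def next(self):
        while True:
            row = self.child.next()
            if row is None:
                return None
            if self.pred(row):
                return row

    def close(self):
        self.child.close()


def run(plan):
    """Hypothetical driver: pull tuples from the root until it is exhausted."""
    plan.open()
    out = []
    while (row := plan.next()) is not None:
        out.append(row)
    plan.close()
    return out


emp = [("alice", 60000), ("bob", 40000), ("carol", 75000)]
high_paid = run(Filter(lambda t: t[1] > 50000, Scan(emp)))
print(high_paid)
```

Because every operator exposes the same three methods, a Filter can sit on top of a Scan, another Filter, or a join without knowing which one it is.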
Nested Loops Join
Join(outer, inner, pred):
  for t1 in outer:
    for t2 in inner:
      if pred(t1, t2)
        emit join(t1, t2)
Problem:
If inner is a sub-query (e.g., C ⨝ D), we have to continually recompute it, or store it to disk (materialize it).
If inner is just a base relation (e.g., C or D), there is no need for additional materialization.
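The recomputation problem can be made concrete with a short sketch. It assumes relations are Python lists of tuples, models the inner input as a callable that produces a fresh iterator each time it is needed, and materializes into an in-memory list rather than to disk. The relation B and the counter are hypothetical additions for the demonstration.

```python
def nested_loops_join(outer, make_inner, pred):
    """outer: iterable of tuples.
    make_inner: zero-argument callable returning a fresh iterator over the
    inner input; it is called once per outer tuple."""
    for t1 in outer:
        for t2 in make_inner():
            if pred(t1, t2):
                yield t1 + t2

# Made-up relations.
B = [(1,), (2,), (3,)]
C = [(1, "c1"), (2, "c2")]
D = [(1, "d1"), (2, "d2")]

counter = {"evals": 0}  # how many times the sub-query C join D is started

def c_join_d():
    counter["evals"] += 1
    return nested_loops_join(C, lambda: iter(D), lambda c, d: c[0] == d[0])

def match(b, cd):
    return b[0] == cd[0]

# Inner is a sub-query: it is recomputed for every tuple of B.
recomputed = list(nested_loops_join(B, c_join_d, match))
recompute_count = counter["evals"]

# Materialize the sub-query once, then rescan the stored result.
counter["evals"] = 0
stored = list(c_join_d())
joined = list(nested_loops_join(B, lambda: iter(stored), match))
materialize_count = counter["evals"]

print(recompute_count, materialize_count)
```

Both runs produce the same join result. The first starts the sub-query once per outer tuple, while the second starts it once and rescans the stored copy, which is the trade-off the slide describes.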