Chapter 5 Relational Algebra
Contents
Terminology
Operators - Write
Operators - Retrieval
Relational SELECT
Relational PROJECT
SELECT and PROJECT
Set Operations - semantics
SET Operations - requirements
UNION Example
INTERSECTION Example
DIFFERENCE Example
CARTESIAN PRODUCT
CARTESIAN PRODUCT example
JOIN Operator
JOIN Example
Natural Join
OUTER JOINs
OUTER JOIN example 1
OUTER JOIN example 2
In order to implement a DBMS, there must exist a set of rules which state how the
database system will behave. For instance, somewhere in the DBMS must be a set of
statements which indicate that when someone inserts data into a row of a relation, it has
the effect which the user expects. One way to specify this is to use words to write an
'essay' describing how the DBMS will operate, but words tend to be imprecise and open to
interpretation. Instead, relational databases are more usually defined using Relational
Algebra.
Relational Algebra is:
- the formal description of how a relational database operates
- an interface to the data stored in the database itself
- the mathematics which underpins SQL operations
Operators in relational algebra are not necessarily the same as SQL operators, even if
they have the same name. For example, the SELECT statement exists in SQL, and also
exists in relational algebra. These two uses of SELECT are not the same. The DBMS
must take whatever SQL statements the user types in and translate them into relational
algebra operations before applying them to the database.
Terminology
Relation - a set of tuples.
Tuple - a collection of attributes which describe some real-world entity.
Attribute - a real-world role played by a named domain.
Domain - a set of atomic values.
Set - a mathematical definition for a collection of objects which contains no duplicates.
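These definitions can be made concrete with a minimal Python sketch. The `row` helper, the `employee` relation and its sample values are illustrative assumptions, not part of the original notes; a tuple is modelled as a frozenset of (attribute, value) pairs and a relation as a Python set of such tuples.

```python
def row(**attrs):
    """Build one tuple as a hashable collection of (attribute, value) pairs."""
    return frozenset(attrs.items())

# A relation is a set of tuples, so a repeated tuple is stored only once.
employee = {
    row(empno=1, surname="Smith", depno=1),
    row(empno=2, surname="Jones", depno=2),
    row(empno=1, surname="Smith", depno=1),  # duplicate of the first tuple
}

print(len(employee))  # 2
```

Because the relation is a set, the duplicate tuple disappears without any extra work, which matches the definition of Set above.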
Operators - Write
INSERT - provides a list of attribute values for a new tuple in a relation. This
operator is the same as SQL.
DELETE - provides a condition on the attributes of a relation to determine which
tuple(s) to remove from the relation. This operator is the same as SQL.
MODIFY - changes the values of one or more attributes in one or more tuples of a
relation, as identified by a condition operating on the attributes of the relation.
This is equivalent to SQL UPDATE.
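As a rough illustration, the three write operators can be sketched over a relation held as a Python set. The function names `insert`, `delete` and `modify`, the `row` helper and the sample data are assumptions made for this sketch; a real DBMS implements these operators very differently.

```python
def row(**attrs):
    """Build one tuple as a hashable collection of (attribute, value) pairs."""
    return frozenset(attrs.items())

def insert(relation, **attrs):
    """INSERT: add one new tuple built from the given attribute values."""
    return relation | {row(**attrs)}

def delete(relation, condition):
    """DELETE: remove every tuple for which the condition holds."""
    return {r for r in relation if not condition(dict(r))}

def modify(relation, condition, **changes):
    """MODIFY: overwrite the given attributes in every matching tuple."""
    return {
        frozenset({**dict(r), **changes}.items()) if condition(dict(r)) else r
        for r in relation
    }

employee = {row(empno=1, surname="Smith", depno=1)}
employee = insert(employee, empno=2, surname="Jones", depno=1)
employee = modify(employee, lambda t: t["empno"] == 2, depno=3)
employee = delete(employee, lambda t: t["depno"] == 1)
```

After these three steps only employee 2 remains, now in department 3.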
Operators - Retrieval
There are two groups of operations:
- Operations based on mathematical set theory:
UNION, INTERSECTION, DIFFERENCE, and CARTESIAN PRODUCT.
- Special database operations:
SELECT (not the same as SQL SELECT), PROJECT, and JOIN.
Relational SELECT
SELECT is used to obtain a subset of the tuples of a relation that satisfy a select
condition.
For example, find all employees born after 1st Jan 1950:
SELECT_{dob > '01/JAN/1950'}(employee)
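A minimal Python sketch of relational SELECT follows. The `select` function, the `row` helper and the sample employees are hypothetical; the select condition is passed in as an ordinary Python function over one tuple.

```python
from datetime import date

def row(**attrs):
    """Build one tuple as a hashable collection of (attribute, value) pairs."""
    return frozenset(attrs.items())

def select(condition, relation):
    """Relational SELECT: keep only the tuples that satisfy the condition."""
    return {r for r in relation if condition(dict(r))}

employee = {
    row(empno=1, surname="Smith", dob=date(1948, 3, 12)),
    row(empno=2, surname="Jones", dob=date(1961, 7, 30)),
}

# Employees born after 1st Jan 1950
born_after_1950 = select(lambda t: t["dob"] > date(1950, 1, 1), employee)
```

The result has the same attributes as `employee` but fewer tuples: SELECT filters rows and never removes columns.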
Relational PROJECT
The PROJECT operation is used to select a subset of the attributes of a relation by
specifying the names of the required attributes.
For example, to get a list of all employees' surnames and employee numbers:
PROJECT_{surname, empno}(employee)
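PROJECT can be sketched in the same style. The `project` function, the `row` helper and the data below are illustrative assumptions; note that because the result is again a set, tuples that become identical after projection collapse into one.

```python
def row(**attrs):
    """Build one tuple as a hashable collection of (attribute, value) pairs."""
    return frozenset(attrs.items())

def project(attrs, relation):
    """Relational PROJECT: keep only the named attributes of every tuple."""
    return {frozenset((k, v) for k, v in r if k in attrs) for r in relation}

employee = {
    row(empno=1, surname="Smith", depno=1),
    row(empno=2, surname="Jones", depno=1),
}

names = project({"surname", "empno"}, employee)
depts = project({"depno"}, employee)  # both tuples project to depno=1
```

Here `names` still has two tuples, while `depts` has only one because duplicates are eliminated.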
SELECT and PROJECT
SELECT and PROJECT can be combined together. For example, to get a list of
employee numbers for employees in department number 1:
PROJECT_{empno}(SELECT_{depno = 1}(employee))
Figure: Mapping select and project
Set Operations - semantics
Consider two relations R and S.
UNION of R and S - a relation that includes all the tuples that are either in R or in S
or in both R and S. Duplicate tuples are eliminated.
INTERSECTION of R and S - a relation that includes all tuples that are in both R and S.
DIFFERENCE of R and S - the relation that contains all the tuples that are in R but that
are not in S.
SET Operations - requirements
For set operations to function correctly the relations R and S must be union compatible.
Two relations are union compatible if
- they have the same number of attributes
- the domain of each attribute in column order is the same in both R and S.
UNION Example
Figure: UNION
INTERSECTION Example
Figure: INTERSECTION
DIFFERENCE Example
Figure: DIFFERENCE
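The three set operations and the union-compatibility requirement can be sketched in Python as follows. The relations R and S, their headings and the `union_compatible` helper are assumptions invented for the sketch; tuples are positional here so that "column order" is meaningful.

```python
def union_compatible(r_heading, s_heading):
    """Same number of attributes, with matching domains in column order."""
    return len(r_heading) == len(s_heading) and all(
        r_dom == s_dom for (_, r_dom), (_, s_dom) in zip(r_heading, s_heading)
    )

# Headings are lists of (attribute name, domain); bodies are sets of tuples.
r_heading = [("empno", int), ("surname", str)]
s_heading = [("staffno", int), ("name", str)]
R = {(1, "Smith"), (2, "Jones")}
S = {(2, "Jones"), (3, "Brown")}

compatible = union_compatible(r_heading, s_heading)

union = R | S          # tuples in R, in S, or in both; duplicates eliminated
intersection = R & S   # tuples in both R and S
difference = R - S     # tuples in R but not in S
```

Attribute names may differ between R and S; only the number of attributes and their domains have to agree.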
CARTESIAN PRODUCT
The Cartesian Product is also an operator which works on two sets. It is sometimes called
the CROSS PRODUCT or CROSS JOIN.
It combines the tuples of one relation with all the tuples of the other relation.
CARTESIAN PRODUCT example
Figure: CARTESIAN PRODUCT
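A small Python sketch of the Cartesian product, assuming the two relations share no attribute names. The `cartesian_product` function, the `row` helper and the sample data are hypothetical.

```python
from itertools import product

def row(**attrs):
    """Build one tuple as a hashable collection of (attribute, value) pairs."""
    return frozenset(attrs.items())

def cartesian_product(r, s):
    """Pair every tuple of r with every tuple of s (no shared attribute names)."""
    return {a | b for a, b in product(r, s)}

employee = {row(empno=1), row(empno=2)}
department = {row(depno=10), row(depno=20), row(depno=30)}

pairs = cartesian_product(employee, department)
```

With 2 employee tuples and 3 department tuples the result holds 2 × 3 = 6 tuples, whether or not the pairings are meaningful.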
JOIN Operator
JOIN is used to combine related tuples from two relations:
- In its simplest form the JOIN operator is just the cross product of the two
relations.
- As the join becomes more complex, tuples are removed from the cross product
to make the result of the join more meaningful.
- JOIN allows you to evaluate a join condition between the attributes of the
relations on which the join is undertaken.
The notation used is
R JOIN_{join condition} S
JOIN Example
Figure: JOIN
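The description above, a cross product from which non-matching tuples are removed, can be sketched directly in Python. The `join` function, the `row` helper, the attribute name `emp_depno` and the data are assumptions for illustration; the two relations are given distinct attribute names so that the combined tuple is unambiguous.

```python
from itertools import product

def row(**attrs):
    """Build one tuple as a hashable collection of (attribute, value) pairs."""
    return frozenset(attrs.items())

def join(r, s, condition):
    """JOIN: the cross product of r and s, keeping tuples that satisfy the condition."""
    return {a | b for a, b in product(r, s) if condition(dict(a | b))}

employee = {
    row(empno=1, emp_depno=10),
    row(empno=2, emp_depno=20),
}
department = {
    row(depno=10, dname="Sales"),
    row(depno=30, dname="Research"),
}

joined = join(employee, department, lambda t: t["emp_depno"] == t["depno"])
```

Only employee 1 and the Sales department survive; employee 2 and the Research department have no partner and are dropped.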
Natural Join
Most often the JOIN condition is an equality test, and such a join is described as an equi-join.
An equi-join results in two attributes in the resulting relation having exactly the same value.
A 'natural join' removes the duplicate attribute(s).
- In most systems a natural join will require that the attributes have the same name
to identify the attribute(s) to be used in the join. This may require a renaming
mechanism.
- If you do use natural joins, make sure that the relations do not have two attributes
with the same name by accident.
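A natural join can be sketched by matching on whatever attribute names the two relations have in common. The `natural_join` function, the `row` helper and the data are illustrative assumptions.

```python
def row(**attrs):
    """Build one tuple as a hashable collection of (attribute, value) pairs."""
    return frozenset(attrs.items())

def natural_join(r, s):
    """Join on every attribute name that the two relations share."""
    result = set()
    for a in r:
        for b in s:
            da, db = dict(a), dict(b)
            common = da.keys() & db.keys()
            if all(da[k] == db[k] for k in common):
                # Shared attributes carry equal values, so they appear once.
                result.add(a | b)
    return result

employee = {row(empno=1, depno=10), row(empno=2, depno=20)}
department = {row(depno=10, dname="Sales")}

joined = natural_join(employee, department)
```

If the relations happened to share a second attribute name by accident, it would silently become part of the join condition, which is the pitfall the warning above describes.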
OUTER JOINs
Notice that a join discards every tuple that has no matching tuple in the other relation.
In some cases this lost data might hold useful information. An outer join retains the
information that would have been lost from the tables, replacing missing data with nulls.
There are three forms of the outer join, depending on which data is to be kept.
LEFT OUTER JOIN - keep data from the left-hand table
RIGHT OUTER JOIN - keep data from the right-hand table
FULL OUTER JOIN - keep data from both tables
OUTER JOIN example 1
Figure: OUTER JOIN (left/right)
OUTER JOIN example 2
Figure: OUTER JOIN (full)
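The three outer joins can be sketched in Python on top of a natural-join style match, using `None` to stand in for null. All names here (`row`, `matches`, `left_outer_join`, the sample relations) are assumptions made for the sketch, and the attribute names of the other relation are passed in explicitly so that the padding is well defined.

```python
def row(**attrs):
    """Build one tuple as a hashable collection of (attribute, value) pairs."""
    return frozenset(attrs.items())

def matches(a, b):
    """True when a and b agree on every attribute name they share."""
    db = dict(b)
    return all(db.get(k, v) == v for k, v in a)

def left_outer_join(r, s, s_attrs):
    """Keep every tuple of r; pad with None where s has no matching tuple."""
    result = set()
    for a in r:
        partners = [b for b in s if matches(a, b)]
        if partners:
            result |= {a | b for b in partners}
        else:
            padding = {k: None for k in s_attrs if k not in dict(a)}
            result.add(a | frozenset(padding.items()))
    return result

employee = {row(empno=1, depno=10), row(empno=2, depno=20)}
department = {row(depno=10, dname="Sales"), row(depno=30, dname="Research")}

left = left_outer_join(employee, department, {"depno", "dname"})
# A right outer join is a left outer join with the operands swapped.
right = left_outer_join(department, employee, {"empno", "depno"})
full = left | right
```

In `left`, employee 2 is kept with a null `dname`; in `right`, the Research department is kept with a null `empno`; `full` contains both of those plus the matched tuple.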
Relational Algebra - Example
Contents
Symbolic Notation
Usage
Rename Operator
Derivable Operators
Equivalence
Equivalences
Comparing RA and SQL

Comparing RA and SQL
Consider the following SQL to find which departments have had employees on the
'Further Accounting' course.
SELECT DISTINCT dname
FROM department, course, empcourse, employee
WHERE cname = 'Further Accounting'
AND course.courseno = empcourse.courseno
AND empcourse.empno = employee.empno
AND employee.depno = department.depno;
The equivalent relational algebra is
PROJECT_{dname} (department JOIN_{depno = depno} (
  PROJECT_{depno} (employee JOIN_{empno = empno} (
    PROJECT_{empno} (empcourse JOIN_{courseno = courseno} (
      PROJECT_{courseno} (SELECT_{cname = 'Further Accounting'} (course))
    ))
  ))
))
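To check that the two formulations agree, the sketch below runs the SQL query with Python's built-in `sqlite3` module and then evaluates the nested expression innermost-first in plain Python. The table contents are invented sample data, the column layouts are assumptions, and each "project, then join, then project" step of the algebra is written as a simple membership test.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE department (depno INTEGER, dname TEXT);
    CREATE TABLE employee   (empno INTEGER, depno INTEGER);
    CREATE TABLE course     (courseno INTEGER, cname TEXT);
    CREATE TABLE empcourse  (empno INTEGER, courseno INTEGER);
    INSERT INTO department VALUES (1, 'Finance'), (2, 'Sales');
    INSERT INTO employee   VALUES (10, 1), (11, 2);
    INSERT INTO course     VALUES (100, 'Further Accounting'), (101, 'Basic Accounting');
    INSERT INTO empcourse  VALUES (10, 100), (11, 101);
""")

# The SQL formulation from the text
sql_result = {name for (name,) in con.execute("""
    SELECT DISTINCT dname
    FROM department, course, empcourse, employee
    WHERE cname = 'Further Accounting'
      AND course.courseno = empcourse.courseno
      AND empcourse.empno = employee.empno
      AND employee.depno = department.depno
""")}

def table(name):
    """Load a whole table as a set of positional tuples."""
    return set(con.execute(f"SELECT * FROM {name}"))

# The relational algebra formulation, evaluated from the innermost bracket out
coursenos = {c for (c, cname) in table("course") if cname == "Further Accounting"}
empnos = {e for (e, c) in table("empcourse") if c in coursenos}
depnos = {d for (e, d) in table("employee") if e in empnos}
ra_result = {dname for (d, dname) in table("department") if d in depnos}
```

With this sample data both `sql_result` and `ra_result` contain only the Finance department.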
Symbolic Notation
From the example, one can see that for complicated cases a large amount of the answer is
formed from operator names, such as PROJECT and JOIN. It is therefore commonplace
to use symbolic notation to represent the operators.
SELECT -> σ (sigma)
PROJECT -> π (pi)
PRODUCT -> × (times)
JOIN -> |×| (bow-tie)
UNION -> ∪ (cup)
INTERSECTION -> ∩ (cap)
DIFFERENCE -> - (minus)
RENAME -> ρ (rho)
Usage
The symbolic operators are used as with the verbal ones. So, to find all employees in
department 1:
SELECT_{depno = 1}(employee)
becomes σ_{depno = 1}(employee)
Conditions can be combined together using ^ (AND) and v (OR). For example, all
employees in department 1 called 'Smith':
SELECT_{depno = 1 ^ surname = 'Smith'}(employee)
becomes σ_{depno = 1 ^ surname = 'Smith'}(employee)
The use of the symbolic notation can lend itself to brevity. Even better, when the JOIN is
a natural join, the JOIN condition may be omitted from |×|. The earlier example resulted
in:
PROJECT_{dname} (department JOIN_{depno = depno} (
  PROJECT_{depno} (employee JOIN_{empno = empno} (
    PROJECT_{empno} (empcourse JOIN_{courseno = courseno} (
      PROJECT_{courseno} (SELECT_{cname = 'Further Accounting'} (course))
    ))
  ))
))
becomes
π_{dname} (department |×| (
  π_{depno} (employee |×| (
    π_{empno} (empcourse |×| (
      π_{courseno} (σ_{cname = 'Further Accounting'} (course))
    ))
  ))
))
Rename Operator
The rename operator returns an existing relation under a new name. ρ_{A}(B) is the relation
B with its name changed to A. For example, find the employees in the same department
as employee 3.
π_{emp2.surname, emp2.forenames} (
  σ_{employee.empno = 3 ^ employee.depno = emp2.depno} (
    employee × (ρ_{emp2}(employee))
  )
)
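The need for a rename shows up clearly in code: a relation cannot be crossed with itself unless the two copies have distinguishable attribute names. In this sketch `rename` prefixes every attribute with a relation name; the function, the `base` relation and its sample employees are assumptions for illustration.

```python
from itertools import product

def rename(name, relation):
    """Return the relation with every attribute qualified by the given name."""
    return {frozenset((f"{name}.{k}", v) for k, v in r) for r in relation}

base = {
    frozenset({"empno": 3, "surname": "Smith", "forenames": "John", "depno": 1}.items()),
    frozenset({"empno": 4, "surname": "Jones", "forenames": "Mary", "depno": 1}.items()),
    frozenset({"empno": 5, "surname": "Brown", "forenames": "Ann", "depno": 2}.items()),
}

employee = rename("employee", base)
emp2 = rename("emp2", base)

# employee × emp2
crossed = {a | b for a, b in product(employee, emp2)}

# Keep pairs where the left copy is employee 3 and the departments match
same_dept = {
    r for r in crossed
    if dict(r)["employee.empno"] == 3
    and dict(r)["employee.depno"] == dict(r)["emp2.depno"]
}

# Keep only the name attributes of the second copy
wanted = {"emp2.surname", "emp2.forenames"}
result = {frozenset((k, v) for k, v in r if k in wanted) for r in same_dept}
```

Note that employee 3 appears in the answer too, since the expression does not exclude the pairing of employee 3 with itself.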
Derivable Operators
- Fundamental operators: σ, π, ×, ∪, -, ρ
- Derivable operators: |×|, ∩
A ∩ B ⇔ A - (A - B)
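The derivation of intersection from difference can be checked on a small example in Python. The relations A and B below are arbitrary sample data.

```python
A = {(1, "Smith"), (2, "Jones"), (3, "Brown")}
B = {(2, "Jones"), (3, "Brown"), (4, "Green")}

only_in_a = A - B          # tuples of A that are not in B
derived = A - only_in_a    # removing those from A leaves the tuples also in B
built_in = A & B           # Python's own set intersection, for comparison
```

Both `derived` and `built_in` contain exactly the tuples for Jones and Brown, so ∩ adds convenience but no extra expressive power.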
Equivalence
A |×|_{c} B ⇔ π_{a1,a2,...,aN}(σ_{c}(A × B))
- where c is the join condition (e.g. A.a1 = B.a1),
- and a1,a2,...,aN are all the attributes of A and B without repetition.
c is called the join-condition, and is usually a comparison of primary and foreign key.
Where there are N tables, there are usually N-1 join-conditions. In the case of a natural
join, the conditions can be omitted, but otherwise omitting a condition results in a
Cartesian product (a common mistake to make).
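The equivalence can be traced step by step in Python with positional tuples. The relations A and B, their assumed column layouts and the sample values are hypothetical.

```python
from itertools import product

A = {(1, "Smith"), (2, "Jones")}        # assumed layout: (depno, surname)
B = {(1, "Finance"), (3, "Research")}   # assumed layout: (depno, dname)

# A × B: every pairing, giving (A.depno, surname, B.depno, dname)
crossed = {a + b for a, b in product(A, B)}

# σ_c with c being A.depno = B.depno
selected = {t for t in crossed if t[0] == t[2]}

# π over all attributes without repeating depno
joined = {(t[0], t[1], t[3]) for t in selected}
```

Leaving out the selection step would return all four tuples of `crossed`, which is the accidental Cartesian product the text warns about.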
Equivalences
The same relational algebraic expression can be written in many different ways. The
order in which tuples appear in relations is never significant.
- A × B ⇔ B × A
- A ∩ B ⇔ B ∩ A
- A ∪ B ⇔ B ∪ A
- (A - B) is not the same as (B - A)
- σ_{c1}(σ_{c2}(A)) ⇔ σ_{c2}(σ_{c1}(A)) ⇔ σ_{c1 ^ c2}(A)
- π_{a1}(A) ⇔ π_{a1}(π_{a1,etc}(A)), where etc represents any other attributes of A.
- Many other equivalences exist.
While equivalent expressions always give the same result, some may be much easier to
evaluate than others.
When any query is submitted to the DBMS, its query optimiser tries to find the most
efficient equivalent expression before evaluating it.
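A quick Python check of the selection and projection equivalences, on invented sample data with an assumed (empno, surname, depno) layout. It illustrates that reordering gives the same answer; it does not model how a real query optimiser chooses between the alternatives.

```python
A = {(1, "Smith", 1), (2, "Smith", 2), (3, "Jones", 1)}  # (empno, surname, depno)

def select(condition, relation):
    """Keep only the tuples that satisfy the condition."""
    return {t for t in relation if condition(t)}

def c1(t):
    return t[2] == 1          # depno = 1

def c2(t):
    return t[1] == "Smith"    # surname = 'Smith'

nested_12 = select(c1, select(c2, A))
nested_21 = select(c2, select(c1, A))
combined = select(lambda t: c1(t) and c2(t), A)

# Projecting onto empno directly, or via a wider projection first
direct = {(t[0],) for t in A}
wider = {(t[0], t[1]) for t in A}
two_step = {(t[0],) for t in wider}
```

All three selection forms give the same single tuple, and both projection routes give the same set of employee numbers, although the amount of intermediate data handled differs.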
Comparing RA and SQL
Relational algebra:
- is closed (the result of every expression is a relation)
- has a rigorous foundation
- has simple semantics
- is used for reasoning, query optimisation, etc.
SQL:
- is a superset of relational algebra
- has convenient formatting features, etc.
- provides aggregate functions
- has complicated semantics
- is an end-user language.
Comparing RA and SQL
Any relational language as powerful as relational algebra is called relationally complete.
A relationally complete language can perform all basic, meaningful operations on
relations.
Since SQL is a superset of relational algebra, it is also relationally complete.