Download L01Relational_Data_Models

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Clusterpoint wikipedia , lookup

Functional Database Model wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

SQL wikipedia , lookup

Database wikipedia , lookup

Join (SQL) wikipedia , lookup

Versant Object Database wikipedia , lookup

Database model wikipedia , lookup

Relational algebra wikipedia , lookup

Relational model wikipedia , lookup

Transcript
CM036 Advanced Databases
Lecture 1:
Relational Data Models
Content
1 Relational Models, Languages and their
Use in databases
2 The Relational Algebra
3 The Relational Calculus
CM036 Advanced Databases
Lecture 1: Relational Models
2
1.1 Relational and other data models



Relational models (relational algebra,
relational calculus, predicate calculus)
Hierarchical (object-relational)
Object-oriented
CM036 Advanced Databases
Lecture 1: Relational Models
3
1.2 Relational Languages and their Use

Data manipulation language (DML)



Data definition language (DDL)



Use: Populates, updates, and queries relational databases
Example: relational algebra, SQL DML
Use: Specifies the data structures and defines the relational schema
Example: relational calculus, SQL DDL
Data control language (DCL)


Use: Specifies operation permissions, data access discipline and
administrative privileges, and defines the database users
Example: SQL DCL, LDAP
CM036 Advanced Databases
Lecture 1: Relational Models
4
2 The Relational Algebra



Proposed by Codd in 1970 as a formal data model. Describes the
operations to manipulate relations.
Used to specify retrieval requests (queries). Query result is also in
the form of a relation.
Relational operators transform either a simple relation or a pair of
relations into another relation
Relational Operations:


RESTRICT (s) and PROJECT () operations.
Set operations: UNION (), INTERSECTION (), DIFFERENCE (—),
CARTESIAN PRODUCT ().

JOIN operations (⋈).

Other relational operations: DIVISION, OUTER JOIN,
AGGREGATE FUNCTIONS.
CM036 Advanced Databases
Lecture 1: Relational Models
5
CM036 Advanced Databases
Lecture 1: Relational Models
6
2.1 RESTRICT s and PROJECT P

RESTRICT operation (called also SELECT- denoted bys ):


Selects the tuples (rows) from a relation R that satisfy a certain
selection condition c on the attributes of R : s c(R)
Resulting relation includes each tuple in R whose attribute values
satisfy the condition c, i.e. it has the same attributes as R
Examples:
s DNO=4(EMPLOYEE)
s SALARY>30000(EMPLOYEE)
s(DNO=4 AND SALARY>25000) OR DNO=5(EMPLOYEE)
CM036 Advanced Databases
Lecture 1: Relational Models
7
PROJECT operation (denoted by


 ):
Keeps only certain attributes (columns) from a relation R specified in
an attribute list L:  L(R)
Resulting relation has only those attributes of R specified in L
Example:  FNAME,LNAME,SALARY(EMPLOYEE)
 The PROJECT operation eliminates duplicate tuples in the resulting
relation so that it remains a true set (no duplicate elements)
Example:  SEX,SALARY(EMPLOYEE)
 If several male employees have salary 30000, only a single tuple
<M, 30000> isLecture
kept
in the resulting relation.
CM036 Advanced Databases
1: Relational Models
8
CM036 Advanced Databases
Lecture 1: Relational Models
9
Sequence of operations
Several operations can be combined to form a relational algebra
expression (query)
Example: Retrieve the names and salaries of employees who work

in department 4
 FNAME,LNAME,SALARY (s DNO=4(EMPLOYEE) )

Alternatively, we specify explicit intermediate relations for each
step:
DEPT4_EMPS  s DNO=4(EMPLOYEE)
R   FNAME,LNAME,SALARY(DEPT4_EMPS)

Attributes can optionally be renamed in the resulting left-handside relation (this may be required for some operations that will
be presented later):
DEPT4_EMPS  s DNO=4(EMPLOYEE)
R(FNAME,LNAME,SALARY) 
 FNAME,LNAME,SALARY(DEPT4_EMPS)
CM036 Advanced Databases
Lecture 1: Relational Models
10
CM036 Advanced Databases
Lecture 1: Relational Models
11
2.2 Set Operations



Binary operations from set theory: UNION: R1  R2,INTERSECTION: R1
 R2, DIFFERENCE: R1 — R2,
For , , —, the operand relations R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn)
must have the same number of attributes, and the domains of corresponding
attributes must be compatible; that is, dom(Ai)=dom(Bi) for i=1, 2, ..., n. This
condition is called union compatibility.
The resulting relation for , , or — has the same attribute names as the
first operand relation R1 (by convention).
CM036 Advanced Databases
Lecture 1: Relational Models
12
CM036 Advanced Databases
Lecture 1: Relational Models
13

CARTESIAN PRODUCT
R(A1, A2, ..., Am, B1, ..., Bn)  R1(A1, A2, ..., Am)  R2 (B1, ..., Bn)

A tuple t exists in R for each combination of tuples t1 from R1 and t2
from R2 such that:
t [A1, A2, ..., Am] = t1 and t [B1, B2, ..., Bn] = t2


The resulting relation R has n1 + n2 columns
If R1 has n1 tuples and R2 has n2 tuples, then R will have n1*n2 tuples.
CM036 Advanced Databases
Lecture 1: Relational Models
14

Obviously, CARTESIAN PRODUCT is useless if alone, since it
generates all possible combinations. It can combine related tuples
from two relations in a more informative way if followed by the
appropriate RESTRICT operation
Example: Retrieve a list of the names of dependents for each
female employee
FEMALE_EMPS  s SEX=‘F’(EMPLOYEE)
EMPNAMES   FNAME,LNAME,SSN(FEMALE_EMPS)
EMP_DEPENDENTS  EMPNAMES  DEPENDENT
ACTUAL_DEPENDENTS  s SSN=ESSN(EMP_DEPENDENTS)
RESULT   FNAME,LNAME,DEPENDENT_NAME(ACTUAL_DEPENDENTS)
CM036 Advanced Databases
Lecture 1: Relational Models
15
CM036 Advanced Databases
Lecture 1: Relational Models
16
2.3 JOIN Operations

THETA JOIN: Similar to a CARTESIAN PRODUCT followed by a
RESTRICT. The condition c is called a join condition.
R(A1, A2, ..., Am, B1, B2, ..., Bn) 
R1(A1, A2, ..., Am) ⋈ c R2 (B1, B2, ..., Bn)

EQUIJOIN: The join condition c includes equality comparisons involving
attributes from R1 and R2. That is, c is of the form:
(Ai=Bj) AND ... AND (Ah=Bk); 1<i,h<m, 1<j,k<n
In the above EQUIJOIN operation:
 Ai, ..., Ah are called the join attributes of R1
 Bj, ..., Bk are called the join attributes of R2
Example: Retrieve each department's name and its manager's name:
T  DEPARTMENT ⋈MGRSSN=SSN EMPLOYEE
RESULT   DNAME,FNAME,LNAME(T)
CM036 Advanced Databases
Lecture 1: Relational Models
17
DEPT_MGR   DNAME,…,MGRSSN,…LNAME,…,SSN… (DEPARTMENT ⋈ MGRSSN=SSN EMPLOYEE)
CM036 Advanced Databases
Lecture 1: Relational Models
18
NATURAL JOIN

In an EQUIJOIN R  R1 ⋈c R2, the join attribute of R2 appear
redundantly in the result relation R. In a NATURAL JOIN, the join
attributes of R2 are eliminated from R. The equality is implied and
there is no need to specify it. The form of the operator is
R  R1 ⋈(join attributes of R1),(join attributes of R2) R2
Example: Retrieve each employee's name and the name of the
department he/she works for:
T  EMPLOYEE ⋈(DNO),(DNUMBER) DEPARTMENT
RESULT   FNAME,LNAME,DNAME(T)

If the join attributes have the same names in both relations, they
need not be specified and we can write R  R1 * R2.
Example: Retrieve each employee's name and the name of
his/her supervisor:
SUPERVISOR(SUPERSSN,SFN,SLN) SN,FNAME,LNAME(EMPLOYEE)
T  EMPLOYEE * SUPERVISOR
RESULT   FNAME,LNAME,SFN,SLN(T)
CM036 Advanced Databases
Lecture 1: Relational Models
19
CM036 Advanced Databases
Lecture 1: Relational Models
20
Note: In the original definition of NATURAL JOIN, the join attributes
were required to have the same names in both relations.
 There can be a more than one set of join attributes with a different
meaning between the same two relations. For example:
JOIN ATTRIBUTES
EMPLOYEE.SSN=
DEPARTMENT.MGRSSN
EMPLOYEE.DNO=
DEPARTMENT.DNUMBER
RELATIONSHIP
EMPLOYEE manages
the DEPARTMENT
EMPLOYEE works for
the DEPARTMENT
Example: Retrieve each employee’s name and the name of the
department he/she works for:
T  EMPLOYEE ⋈ DNO=DNUMBERDEPARTMENT
RESULT   FNAME,LNAME,DNAME(T)
CM036 Advanced Databases
Lecture 1: Relational Models
21

A relation can have a set of join attributes to join it with itself :
JOIN ATTRIBUTES
EMPLOYEE(1).SUPERSSN=
EMPLOYEE(2).SSN


RELATIONSHIP
EMPLOYEE(2) supervises
EMPLOYEE(1)
One can think of this as joining two distinct copies of the
relation, although only one relation actually exists
In this case, renaming can be useful
Example: Retrieve each employee’s name and the name of his/her
supervisor:
SUPERVISOR(SSSN,SFN,SLN)   SSN,FNAME,LNAME(EMPLOYEE)
T  EMPLOYEE ⋈SUPERSSN = SSSNSUPERVISOR
RESULT   FNAME,LNAME,SFN,SLN(T)
CM036 Advanced Databases
Lecture 1: Relational Models
22
2.4 Complete Set of Operations



All the operations discussed so far can be described as a sequence
of only the operations RESTRICT, PROJECT, UNION, SET
DIFFERENCE, and CARTESIAN PRODUCT.
Hence, the set {s , , ,—, } is called a complete set of relational
algebra operations. Any query language equivalent to these
operations is called relationally complete.
For database applications, additional operations are needed that
were not part of the original relational algebra. These include:
 1. Aggregate functions and grouping.
 2. OUTER JOIN and OUTER UNION.
CM036 Advanced Databases
Lecture 1: Relational Models
23
2.4 Additional Relational Operations

AGGREGATE FUNCTIONS
Functions such as SUM, COUNT, AVERAGE, MIN, MAX are often
applied to sets of values or sets of tuples in database applications
<grouping attributes> <function list> (R)

The grouping attributes are optional
Example: Retrieve the average salary of all employees
(no grouping needed):
R(AVGSAL)   AVERAGE SALARY (EMPLOYEE)
Example: For each department, retrieve the department number, the
number of employees, and the average salary:
R(DNO,NUMEMPS,AVGSAL) 
DNO  COUNT SSN, AVERAGE SALARY (EMPLOYEE)

DNO is called the grouping attribute in the above example
CM036 Advanced Databases
Lecture 1: Relational Models
24
CM036 Advanced Databases
Lecture 1: Relational Models
25

OUTER JOIN


In a regular EQUIJOIN or NATURAL JOIN operation, tuples in R1 or
R2 that do not have matching tuples in the other relation do not
appear in the result. Some queries require all tuples in R1 (or R2 or
both) to appear in the result
When no matching tuples are found, nulls are placed for the missing
attributes
 LEFT OUTER JOIN: R1
R2 lets every tuple in R1 appear in the
result
 RIGHT OUTER JOIN: R1
R2 lets every tuple in R2 appear in
the result
 FULL OUTER JOIN: R1
R2 lets every tuple in both R1 and R2
appear in the result
CM036 Advanced Databases
Lecture 1: Relational Models
26
TEMP EMPLOYEE
SSN=MGRSSNDEPARTMENT
RESULT   FNAME,MINIT,LNAME,DNAME(TEMP)
CM036 Advanced Databases
Lecture 1: Relational Models
27
3 Relational Calculus



Declarative language based on predicate logic - checks what is
true in the database rather then looking to get it
The main difference in comparison with the relational algebra is
the introduction of variables which range over attributes or
tuples
Two different variations of the relational calculus:
 domain calculus - specification of formal properties for
different data types, used for describing data
 tuple calculus - querying and checking formal properties of
stored relational data
CM036 Advanced Databases
Lecture 1: Relational Models
28
3.1 Predicate Calculus




Vocabulary of constant names, functional attributes and predicate
properties, used for describing the information
Phrases for specification of constant, functionally dependant or
predicated properties build using proper vocabulary terms
Statements about the world, formulated as meaningful phrases,
connected through logical connectives in logical sentences
 conjunction ()
 disjunction ()
 negation ()
 implication ()
 equivalence ()
Possible quantification of the sentences using existential () or
universal () quantifiers, ranging over variables
Example: All units registered in the database have unit leaders
among the lecturers
(x).(Unit (x)  (y).(Lecturer(y)  Leader(y,x)))
CM036 Advanced Databases
Lecture 1: Relational Models
29
3.2 Relational DB as a Predicate Calculus model



Semantic interpretation of the calculus is given in a set, so either all
type domains of the relational schema (domain calculus), or the set of
all relations in the database (tuple calculus) can serve a model for it
All meaningful sentences in the model are true or false, so the
relation tuples can be interpreted as truthful facts describing the world
The constraints describe logical regularities among the attribute
values, so they can be expressed as logical sentences
Example: All records of unit leaders have primary keys
 Relation names are predicates with a place for each attribute
Leader(Lno:INTEGER,Lname:CHAR,Uname:CHAR) - Lno, Lname and Uname
are attributes of Leader

Data values in the relation tuples are constants from the domains
Leader(2,‘Johnson’,’CS234’) - 2,Johnson and CS234 are constants
forming one Leader tuple

Constraints can be stated using quantified variables over domains
(x)(y,z) .(Leader(x,y,z)) - x stands for the primary key of Leader
CM036 Advanced Databases
Lecture 1: Relational Models
30
Notes:
1. In predicate calculus sentences contain quantified variables only
2. The sub-expressions following the quantified variables form the
scope of quantification; it is usually closed in round parentheses
Example: The table for unit leaders has foreign keys to both the
lecturers and units tables

Relation names are predicates with place for each attribute
Unit(Uno:INTEGER, Uname:CHAR) - Uno and Uname are attributes of
relation Unit
Lecturer(Lno:INTEGER, Lname:CHAR) - Lno and Lname are attributes of
relation Lecturer
Leader(LUno:INTEGER,Lno:INTEGER,Uno:INTEGER) - LUno, Lno and Uno
are attributes of relation Leader

All foreign key constraints can be stated logically using quantified
sentences, in which variables range over the respective values
( LUno,Uno,Lno).(Leader(LUno,Uno,Lno) 
( Lname).(Lecturer(Lno,Lname)) 
( Uname).(Unit(Uno,Uname))
)
CM036 Advanced Databases
Lecture 1: Relational Models
31
3.3 Relational Calculus as database
query language



Queries are logical expressions, which contain non-quantified (free)
variables for tuples
The free variables are placeholders for the information, we are
looking for from the database relations
The logical expressions, used in the query, are its conditions which
need to be met during answering the query
Example: Who are the employees with salary greater than 40 000?
{e.FNAME,e.LNAME | Employee(e)  e.SALARY > 40 000}

An answer is a replacement of the free variables in the query with
tuple attributes from the relations used to formulate the query
conditions; they should make the statement of the query true in
database (pattern matching)
Answer: James Borg and Jennifer Wallace
Free variables Matching values
e.FNAME
e.LNAME
CM036 Advanced Databases
James
Borg
Lecture 1: Relational Models
Jennifer
Wallace
32
Notes:
1. The question variables are always listed in front of the
expression, separated by | from the question condition;
2. Question condition can contain free variables only from the list
in the beginning; all other variables should be quantified

Querying related tables requires check of the corresponding keys
for equality
Example: Retrieve the name and address of all employees
who work for the ‘Research’ department
{e.FNAME,e.LNAME,e.ADDRESS |
Employee(e) 
( d).(Department(d) 
d.DNAME = ‘Research’ 
d.DNUMBER = e.DNO
)
}
CM036 Advanced Databases
Lecture 1: Relational Models
33
3.4 Equvalence between relational algebra
and relational calculus



Relational algebra and relational calculus are equivalent with
respect to their ability to formulate queries against relational
databases (Codd 1972)
Relational algebra concentrates on the procedural (“how-to”)
aspects and because of this it is used as an intermediate
language for optimization of database queries
Relational calculus is more appropriate to specify declaratively
the model properties (“what is true”), without worrying about the
way it is achieved and as such it can be used as a specification
or querying language
The relational language Query-By-Example (QBE), which is
developed by IBM during 70ties and is implemented in Paradox
and Access desktop database systems is based on the
relational calculus
CM036 Advanced Databases
Lecture 1: Relational Models
34
Summary
The variety of relational languages currently used to communicate
with databases has different usage in database systems lifecycle






Formal languages have unambiguous syntax and clear semantics,
which makes them good for specifications and conceptualisation
Formal languages hide the details of implementation and can’t give
very deep insight into the real database processes
Formal languages formalize adequately only part of the database
operations, which have semantics inherited from the relational
model; they do not cover DCL at al.
SQL is not a programming language, but language for
communication with DBMS
SQL as a mixture of DDL, DML and DCL is the ultimate choice for
practical relational databases development and administration
SQL syntax is ambiguous when it comes to the order of operations,
which requires deep knowledge about the way it is interpreted
CM036 Advanced Databases
Lecture 1: Relational Models
35