Download Relation

Document related concepts

Clusterpoint wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

SQL wikipedia , lookup

Database wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Relational algebra wikipedia , lookup

Database model wikipedia , lookup

Relational model wikipedia , lookup

Transcript
Database Systems: A Review
Relational Data Model 1
Dr. Muhammad Shafique
ICS 541 - 01 (072)
Relational Data Model 1
1
Outline
•
Relational data model
1. Data Management
•
•
•
•
Informal definitions
Formal definitions
Integrity constraints
Update operations
2. Data Manipulation
•
•
Relational algebra
Relational calculus
•
•
Tuple calculus
Domain calculus
3. Implementation
•
ICS 541 - 01 (072)
SQL
Relational Data Model 1
2
References
• Textbook Chapter 5, 6, 8
The Relational Data Model and Relational
Database Constraints
• Textbook Chapter 6
The Relational Algebra and Relational Calculus
• Textbook Chapter 8: SQL 99
• E.F. Codd, "A Relational Model for Large Shared
Data Banks," Communications of the ACM, June
1970
ICS 541 - 01 (072)
Relational Data Model 1
3
1. Data management
ICS 541 - 01 (072)
Relational Data Model 1
4
Relational Model Concepts
• The relational Model of Data is based on the concept of
a Relation
• The strength of the relational approach to data management
comes from the formal foundation provided by the theory of
relations
• Note: There are several important differences between
the formal model and the practical model
• A “relation” is a fundamental mathematical
notion expressing a relationship between
elements of sets.
• A binary relation from a set A to a set B is a subset R
 A × B.
ICS 541 - 01 (072)
Relational Data Model 1
5
Relational Model Concepts
•
Definitions of relation
1. A relation L is defined by specifying two mathematical objects as its
constituent parts:
1. The first part is called the figure of L, notated as figure(L) or F(L).
2. The second part is called the ground of L, notated as ground(L) or G(L).
2. A k-ary relation L over the nonempty sets X1, …, Xk is a (1+k)-tuple L
= (F(L), X1, …, Xk) where F(L) is a subset of the Cartesian product X1 ×
… × Xk. If all of the Xj for j = 1 to k are the same set X, then L is more
simply called a k-ary relation over X. The set F(L) is called the figure
of L and, providing that the sequence of sets X1, …, Xk is fixed
throughout a given discussion or determinate in context, one may regard
the relation L as being determined by its figure F(L).
ICS 541 - 01 (072)
Relational Data Model 1
6
Relational Model Concepts
• A Relation is a mathematical concept based on the
ideas of sets
• The model was first proposed by Dr. E.F. Codd of
IBM Research in 1970 in the following paper:
• "A Relational Model for Large Shared Data Banks,"
Communications of the ACM, June 1970
• The above paper caused a major revolution in the
field of database management and earned Dr. Codd
the coveted ACM Turing Award
ICS 541 - 01 (072)
Relational Data Model 1
7
Informal Definitions
• Informally, a relation looks like a table of values.
• A relation typically contains a set of rows.
• The data elements in each row represent certain facts that
correspond to a real-world entity or relationship
• In the formal model, rows are called tuples
• Each column has a column header that gives an indication of the
meaning of the data items in that column
• In the formal model, the column header is called an attribute name (or
just attribute)
ICS 541 - 01 (072)
Relational Data Model 1
8
Example of a Relation
ICS 541 - 01 (072)
Relational Data Model 1
9
Informal Definitions
• Key of a Relation:
• Each row has a value of a data item (or set of items) that
uniquely identifies that row in the table
• Called the key
• Sometimes row-ids or sequential numbers are assigned as
keys to identify the rows in a table
• Called artificial key or surrogate key
ICS 541 - 01 (072)
Relational Data Model 1
10
Formal Definitions - Schema
• The Schema (or description) of a Relation:
• Denoted by R(A1, A2, .....An)
• R is the name of the relation
• The attributes of the relation are A1, A2, ..., An
• Example:
CUSTOMER (Cust-id, Cust-name, Address, Phone#)
• CUSTOMER is the relation name
• Defined over the four attributes: Cust-id, Cust-name, Address, Phone#
• Each attribute has a domain or a set of valid values.
• For example, the domain of Cust-id is 6 digit numbers.
ICS 541 - 01 (072)
Relational Data Model 1
11
Formal Definitions - Tuple
• A tuple is an ordered set of values (enclosed in angled
brackets ‘< … >’)
• Each value is derived from an appropriate domain.
• A row in the CUSTOMER relation is a 4-tuple and would
consist of four values, for example:
• <632895, "John Smith", "101 Main St. Atlanta, GA 30332", "(404)
894-2000">
• This is called a 4-tuple as it has 4 values
• A tuple (row) in the CUSTOMER relation.
• A relation is a set of such tuples (rows)
ICS 541 - 01 (072)
Relational Data Model 1
12
Formal Definitions - Domain
• A domain has a logical definition:
• Example: “Cell_phone_numbers” are the set of 10 digit phone numbers
• A domain also has a data-type or a format defined for it.
• The Cell_phone_numbers may have a format: ddd-ddd-dddd where each d is a
decimal digit.
• Dates have various formats such as year, month, date formatted as yyyy-mmdd, or as dd mm,yyyy etc.
• The attribute name designates the role played by a domain in a relation:
• Used to interpret the meaning of the data elements corresponding to that
attribute
• Example: The domain Date may be used to define two attributes named
“Invoice-date” and “Payment-date” with different meanings
ICS 541 - 01 (072)
Relational Data Model 1
13
Formal Definitions - State
• The relation state is a subset of the Cartesian product
of the domains of its attributes
• each domain contains the set of all possible values the
attribute can take.
• Example: attribute Cust-name is defined over the
domain of character strings of maximum length 25
• dom(Cust-name) is varchar(25)
• The role these strings play in the CUSTOMER
relation is that of the name of a customer.
ICS 541 - 01 (072)
Relational Data Model 1
14
Formal Definitions - Summary
• Formally,
• Given R(A1, A2, .........., An)
• r(R)  dom (A1) X dom (A2) X ....X dom(An)
•
•
•
•
R (A1, A2, …, An) is the schema of the relation
R is the name of the relation
A1, A2, …, An are the attributes of the relation
r(R): a specific state (or "value" or “population”) of relation R
– this is a set of tuples (rows)
• r(R) = {t1, t2, …, tn} where each ti is an n-tuple
• ti = <v1, v2, …, vn> where each vj element-of dom(Aj)
ICS 541 - 01 (072)
Relational Data Model 1
15
Formal Definitions - Example
• Let R(A1, A2) be a relation schema:
• Let dom(A1) = {0,1}
• Let dom(A2) = {a,b,c}
• Then: dom(A1) X dom(A2) is all possible combinations:
{<0,a> , <0,b> , <0,c>, <1,a>, <1,b>, <1,c> }
• The relation state r(R)  dom(A1) X dom(A2)
• For example: r(R) could be {<0,a> , <0,b> , <1,c> }
• This is one possible state (or “population” or “extension”) r of the
relation R, defined over A1 and A2.
• It has three 2-tuples: <0,a> , <0,b> , <1,c>
ICS 541 - 01 (072)
Relational Data Model 1
16
Definition Summary
Informal Terms
Formal Terms
Table
Relation
Column Header
Attribute
All possible Column Values
Domain
Row
Tuple
Table Definition
Schema of a Relation
Populated Table
State of the Relation
ICS 541 - 01 (072)
Relational Data Model 1
17
Example – A relation STUDENT
ICS 541 - 01 (072)
Relational Data Model 1
18
Characteristics Of Relations
• Ordering of tuples in a relation r(R):
• The tuples are not considered to be ordered
• Ordering of attributes in a relation schema R (and of
values within each tuple):
• We will consider the attributes in R(A1, A2, ..., An) and
the values in t=<v1, v2, ..., vn> to be ordered .
• However, a more general alternative definition of relation does not
require this ordering.
ICS 541 - 01 (072)
Relational Data Model 1
19
Same state as previous Figure
(but with different order of tuples)
ICS 541 - 01 (072)
Relational Data Model 1
20
Characteristics Of Relations
• Values in a tuple:
• All values are considered atomic (indivisible).
• Each value in a tuple must be from the domain of the
attribute for that column
• If tuple t = <v1, v2, …, vn> is a tuple (row) in the relation state r of
R(A1, A2, …, An)
• Then each vi must be a value from dom(Ai)
• A special null value is used to represent values that are
unknown or inapplicable to certain tuples.
ICS 541 - 01 (072)
Relational Data Model 1
21
Characteristics Of Relations
• Notation:
• We refer to component values of a tuple t by:
• t[Ai] or t.Ai
• This is the value vi of attribute Ai for tuple t
• Similarly, t[Au, Av, ..., Aw] refers to the subtuple of t
containing the values of attributes Au, Av, ..., Aw,
respectively in t
ICS 541 - 01 (072)
Relational Data Model 1
22
Relational Integrity Constraints
• Constraints are the conditions that must hold on all valid
relation states.
• There are three main types of constraints in the relational
model:
• Key constraints
• Entity integrity constraints
• Referential integrity constraints
• Another implicit constraint is the domain constraint
• Every value in a tuple must be from the domain of its attribute (or it
could be null, if allowed for that attribute)
ICS 541 - 01 (072)
Relational Data Model 1
23
Key Constraints
• Superkey of R:
• It is a set of attributes SK of R with the following condition:
• No two tuples in any valid relation state r(R) will have the same value for
SK
• That is, for any distinct tuples t1 and t2 in r(R), t1[SK]  t2[SK]
• This condition must hold in any valid state r(R)
• Key of R:
• A "minimal" superkey
• That is, a key is a superkey K such that removal of any attribute from K
results in a set of attributes that is not a superkey (does not possess the
superkey uniqueness property)
ICS 541 - 01 (072)
Relational Data Model 1
24
Key Constraints (continued)
• Example: Consider the CAR relation schema:
• CAR(State, Reg#, SerialNo, Make, Model, Year)
• CAR has two keys:
• Key1 = {State, Reg#}
• Key2 = {SerialNo}
• Both are also superkeys of CAR
• {SerialNo, Make} is a superkey but not a key.
• In general:
• Any key is a superkey (but not vice versa)
• Any set of attributes that includes a key is a superkey
• A minimal superkey is also a key
ICS 541 - 01 (072)
Relational Data Model 1
25
Key Constraints (continued)
• If a relation has several candidate keys, one is chosen
arbitrarily to be the primary key.
• The primary key attributes are underlined.
• Example: Consider the CAR relation schema:
• CAR(State, Reg#, SerialNo, Make, Model, Year)
• We chose SerialNo as the primary key
• The primary key value is used to uniquely identify each tuple
in a relation
• Provides the tuple identity
• Also used to reference the tuple from another tuple
• General rule: Choose as primary key the smallest of the candidate keys
(in terms of size)
• Not always applicable – choice is sometimes subjective
ICS 541 - 01 (072)
Relational Data Model 1
26
CAR table with two candidate keys –
LicenseNumber chosen as Primary Key
ICS 541 - 01 (072)
Relational Data Model 1
27
Relational Database Schema
• Relational Database Schema:
• A set S of relation schemas that belong to the same
database.
• S is the name of the whole database schema
• S = {R1, R2, ..., Rn}
• R1, R2, …, Rn are the names of the individual relation
schemas within the database S
• Following slide shows a COMPANY database
schema with 6 relation schemas
ICS 541 - 01 (072)
Relational Data Model 1
28
COMPANY Database Schema
ICS 541 - 01 (072)
Relational Data Model 1
29
Entity Integrity
• Entity Integrity:
• The primary key attributes PK of each relation schema R in S
cannot have null values in any tuple of r(R).
• This is because primary key values are used to identify the individual
tuples.
• t[PK]  null for any tuple t in r(R)
• If PK has several attributes, null is not allowed in any of these attributes
• Note: Other attributes of R may be constrained to disallow null
values, even though they are not members of the primary key.
ICS 541 - 01 (072)
Relational Data Model 1
30
Referential Integrity
• A constraint involving two relations
• The previous constraints involve a single relation.
• Used to specify a relationship among tuples in two
relations:
• The referencing relation and the referenced relation.
ICS 541 - 01 (072)
Relational Data Model 1
31
Referential Integrity
• Tuples in the referencing relation R1 have attributes
FK (called foreign key attributes) that reference the
primary key attributes PK of the referenced
relation R2.
• A tuple t1 in R1 is said to reference a tuple t2 in R2 if
t1[FK] = t2[PK].
• A referential integrity constraint can be displayed in
a relational database schema as a directed arc from
R1.FK to R2.
ICS 541 - 01 (072)
Relational Data Model 1
32
Referential Integrity (or foreign key)
Constraint
•
Statement of the constraint
•
The value in the foreign key column (or columns) FK of
the referencing relation R1 can be either:
1. A value of an existing primary key value of a corresponding
primary key PK in the referenced relation R2, or
2. A null.
•
In case (2), the FK in R1 should not be a part of its
own primary key.
ICS 541 - 01 (072)
Relational Data Model 1
33
Displaying a relational database schema and
its constraints
• Each relation schema can be displayed as a row of
attribute names
• The name of the relation is written above the attribute
names
• The primary key attribute (or attributes) will be
underlined
• A foreign key (referential integrity) constraints is
displayed as a directed arc (arrow) from the foreign key
attributes to the referenced table
• Can also point the the primary key of the referenced relation for
clarity
• Next slide shows the COMPANY relational schema
diagram
ICS 541 - 01 (072)
Relational Data Model 1
34
Referential Integrity Constraints for
COMPANY database
ICS 541 - 01 (072)
Relational Data Model 1
35
Other Types of Constraints
• Semantic Integrity Constraints:
• based on application semantics and cannot be expressed by
the model per se
• Example: “the max. no. of hours per employee for all
projects he or she works on is 56 hrs per week”
• A constraint specification language may have to be
used to express these
• SQL-99 allows triggers and ASSERTIONS to
express for some of these
ICS 541 - 01 (072)
Relational Data Model 1
36
Populated database state
• Each relation will have many tuples in its current relation
state
• The relational database state is a union of all the individual
relation states
• Whenever the database is changed, a new state arises
• Basic operations for changing the database:
• INSERT a new tuple in a relation
• DELETE an existing tuple from a relation
• MODIFY an attribute of an existing tuple
• Next slide shows an example state for the COMPANY
database
ICS 541 - 01 (072)
Relational Data Model 1
37
Populated database state for COMPANY
ICS 541 - 01 (072)
Relational Data Model 1
38
Update Operations on Relations
•
•
•
•
INSERT a tuple.
DELETE a tuple.
MODIFY a tuple.
Integrity constraints should not be violated by the
update operations.
• Several update operations may have to be grouped
together.
• Updates may propagate to cause other updates
automatically. This may be necessary to maintain
integrity constraints.
ICS 541 - 01 (072)
Relational Data Model 1
39
Update Operations on Relations
• In case of integrity violation, several actions can be
taken:
• Cancel the operation that causes the violation (RESTRICT
or REJECT option)
• Perform the operation but inform the user of the violation
• Trigger additional updates so the violation is corrected
(CASCADE option, SET NULL option)
• Execute a user-specified error-correction routine
ICS 541 - 01 (072)
Relational Data Model 1
40
Possible violations for INSERT operation
• INSERT may violate any of the constraints:
• Domain constraint:
• if one of the attribute values provided for the new tuple is not of the
specified attribute domain
• Key constraint:
• if the value of a key attribute in the new tuple already exists in another
tuple in the relation
• Referential integrity:
• if a foreign key value in the new tuple references a primary key value
that does not exist in the referenced relation
• Entity integrity:
• if the primary key value is null in the new tuple
ICS 541 - 01 (072)
Relational Data Model 1
41
Possible violations for DELETE operation
• DELETE may violate only referential integrity:
• If the primary key value of the tuple being deleted is referenced from
other tuples in the database
• Can be remedied by several actions: RESTRICT, CASCADE, SET NULL
(see Chapter 8 for more details)
• RESTRICT option: reject the deletion
• CASCADE option: propagate the new primary key value into the foreign keys
of the referencing tuples
• SET NULL option: set the foreign keys of the referencing tuples to NULL
• One of the above options must be specified during database design for
each foreign key constraint
ICS 541 - 01 (072)
Relational Data Model 1
42
Possible violations for UPDATE operation
• UPDATE may violate domain constraint and NOT NULL
constraint on an attribute being modified
• Any of the other constraints may also be violated, depending
on the attribute being updated:
• Updating the primary key (PK):
• Similar to a DELETE followed by an INSERT
• Need to specify similar options to DELETE
• Updating a foreign key (FK):
• May violate referential integrity
• Updating an ordinary attribute (neither PK nor FK):
• Can only violate domain constraints
ICS 541 - 01 (072)
Relational Data Model 1
43
Summary Part I
• Presented Relational Model Concepts
• Definitions
• Characteristics of relations
• Discussed Relational Model Constraints and Relational
Database Schemas
•
•
•
•
Domain constraints’
Key constraints
Entity integrity
Referential integrity
• Described the Relational Update Operations and Dealing with
Constraint Violations
ICS 541 - 01 (072)
Relational Data Model 1
44
2. Data manipulation
ICS 541 - 01 (072)
Relational Data Model 1
45
Part II Outline
• Relational Algebra
•
•
•
•
•
Unary Relational Operations
Relational Algebra Operations From Set Theory
Binary Relational Operations
Additional Relational Operations
Examples of Queries in Relational Algebra
• Relational Calculus
• Tuple Relational Calculus
• Domain Relational Calculus
ICS 541 - 01 (072)
Relational Data Model 1
46
Relational Algebra Overview
• Relational algebra is the basic set of operations for
the relational model
• These operations enable a user to specify basic
retrieval requests (or queries)
• The result of an operation is a new relation, which
may have been formed from one or more input
relations
• This property makes the algebra “closed” (all objects in
relational algebra are relations)
ICS 541 - 01 (072)
Relational Data Model 1
47
Relational Algebra Overview (continued)
• The algebra operations thus produce new relations
• These can be further manipulated using operations of the
same algebra
• A sequence of relational algebra operations forms a
relational algebra expression
• The result of a relational algebra expression is also a relation
that represents the result of a database query (or retrieval
request)
ICS 541 - 01 (072)
Relational Data Model 1
48
Brief History of Origins of Algebra
• Muhammad ibn Musa al-Khwarizmi (800-847 CE)
wrote a book titled al-jabr about arithmetic of
variables
• Book was translated into Latin.
• Its title (al-jabr) gave Algebra its name.
• Al-Khwarizmi called variables “shay”
• “Shay” is Arabic for “thing”.
• Spanish transliterated “shay” as “xay” (“x” was “sh” in
Spain).
• In time this word was abbreviated as x.
• Where does the word Algorithm come from?
• Algorithm originates from “al-Khwarizmi"
• Reference: PBS (http://www.pbs.org/empires/islam/innoalgebra.html)
ICS 541 - 01 (072)
Relational Data Model 1
49
Relational Algebra Overview
• Relational Algebra consists of several groups of operations
• Unary Relational Operations
• SELECT (symbol:  (sigma))
• PROJECT (symbol:  (pi))
• RENAME (symbol:  (rho))
• Relational Algebra Operations From Set Theory
• UNION (  ), INTERSECTION (  ), DIFFERENCE (or MINUS, – )
• CARTESIAN PRODUCT ( x )
• Binary Relational Operations
• JOIN (several variations of JOIN exist)
• DIVISION
• Additional Relational Operations
• OUTER JOINS, OUTER UNION
• AGGREGATE FUNCTIONS (These compute summary of information:
for example, SUM, COUNT, AVG, MIN, MAX)
ICS 541 - 01 (072)
Relational Data Model 1
50
Unary Relational Operations: SELECT
• The SELECT operation (denoted by  (sigma)) is used to select a subset of
the tuples from a relation based on a selection condition.
• The selection condition acts as a filter
• Keeps only those tuples that satisfy the qualifying condition
• Tuples satisfying the condition are selected whereas the other tuples are
discarded (filtered out)
• Examples:
• Select the EMPLOYEE tuples whose department number is 4:
 DNO = 4 (EMPLOYEE)
• Select the employee tuples whose salary is greater than $30,000:
 SALARY > 30,000 (EMPLOYEE)
ICS 541 - 01 (072)
Relational Data Model 1
51
Unary Relational Operations: SELECT
• In general, the select operation is denoted by
 <selection condition>(R) where
• the symbol  (sigma) is used to denote the select operator
• the selection condition is a Boolean (conditional) expression
specified on the attributes of relation R
• tuples that make the condition true are selected
• appear in the result of the operation
• tuples that make the condition false are filtered out
• discarded from the result of the operation
ICS 541 - 01 (072)
Relational Data Model 1
52
Unary Relational Operations: SELECT (contd.)
• SELECT Operation Properties
• The SELECT operation  <selection condition>(R) produces a relation S that has
the same schema (same attributes) as R
• SELECT  is commutative:
  <condition1>( < condition2> (R)) =  <condition2> ( < condition1> (R))
• Because of commutativity property, a cascade (sequence) of SELECT
operations may be applied in any order:
 <cond1>(<cond2> (<cond3> (R)) = <cond2> (<cond3> (<cond1> ( R)))
• A cascade of SELECT operations may be replaced by a single selection
with a conjunction of all the conditions:
 <cond1>(< cond2> (<cond3>(R)) =  <cond1> AND < cond2> AND < cond3>(R)))
• The number of tuples in the result of a SELECT is less than (or equal
to) the number of tuples in the input relation R
ICS 541 - 01 (072)
Relational Data Model 1
53
Unary Relational Operations: PROJECT
• PROJECT Operation is denoted by  (pi)
• This operation keeps certain columns (attributes)
from a relation and discards the other columns.
• PROJECT creates a vertical partitioning
• The list of specified columns (attributes) is kept in each tuple
• The other attributes in each tuple are discarded
• Example: To list each employee’s first and last name
and salary, the following is used:
LNAME, FNAME,SALARY(EMPLOYEE)
ICS 541 - 01 (072)
Relational Data Model 1
54
Unary Relational Operations: PROJECT (cont.)
• The general form of the project operation is:
<attribute list>(R)
  (pi) is the symbol used to represent the project operation
• <attribute list> is the desired list of attributes from relation
R.
• The project operation removes any duplicate tuples
• This is because the result of the project operation must be a
set of tuples
• Mathematical sets do not allow duplicate elements.
ICS 541 - 01 (072)
Relational Data Model 1
55
Unary Relational Operations: PROJECT (contd.)
• PROJECT Operation Properties
• The number of tuples in the result of projection <list>(R) is
always less or equal to the number of tuples in R
• If the list of attributes includes a key of R, then the number of
tuples in the result of PROJECT is equal to the number of tuples in
R
• PROJECT is not commutative
  <list1> ( <list2> (R) ) =  <list1> (R) as long as <list2> contains the
attributes in <list1>
ICS 541 - 01 (072)
Relational Data Model 1
56
Relational Algebra Expressions
• We may want to apply several relational algebra
operations one after the other
• Either we can write the operations as a single relational
algebra expression by nesting the operations, or
• We can apply one operation at a time and create
intermediate result relations.
• In the latter case, we must give names to the
relations that hold the intermediate results.
ICS 541 - 01 (072)
Relational Data Model 1
57
Single expression versus sequence of relational
operations (Example)
• To retrieve the first name, last name, and salary of all
employees who work in department number 5, we must apply
a select and a project operation
• We can write a single relational algebra expression as follows:
 FNAME, LNAME, SALARY( DNO=5(EMPLOYEE))
• OR We can explicitly show the sequence of operations, giving
a name to each intermediate relation:
• DEP5_EMPS   DNO=5(EMPLOYEE)
• RESULT   FNAME, LNAME, SALARY (DEP5_EMPS)
ICS 541 - 01 (072)
Relational Data Model 1
58
Unary Relational Operations: RENAME
• The RENAME operator is denoted by  (rho)
• In some cases, we may want to rename the attributes
of a relation or the relation name or both
• Useful when a query requires multiple operations
• Necessary in some cases (see JOIN operation later)
ICS 541 - 01 (072)
Relational Data Model 1
59
Unary Relational Operations: RENAME (contd.)
• The general RENAME operation  can be expressed
by any of the following forms:
• S (B1, B2, …, Bn )(R) changes both:
• the relation name to S, and
• the column (attribute) names to B1, B1, …..Bn
• S(R) changes:
• the relation name only to S
• (B1, B2, …, Bn )(R) changes:
• the column (attribute) names only to B1, B1, …..Bn
ICS 541 - 01 (072)
Relational Data Model 1
60
Unary Relational Operations: RENAME (contd.)
• For convenience, we also use a shorthand for
renaming attributes in an intermediate relation:
• If we write:
• RESULT   FNAME, LNAME, SALARY (DEP5_EMPS)
• RESULT will have the same attribute names as DEP5_EMPS
(same attributes as EMPLOYEE)
• If we write:
• RESULT (F, M, L, S, B, A, SX, SAL, SU, DNO) 
 FNAME, LNAME, SALARY (DEP5_EMPS)
• The 10 attributes of DEP5_EMPS are renamed to F, M, L, S, B,
A, SX, SAL, SU, DNO, respectively
ICS 541 - 01 (072)
Relational Data Model 1
61
Example of applying multiple operations and
RENAME
ICS 541 - 01 (072)
Relational Data Model 1
62
Relational Algebra Operations from
Set Theory: UNION
• UNION Operation
• Binary operation, denoted by 
• The result of R  S, is a relation that includes all tuples that
are either in R or in S or in both R and S
• Duplicate tuples are eliminated
• The two operand relations R and S must be “type compatible”
(or UNION compatible)
• R and S must have same number of attributes
• Each pair of corresponding attributes must be type compatible
(have same or compatible domains)
ICS 541 - 01 (072)
Relational Data Model 1
63
Relational Algebra Operations from
Set Theory: UNION
• Example:
• To retrieve the social security numbers of all employees who either work in
department 5 (RESULT1 below) or directly supervise an employee who
works in department 5 (RESULT2 below)
• We can use the UNION operation as follows:
DEP5_EMPS  DNO=5 (EMPLOYEE)
RESULT1   SSN(DEP5_EMPS)
RESULT2(SSN)  SUPERSSN(DEP5_EMPS)
RESULT  RESULT1  RESULT2
• The union operation produces the tuples that are in either RESULT1 or
RESULT2 or both
ICS 541 - 01 (072)
Relational Data Model 1
64
Example of the result of a UNION operation
• UNION Example
ICS 541 - 01 (072)
Relational Data Model 1
65
Relational Algebra Operations from
Set Theory
• Type Compatibility of operands is required for the binary set
operation UNION , (also for INTERSECTION , and SET
DIFFERENCE –, see next slides)
• R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) are type
compatible if:
• they have the same number of attributes, and
• the domains of corresponding attributes are type compatible (i.e.
dom(Ai)=dom(Bi) for i=1, 2, ..., n).
• The resulting relation for R1R2 (also for R1R2, or R1–R2,
see next slides) has the same attribute names as the first
operand relation R1 (by convention)
ICS 541 - 01 (072)
Relational Data Model 1
66
Relational Algebra Operations from Set Theory:
INTERSECTION
• INTERSECTION is denoted by 
• The result of the operation R  S, is a relation
that includes all tuples that are in both R and S
• The attribute names in the result will be the same
as the attribute names in R
• The two operand relations R and S must be
“type compatible”
ICS 541 - 01 (072)
Relational Data Model 1
67
Relational Algebra Operations from Set Theory:
SET DIFFERENCE (cont.)
• SET DIFFERENCE (also called MINUS or
EXCEPT) is denoted by –
• The result of R – S, is a relation that includes all
tuples that are in R but not in S
• The attribute names in the result will be the same
as the attribute names in R
• The two operand relations R and S must be
“type compatible”
ICS 541 - 01 (072)
Relational Data Model 1
68
Example to illustrate the result of UNION,
INTERSECT, and DIFFERENCE
ICS 541 - 01 (072)
Relational Data Model 1
69
Some properties of UNION, INTERSECT, and
DIFFERENCE
• Notice that both union and intersection are commutative
operations; that is
• R  S = S  R, and R  S = S  R
• Both union and intersection can be treated as n-ary operations
applicable to any number of relations as both are associative
operations; that is
• R  (S  T) = (R  S)  T
• (R  S)  T = R  (S  T)
• The minus operation is not commutative; that is, in general
• R–S≠S–R
ICS 541 - 01 (072)
Relational Data Model 1
70
Relational Algebra Operations from Set Theory:
CARTESIAN PRODUCT
• CARTESIAN (or CROSS) PRODUCT Operation
• This operation is used to combine tuples from two relations in a
combinatorial fashion.
• Denoted by R(A1, A2, . . ., An) x S(B1, B2, . . ., Bm)
• Result is a relation Q with degree n + m attributes:
• Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order.
• The resulting relation state has one tuple for each combination of
tuples—one from R and one from S.
• Hence, if R has nR tuples (denoted as |R| = nR ), and S has nS tuples,
then R x S will have nR * nS tuples.
• The two operands do NOT have to be "type compatible”
ICS 541 - 01 (072)
Relational Data Model 1
71
Relational Algebra Operations from Set Theory:
CARTESIAN PRODUCT (cont.)
• Generally, CROSS PRODUCT is not a meaningful
operation
• Can become meaningful when followed by other operations
• Example (not meaningful):
• FEMALE_EMPS   SEX=’F’(EMPLOYEE)
• EMPNAMES   FNAME, LNAME, SSN (FEMALE_EMPS)
• EMP_DEPENDENTS  EMPNAMES x DEPENDENT
• EMP_DEPENDENTS will contain every combination of
EMPNAMES and DEPENDENT
• whether or not they are actually related
ICS 541 - 01 (072)
Relational Data Model 1
72
Relational Algebra Operations from Set Theory:
CARTESIAN PRODUCT (cont.)
• To keep only combinations where the DEPENDENT
is related to the EMPLOYEE, we add a SELECT
operation as follows
• Example (meaningful):
•
•
•
•
•
FEMALE_EMPS   SEX=’F’(EMPLOYEE)
EMPNAMES   FNAME, LNAME, SSN (FEMALE_EMPS)
EMP_DEPENDENTS  EMPNAMES x DEPENDENT
ACTUAL_DEPS   SSN=ESSN(EMP_DEPENDENTS)
RESULT   FNAME, LNAME, DEPENDENT_NAME (ACTUAL_DEPS)
• RESULT will now contain the name of female employees and
their dependents
ICS 541 - 01 (072)
Relational Data Model 1
73
Example of applying CARTESIAN PRODUCT
ICS 541 - 01 (072)
Relational Data Model 1
74
Binary Relational Operations: JOIN
• JOIN Operation (denoted by
)
• The sequence of CARTESIAN PRODECT followed by SELECT is
used quite commonly to identify and select related tuples from two
relations
• A special operation, called JOIN combines this sequence into a single
operation
• This operation is very important for any relational database with more
than a single relation, because it allows us combine related tuples from
various relations
• The general form of a join operation on two relations R(A1, A2, . . .,
An) and S(B1, B2, . . ., Bm) is:
R <join condition>S
• where R and S can be any relations that result from general relational
algebra expressions.
ICS 541 - 01 (072)
Relational Data Model 1
75
Binary Relational Operations: JOIN (cont.)
• Example: Suppose that we want to retrieve the name of the
manager of each department.
• To get the manager’s name, we need to combine each DEPARTMENT
tuple with the EMPLOYEE tuple whose SSN value matches the
MGRSSN value in the department tuple.
• We do this by using the join
operation.
• DEPT_MGR  DEPARTMENT
MGRSSN=SSN EMPLOYEE
• MGRSSN=SSN is the join condition
• Combines each department record with the employee who manages the
department
• The join condition can also be specified as DEPARTMENT.MGRSSN=
EMPLOYEE.SSN
ICS 541 - 01 (072)
Relational Data Model 1
76
Example of applying the JOIN operation
ICS 541 - 01 (072)
Relational Data Model 1
77
Some properties of JOIN
• Consider the following JOIN operation:
• R(A1, A2, . . ., An)
S(B1, B2, . . ., Bm)
R.Ai=S.Bj
• Result is a relation Q with degree n + m attributes:
• Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order.
• The resulting relation state has one tuple for each combination of
tuples—r from R and s from S, but only if they satisfy the join condition
r[Ai]=s[Bj]
• Hence, if R has nR tuples, and S has nS tuples, then the join result will
generally have less than nR * nS tuples.
• Only related tuples (based on the join condition) will appear in the
result
ICS 541 - 01 (072)
Relational Data Model 1
78
Some properties of JOIN
• The general case of JOIN operation is called a Thetajoin:
R
S
theta
• The join condition is called theta
• Theta can be any general boolean expression on the
attributes of R and S; for example:
• R.Ai<S.Bj AND (R.Ak=S.Bl OR R.Ap<S.Bq)
• Most join conditions involve one or more equality
conditions “AND”ed together; for example:
• R.Ai=S.Bj AND R.Ak=S.Bl AND R.Ap=S.Bq
ICS 541 - 01 (072)
Relational Data Model 1
79
Binary Relational Operations: EQUIJOIN
• EQUIJOIN Operation
• The most common use of join involves join
conditions with equality comparisons only
• Such a join, where the only comparison operator used
is =, is called an EQUIJOIN.
• In the result of an EQUIJOIN we always have one or more
pairs of attributes (whose names need not be identical) that
have identical values in every tuple.
• The JOIN seen in the previous example was an EQUIJOIN.
ICS 541 - 01 (072)
Relational Data Model 1
80
Binary Relational Operations:
NATURAL JOIN Operation
• NATURAL JOIN Operation
• Another variation of JOIN called NATURAL JOIN — denoted by * —
was created to get rid of the second (superfluous) attribute in an
EQUIJOIN condition.
• because one of each pair of attributes with identical values is superfluous
• The standard definition of natural join requires that the two join
attributes, or each pair of corresponding join attributes, have the same
name in both relations
• If this is not the case, a renaming operation is applied first.
ICS 541 - 01 (072)
Relational Data Model 1
81
Binary Relational Operations NATURAL JOIN
(contd.)
• Example: To apply a natural join on the DNUMBER attributes of
DEPARTMENT and DEPT_LOCATIONS, it is sufficient to write:
• DEPT_LOCS  DEPARTMENT * DEPT_LOCATIONS
• Only attribute with the same name is DNUMBER
• An implicit join condition is created based on this attribute:
DEPARTMENT.DNUMBER=DEPT_LOCATIONS.DNUMBER
• Another example: Q  R(A,B,C,D) * S(C,D,E)
• The implicit join condition includes each pair of attributes with the same name,
“AND”ed together:
• R.C=S.C AND R.D.S.D
• Result keeps only one attribute of each such pair:
• Q(A,B,C,D,E)
ICS 541 - 01 (072)
Relational Data Model 1
82
Example of NATURAL JOIN operation
ICS 541 - 01 (072)
Relational Data Model 1
83
Complete Set of Relational Operations
• The set of operations including SELECT ,
PROJECT  , UNION , DIFFERENCE - ,
RENAME , and CARTESIAN PRODUCT X is
called a complete set because any other relational
algebra expression can be expressed by a combination
of these five operations.
• For example:
• R  S = (R  S ) – ((R - S)  (S - R))
• R
<join condition>S =  <join condition> (R X S)
ICS 541 - 01 (072)
Relational Data Model 1
84
Binary Relational Operations: DIVISION
• DIVISION Operation
• The division operation is applied to two relations
• R(Z)  S(X), where X subset Z. Let Y = Z - X (and hence Z = X 
Y); that is, let Y be the set of attributes of R that are not attributes of S.
• The result of DIVISION is a relation T(Y) that includes a tuple t if
tuples tR appear in R with tR [Y] = t, and with
• tR [X] = ts for every tuple ts in S.
• For a tuple t to appear in the result T of the DIVISION, the values in t
must appear in R in combination with every tuple in S.
ICS 541 - 01 (072)
Relational Data Model 1
85
Example of DIVISION
ICS 541 - 01 (072)
Relational Data Model 1
86
Recap of Relational Algebra Operations
ICS 541 - 01 (072)
Relational Data Model 1
87
Additional Relational Operations: Aggregate
Functions and Grouping
• A type of request that cannot be expressed in the basic
relational algebra is to specify mathematical aggregate
functions on collections of values from the database.
• Examples of such functions include retrieving the average or
total salary of all employees or the total number of employee
tuples.
• These functions are used in simple statistical queries that summarize
information from the database tuples.
• Common functions applied to collections of numeric values
include
• SUM, AVERAGE, MAXIMUM, and MINIMUM.
• The COUNT function is used for counting tuples or values.
ICS 541 - 01 (072)
Relational Data Model 1
88
Aggregate Function Operation
• Use of the Aggregate Functional operation ℱ
• ℱMAX Salary (EMPLOYEE) retrieves the maximum salary value from the
EMPLOYEE relation
• ℱMIN Salary (EMPLOYEE) retrieves the minimum Salary value from the
EMPLOYEE relation
• ℱSUM Salary (EMPLOYEE) retrieves the sum of the Salary from the
EMPLOYEE relation
• ℱCOUNT SSN, AVERAGE Salary (EMPLOYEE) computes the count (number)
of employees and their average salary
• Note: count just counts the number of rows, without removing duplicates
ICS 541 - 01 (072)
Relational Data Model 1
89
Using Grouping with Aggregation
• The previous examples all summarized one or more attributes
for a set of tuples
• Maximum Salary or Count (number of) Ssn
• Grouping can be combined with Aggregate Functions
• Example: For each department, retrieve the DNO, COUNT
SSN, and AVERAGE SALARY
• A variation of aggregate operation ℱ allows this:
• Grouping attribute placed to left of symbol
• Aggregate functions to right of symbol
• DNO ℱCOUNT SSN, AVERAGE Salary (EMPLOYEE)
• Above operation groups employees by DNO (department
number) and computes the count of employees and average
salary per department
ICS 541 - 01 (072)
Relational Data Model 1
90
Additional Relational Operations (cont.)
• Recursive Closure Operations
• Another type of operation that, in general, cannot be
specified in the basic original relational algebra is
recursive closure.
• This operation is applied to a recursive relationship.
• An example of a recursive operation is to retrieve all
SUPERVISEES of an EMPLOYEE e at all levels —
that is, all EMPLOYEE e’ directly supervised by e; all
employees e’’ directly supervised by each employee e’;
all employees e’’’ directly supervised by each
employee e’’; and so on.
ICS 541 - 01 (072)
Relational Data Model 1
91
Additional Relational Operations (cont.)
• Although it is possible to retrieve employees at each
level and then take their union, we cannot, in general,
specify a query such as “retrieve the supervisees of
‘James Borg’ at all levels” without utilizing a looping
mechanism.
• The SQL3 standard includes syntax for recursive closure.
ICS 541 - 01 (072)
Relational Data Model 1
92
Additional Relational Operations (cont.)
•
ICS 541 - 01 (072)
Relational Data Model 1
93
Additional Relational Operations (cont.)
• The OUTER JOIN Operation
• In NATURAL JOIN and EQUIJOIN, tuples without a matching (or
related) tuple are eliminated from the join result
• Tuples with null in the join attributes are also eliminated
• This amounts to loss of information.
• A set of operations, called OUTER joins, can be used when we want to
keep all the tuples in R, or all those in S, or all those in both relations in
the result of the join, regardless of whether or not they have matching
tuples in the other relation.
ICS 541 - 01 (072)
Relational Data Model 1
94
Additional Relational Operations (cont.)
• The left outer join operation keeps every tuple in the
first or left relation R in R
S; if no matching tuple
is found in S, then the attributes of S in the join result
are filled or “padded” with null values.
• A similar operation, right outer join, keeps every
tuple in the second or right relation S in the result of
R
S.
• A third operation, full outer join, denoted by
keeps all tuples in both the left and the right relations
when no matching tuples are found, padding them
with null values as needed.
ICS 541 - 01 (072)
Relational Data Model 1
95
Additional Relational Operations (cont.)
ICS 541 - 01 (072)
Relational Data Model 1
96
Additional Relational Operations (cont.)
• OUTER UNION Operations
• The outer union operation was developed to take the union
of tuples from two relations if the relations are not type
compatible.
• This operation will take the union of tuples in two relations
R(X, Y) and S(X, Z) that are partially compatible,
meaning that only some of their attributes, say X, are type
compatible.
• The attributes that are type compatible are represented only
once in the result, and those attributes that are not type
compatible from either relation are also kept in the result
relation T(X, Y, Z).
ICS 541 - 01 (072)
Relational Data Model 1
97
Additional Relational Operations (cont.)
• Example: An outer union can be applied to two relations
whose schemas are STUDENT(Name, SSN, Department,
Advisor) and INSTRUCTOR(Name, SSN, Department, Rank).
• Tuples from the two relations are matched based on having the same
combination of values of the shared attributes— Name, SSN, Department.
• If a student is also an instructor, both Advisor and Rank will have a value;
otherwise, one of these two attributes will be null.
• The result relation STUDENT_OR_INSTRUCTOR will have the
following attributes:
STUDENT_OR_INSTRUCTOR (Name, SSN, Department,
Advisor, Rank)
ICS 541 - 01 (072)
Relational Data Model 1
98
Examples of Queries in Relational Algebra
• Q1: Retrieve the name and address of all employees who work for the
‘Research’ department.
RESEARCH_DEPT   DNAME=’Research’ (DEPARTMENT)
RESEARCH_EMPS  (RESEARCH_DEPT
DNUMBER= DNOEMPLOYEE
EMPLOYEE)
RESULT   FNAME, LNAME, ADDRESS (RESEARCH_EMPS)
• Q6: Retrieve the names of employees who have no dependents.
ALL_EMPS   SSN(EMPLOYEE)
EMPS_WITH_DEPS(SSN)   ESSN(DEPENDENT)
EMPS_WITHOUT_DEPS  (ALL_EMPS - EMPS_WITH_DEPS)
RESULT   LNAME, FNAME (EMPS_WITHOUT_DEPS * EMPLOYEE)
ICS 541 - 01 (072)
Relational Data Model 1
99
Relational Calculus
• A relational calculus expression creates a new
relation, which is specified in terms of variables that
range over rows of the stored database relations (in
tuple calculus) or over columns of the stored
relations (in domain calculus).
• In a calculus expression, there is no order of
operations to specify how to retrieve the query
result—a calculus expression specifies only what
information the result should contain.
• This is the main distinguishing feature between relational
algebra and relational calculus.
ICS 541 - 01 (072)
Relational Data Model 1
100
Relational Calculus
• Relational calculus is considered to be a
nonprocedural language.
• This differs from relational algebra, where we must
write a sequence of operations to specify a retrieval
request; hence relational algebra can be considered as
a procedural way of stating a query.
ICS 541 - 01 (072)
Relational Data Model 1
101
Tuple Relational Calculus
• The tuple relational calculus is based on specifying a number
of tuple variables.
• Each tuple variable usually ranges over a particular database
relation, meaning that the variable may take as its value any
individual tuple from that relation.
• A simple tuple relational calculus query is of the form
{t | COND(t)}
• where t is a tuple variable and COND (t) is a conditional expression
involving t.
• The result of such a query is the set of all tuples t that satisfy COND (t).
ICS 541 - 01 (072)
Relational Data Model 1
102
Tuple Relational Calculus
• Example: To find the first and last names of all employees
whose salary is above $50,000, we can write the following
tuple calculus expression:
{t.FNAME, t.LNAME | EMPLOYEE(t) AND
t.SALARY>50000}
• The condition EMPLOYEE(t) specifies that the range
relation of tuple variable t is EMPLOYEE.
• The first and last name (PROJECTION FNAME, LNAME) of each
EMPLOYEE tuple t that satisfies the condition
t.SALARY>50000 (SELECTION  SALARY >50000) will be
retrieved.
ICS 541 - 01 (072)
Relational Data Model 1
103
The Existential and Universal Quantifiers
• Two special symbols called quantifiers can appear in formulas;
these are the universal quantifier () and the existential
quantifier ().
• Informally, a tuple variable t is bound if it is quantified,
meaning that it appears in an ( t) or ( t) clause; otherwise, it
is free.
• If F is a formula, then so are ( t)(F) and ( t)(F), where t is a
tuple variable.
• The formula ( t)(F) is true if the formula F evaluates to true for some
(at least one) tuple assigned to free occurrences of t in F; otherwise (
t)(F) is false.
• The formula ( t)(F) is true if the formula F evaluates to true for every
tuple (in the universe) assigned to free occurrences of t in F; otherwise
( t)(F) is false.
ICS 541 - 01 (072)
Relational Data Model 1
104
The Existential and Universal Quantifiers
  is called the universal or “for all” quantifier
because every tuple in “the universe of” tuples must
make F true to make the quantified formula true.
  is called the existential or “there exists” quantifier
because any tuple that exists in “the universe of”
tuples may make F true to make the quantified
formula true.
ICS 541 - 01 (072)
Relational Data Model 1
105
Example Query Using Existential Quantifier
• Retrieve the name and address of all employees who work for the
‘Research’ department. The query can be expressed as :
{t.FNAME, t.LNAME, t.ADDRESS | EMPLOYEE(t) and ( d)
(DEPARTMENT(d) and d.DNAME=‘Research’ and
d.DNUMBER=t.DNO) }
• The only free tuple variables in a relational calculus expression should be
those that appear to the left of the bar ( | ).
• In above query, t is the only free variable; it is then bound successively to each
tuple.
• If a tuple satisfies the conditions specified in the query, the attributes
FNAME, LNAME, and ADDRESS are retrieved for each such tuple.
• The conditions EMPLOYEE (t) and DEPARTMENT(d) specify the range
relations for t and d.
• The condition d.DNAME = ‘Research’ is a selection condition and corresponds
to a SELECT operation in the relational algebra, whereas the condition
d.DNUMBER = t.DNO is a JOIN condition.
ICS 541 - 01 (072)
Relational Data Model 1
106
Example Query Using Universal Quantifier
•
Find the names of employees who work on all the projects controlled by
department number 5. The query can be:
{e.LNAME, e.FNAME | EMPLOYEE(e) and ( ( x)(not(PROJECT(x)) or
not(x.DNUM=5)
OR ( ( w)(WORKS_ON(w) and w.ESSN=e.SSN and x.PNUMBER=w.PNO))))}
• Exclude from the universal quantification all tuples that we are not interested in by
making the condition true for all such tuples.
•
•
The first tuples to exclude (by making them evaluate automatically to true) are those that
are not in the relation R of interest.
In query above, using the expression not(PROJECT(x)) inside the universally
quantified formula evaluates to true all tuples x that are not in the PROJECT
relation.
•
Then we exclude the tuples we are not interested in from R itself. The expression
not(x.DNUM=5) evaluates to true all tuples x that are in the project relation but are not
controlled by department 5.
• Finally, we specify a condition that must hold on all the remaining tuples in R.
( ( w)(WORKS_ON(w) and w.ESSN=e.SSN and x.PNUMBER=w.PNO)
ICS 541 - 01 (072)
Relational Data Model 1
107
Languages Based on Tuple Relational Calculus
• The language SQL is based on tuple calculus. It uses the basic
block structure to express the queries in tuple calculus:
• SELECT <list of attributes>
• FROM <list of relations>
• WHERE <conditions>
• SELECT clause mentions the attributes being projected, the
FROM clause mentions the relations needed in the query, and
the WHERE clause mentions the selection as well as the join
conditions.
• SQL syntax is expanded further to accommodate other operations. (See
Chapter 8).
ICS 541 - 01 (072)
Relational Data Model 1
108
Languages Based on Tuple Relational Calculus
• Another language which is based on tuple calculus is
QUEL which actually uses the range variables as in
tuple calculus. Its syntax includes:
• RANGE OF <variable name> IS <relation name>
• Then it uses
• RETRIEVE <list of attributes from range variables>
• WHERE <conditions>
• This language was proposed in the relational DBMS
INGRES.
ICS 541 - 01 (072)
Relational Data Model 1
109
The Domain Relational Calculus
• Another variation of relational calculus called the domain relational
calculus, or simply, domain calculus is equivalent to tuple calculus and to
relational algebra.
• The language called QBE (Query-By-Example) that is related to domain
calculus was developed almost concurrently to SQL at IBM Research,
Yorktown Heights, New York.
• Domain calculus was thought of as a way to explain what QBE does.
• Domain calculus differs from tuple calculus in the type of variables used in
formulas:
• Rather than having variables range over tuples, the variables range over single
values from domains of attributes.
• To form a relation of degree n for a query result, we must have n of these
domain variables— one for each attribute.
ICS 541 - 01 (072)
Relational Data Model 1
110
The Domain Relational Calculus
• An expression of the domain calculus is of the form
{ x1, x2, . . ., xn |
COND(x1, x2, . . ., xn, xn+1, xn+2, . . ., xn+m)}
• where x1, x2, . . ., xn, xn+1, xn+2, . . ., xn+m are domain
variables that range over domains (of attributes)
• and COND is a condition or formula of the domain
relational calculus.
ICS 541 - 01 (072)
Relational Data Model 1
111
Example Query Using Domain Calculus
• Retrieve the birthdate and address of the employee whose name is ‘John B.
Smith’.
• Query :
{uv | ( q) ( r) ( s) ( t) ( w) ( x) ( y) ( z)
(EMPLOYEE(qrstuvwxyz) and q=’John’ and r=’B’ and s=’Smith’)}
• Ten variables for the employee relation are needed, one to range over the
domain of each attribute in order.
• Of the ten variables q, r, s, . . ., z, only u and v are free.
• Specify the requested attributes, BDATE and ADDRESS, by the free
domain variables u for BDATE and v for ADDRESS.
• Specify the condition for selecting a tuple following the bar ( | )—
• namely, that the sequence of values assigned to the variables qrstuvwxyz be a
tuple of the employee relation and that the values for q (FNAME), r (MINIT),
and s (LNAME) be ‘John’, ‘B’, and ‘Smith’, respectively.
ICS 541 - 01 (072)
Relational Data Model 1
112
Part II Summary
• Relational Algebra
•
•
•
•
•
Unary Relational Operations
Relational Algebra Operations From Set Theory
Binary Relational Operations
Additional Relational Operations
Examples of Queries in Relational Algebra
• Relational Calculus
• Tuple Relational Calculus
• Domain Relational Calculus
ICS 541 - 01 (072)
Relational Data Model 1
113
3. Implementation
SQL 99 Standard
ICS 541 - 01 (072)
Relational Data Model 1
114
SQL Standard
•
•
•
•
•
SQL: Structured Query Language
Pronounced “S-Q-L” or “sequel”
SQL was specified in 1970s
SQL was enhanced substantially in 1989 and 1992
A new standard called SQL3 added object-oriented
features
• A subset of SQL3 standard, now known as SQL-99
has been approved
ICS 541 - 01 (072)
Relational Data Model 1
115
SQL
• DDL statements
• CREATE, ALTER, DROP
• SCHEMA, TABLE, VIEW, CONSTRAINT, TRIGGER
• Update
• INSERT, DELETE, UPDATE
• DML
• SELECT
ICS 541 - 01 (072)
Relational Data Model 1
116
Rules for NULL’s
• When we operate on a NULL and another value (including
another NULL) using +, –, etc., the result is NULL
• Aggregate functions ignore NULL, except COUNT(*)
(since it counts rows)
• When we compare a NULL with another value (including
another NULL) using =, >, etc., the result is UNKNOWN.
• WHERE and HAVING clauses only select rows for output
if the condition evaluates to TRUE. UNKNOWN is
insufficient
ICS 541 - 01 (072)
Relational Data Model 1
117
Unfortunate consequences
• SELECT AVG(GPA) FROM Student;
• SELECT SUM(GPA)/COUNT(*) FROM Student;
• Not equivalent
• Although AVG(GPA)= SUM(GPA)/COUNT(GPA), still
• SELECT * FROM Student;
• SELECT * FROM Student WHERE GPA = GPA;
• Not equivalent
• Be careful: NULL breaks many equivalences
ICS 541 - 01 (072)
Relational Data Model 1
118
Another problem
• Example: Who has NULL GPA values?
• SELECT * FROM Student WHERE GPA = NULL;
• Does not work; never returns anything
• (SELECT * FROM Student)
EXCEPT ALL
(SELECT * FROM Student WHERE GPA = GPA)
• Works, but ugly
• Introduced built-in predicates IS NULL and IS NOT NULL
• SELECT * FROM Student WHERE GPA IS NULL;
ICS 541 - 01 (072)
Relational Data Model 1
119
E F Codd’s Relational Model of Data (1)
•
Relational model and normal form
•
•
Data independence
Data Dependencies
•
•
•
Ordering dependencies
Indexing dependencies
Access path dependencies
ICS 541 - 01 (072)
Relational Data Model 1
120
E F Codd’s Relational Model of Data (2)
• Relational model and normal form
• Relational view of data
•
•
•
•
•
•
Relation
Properties of a relation
Each row represents an n-tuple of R.
The ordering of rows is immaterial.
All rows are distinct.
The ordering of columns is significant--it corresponds to the
ordering S1, S2, - . - , Sn of the domains on which R is defined
• The significance of each column is partially conveyed by labeling it
with the name of the corresponding domain.
ICS 541 - 01 (072)
Relational Data Model 1
121
E F Codd’s Relational Model of Data (3)
• Relational view of data
• Normal form
• Normalization
• Some linguistic aspects
• Expressible, named, and stored relations
• Named set is the collection of all those relations that the
community of users can identify by means of a simple name (or
identifier)
• Expressible set is the total collection of relations that can be
designated by expressions in the data language
• Named set is a subset of the expressible set --- usually a very small
subset
• Stored relations
ICS 541 - 01 (072)
Relational Data Model 1
122
E F Codd’s Relational Model of Data (4)
•
Redundancy and consistency
•
Operations on relations
1.
2.
3.
4.
5.
•
Redundancy
•
•
•
Permutations
Projection
Join
Composition
Restriction
Strong redundancy
Weak redundancy
Consistency
ICS 541 - 01 (072)
Relational Data Model 1
123
E F Codd’s Relational Model of Data (5)
•
Homework assignment
Read the paper carefully and develop answers
to the following questions:
1. What is data dependency? Explain three forms of
data redundancies discussed in the paper.
2. Explain each of the five operators with an example.
3. What is redundancy? How strong redundancy is
different than weak redundancy?
4. Discuss the idea of consistency as explained in the
paper.
•
Due: Saturday 8 March 2008 (Typed printed copy in
class)
ICS 541 - 01 (072)
Relational Data Model 1
124
Summary
• Relational data model
1. Data Management
•
•
•
•
Informal definitions
Formal definitions
Integrity constraints
Update operations
2. Data Manipulation
• Relational algebra
• Relational calculus
• Tuple calculus
• Domain calculus
3. Implementation
• SQL
ICS 541 - 01 (072)
Relational Data Model 1
125