Download The Relational Model

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

SQL wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Concurrency control wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Clusterpoint wikipedia , lookup

Database wikipedia , lookup

Relational algebra wikipedia , lookup

Database model wikipedia , lookup

Relational model wikipedia , lookup

Transcript
Relational Data Model
NOTES - PART III
The Relational Data Model
(Based on Chapter 7 in
Fundamentals of Database Systems by Elmasri and Navathe, Ed. 3)
1 Relational Model Concepts
2 Characteristics of Relations
3 Relational Integrity Constraints
3.1 Key Constraints
3.2 Entity Integrity Constraints
3.3 Referential Integrity Constraints
4 Update Operations on Relations
5 Relational Algebra Operations
5.1 SELECT  and PROJECT 
5.2 Set Operations
5.3 JOIN Operations
5.4 Additional Relational Operations
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-1
Relational Data Model
1. Relational Model Concepts
BASIS OF THE MODEL
• The relational Model of Data is based on the concept
of a Relation.
• A Relation is a mathematical concept based on the
ideas of sets.
• The strength of the relational approach to data
management comes from the formal foundation provided
by the theory of relations.
• We review the essentials of the relational approach in
this chapter.
__________________________________________________________
1. The relational model is covered in Chapter 7 of the
book Fundamentals of Database Systems, by Elmasri
and S.B. Navathe, Ed. III, 2000.
2. The model was first proposed by Dr. E.F. Codd of
IBM in 1970 in the following paper - "A Relational
Model for Large Shared Data Banks," Communications
of the ACM, June 1970.
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-2
Relational Data Model
INFORMAL DEFINITIONS
• RELATION: A table of values
• A relation may be thought of as a set of rows.
• A relation may alternately be though of as a set of
columns.
• Each row of the relation may be given an identifier.
• Each column typically is called by its column name or
column header or attribute name.
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-3
Relational Data Model
FORMAL DEFINITIONS
• A Relation may be defined in multiple ways.
• The Schema of a Relation:
R (A1, A2, .....An)
Relation R is defined over attributes A1, A2, .....An
For Example CUSTOMER (Cust-id, Cust-name, Address, Phone#)
Here, CUSTOMER is a relation defined over the four
attributes Cust-id, Cust-name, Address, Phone#, each of
which has a domain or a set of valid values. For
example, the domain of Cust-id is 6 digit numbers.
• A tuple is an ordered set of values
• Each value is derived from an appropriate domain.
• Each row in the CUSTOMER table may be called as a
tuple in the table and would consist of four values.
<632895, "John Smith", "101 Main St. Atlanta, GA 30332", "(404) 894-2000"> is a
triple belonging to the CUSTOMER relation.
• A relation may be regarded as a set of tuples (rows).
• Columns in a table are also called as attributes of the
relation.
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-4
Relational Data Model
FORMAL DEFINITIONS (contd.)
• The relation is formed over the cartesian product of
the sets; each set has values from a domain; that domain
is used in a specific role which is conveyed by the
attribute name.
• For example, attribute Cust-name is defined over the
domain of strings of 25 characters. The role these
strings play in the CUSTOMER relation is that of the
name of customers.
• Formally,
Given R(A1, A2, .........., An)
r(R) subset-of dom (A1) X dom (A2) X ....X dom(An)
R: schema of the relation
r of R: a specific "value" or population of R.
• R is also called the intension of a relation
r is also called the extension of a relation
Let S1 = {0,1}
Let S2 = {a,b,c}
Let R subset-of S1 X S2
for example: r(R) = {<0.a> , <0,b> , <1,c> }
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-5
Relational Data Model
DEFINITION SUMMARY
Informal Terms
Formal Terms
Table
Column
Row
Values in a column
Table Definition
Populated Table
Relation
Attribute/Domain
Tuple
Domain
Schema of Relation
Extension
Notes:
Whereas languages like SQL use the informal terms of
TABLE (e.g. CREATE TABLE), COLUMN (e.g.
SYSCOLUMN variable), the relational database
textbooks present the model and operations on it using
the formal terms.
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-6
Relational Data Model
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-7
Relational Data Model
2 Characteristics of Relations
Ordering of tuples in a relation r(R): The tuples are not
considered to be ordered, even though they appear to be in the
tabular form.
Ordering of attributes in a relation schema R (and of values
within each tuple): We will consider the attributes in R(A1,
A2, ..., An) and the values in t=<v1, v2, ..., vn> to be ordered .
(However, a more general alternative definition of relation
does not require this ordering).
Values in a tuple: All values are considered atomic
(indivisible). A special null value is used to represent values
that are unknown or inapplicable to certain tuples.
Notation:
- We refer to component values of a tuple t by t[Ai] = vi (the
value of attribute Ai for tuple t).
Similarly, t[Au, Av, ..., Aw] refers to the subtuple of t
containing the values of attributes Au, Av, ..., Aw,
respectively.
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-8
Relational Data Model
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-9
Relational Data Model
3 Relational Integrity Constraints
Constraints are conditions that must hold on all valid relation
instances. There are three main types of constraints:
Key constraints, entity integrity constraints, and referential
integrity constraints
3.1 Key Constraints
Superkey of R: A set of attributes SK of R such that no two
tuples in any valid relation instance r(R) will have the same
value for SK. That is, for any distinct tuples t1 and t2 in r(R),
t1[SK] <> t2[SK].
Key of R: A "minimal" superkey; that is, a superkey K such
that removal of any attribute from K results in a set of
attributes that is not a superkey.
Example: The CAR relation schema:
CAR(State, Reg#, SerialNo, Make, Model, Year)
has two keys Key1 = {State, Reg#}, Key2 = {SerialNo},
which are also superkeys. {SerialNo, Make} is a superkey but
not a key.
If a relation has several candidate keys, one is chosen
arbitrarily to be the primary key. The primary key attributes
are underlined.
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-10
Relational Data Model
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-11
Relational Data Model
3.2 Entity Integrity
Relational Database Schema: A set S of relation schemas that
belong to the same database. S is the name of the database.
S = {R1, R2, ..., Rn}
Entity Integrity: The primary key attributes PK of each
relation schema R in S cannot have null values in any tuple of
r(R). This is because primary key values are used to identify
the individual tuples.
t[PK] <> null for any tuple t in r(R)
Note: Other attributes of R may be similarly constrained to
disallow null values, even though they are not members of the
primary key.
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-12
Relational Data Model
3.3 Referential Integrity
A constraint involving two relations (the previous constraints
involve a single relation).
Used to specify a relationship among tuples in two relations:
the referencing relation and the referenced relation.
Tuples in the referencing relation R1 have attributes FK
(called foreign key attributes) that reference the primary key
attributes PK of the referenced relation R2. A tuple t1 in R1 is
said to reference a tuple t2 in R2 if t1[FK] = t2[PK].
A referential integrity constraint can be displayed in a
relational database schema as a directed arc from R1.FK to R2.
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-13
Relational Data Model
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-14
Relational Data Model
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-15
Relational Data Model
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-16
Relational Data Model
4 Update Operations on Relations
- INSERT a tuple.
- DELETE a tuple.
- MODIFY a tuple.
- Integrity constraints should not be violated by the update
operations.
- Several update operations may have to be grouped together.
- Updates may propagate to cause other updates
automatically. This may be necessary to maintain integrity
constraints.
- In case of integrity violation, several actions can be taken:
- cancel the operation that causes the violation (REJECT
optiom)
- perform the operation but inform the user of the violation
- trigger additional updates so the violation is corrected
(CASCADE option, SET NULL option)
- execute a user-specified error-correction routine
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-17
Relational Data Model
5 The Relational Algebra
- Operations to manipulate relations.
- Used to specify retrieval requests (queries).
- Query result is in the form of a relation.
Relational Operations:
5.1 SELECT  and PROJECT  operations.
5.2 Set operations: These include UNION U,
INTERSECTION | |, DIFFERENCE -, CARTESIAN
PRODUCT X.
5.3 JOIN operations X.
5.4 Other relational operations: DIVISION, OUTER JOIN,
AGGREGATE FUNCTIONS.
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-18
Relational Data Model
5.1 SELECT  and PROJECT 
SELECT operation (denoted by ):
- Selects the tuples (rows) from a relation R that satisfy a
certain selection condition c
- Form of the operation:  c(R)
- The condition c is an arbitrary Boolean expression on the
attributes of R
- Resulting relation has the same attributes as R
- Resulting relation includes each tuple in r(R) whose attribute
values satisfy the condition c
Examples:
 DNO=4(EMPLOYEE)
 SALARY>30000(EMPLOYEE)
(DNO=4 AND SALARY>25000) OR DNO=5(EMPLOYEE)
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-19
Relational Data Model
PROJECT operation (denoted by ):
- Keeps only certain attributes (columns) from a relation R
specified in an attribute list L
- Form of operation:  L(R)
- Resulting relation has only those attributes of R specified in
L
Example:  FNAME,LNAME,SALARY(EMPLOYEE)
- The PROJECT operation eliminates duplicate tuples in the
resulting relation so that it remains a mathematical set (no
duplicate elements)
Example: SEX,SALARY(EMPLOYEE)
If several male employees have salary 30000, only a single
tuple <M, 30000> is kept in the resulting relation.
Duplicate tuples are eliminated by the  operation.
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-20
Relational Data Model
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-21
Relational Data Model
Sequences of operations:
- Several operations can be combined to form a relational
algebra expression (query)
Example: Retrieve the names and salaries of employees who
work in department 4:
 FNAME,LNAME,SALARY ( DNO=4(EMPLOYEE) )
- Alternatively, we specify explicit intermediate relations for
each step:
DEPT4_EMPS <- DNO=4(EMPLOYEE)
R <- FNAME,LNAME,SALARY(DEPT4_EMPS)
- Attributes can optionally be renamed in the resulting lefthand-side relation (this may be required for some operations
that will be presented later):
DEPT4_EMPS <- DNO=4(EMPLOYEE)
R(FIRSTNAME,LASTNAME,SALARY) < FNAME,LNAME,SALARY(DEPT4_EMPS)
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-22
Relational Data Model
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-23
Relational Data Model
5.2 Set Operations
- Binary operations from mathematical set theory:
UNION: R1 U R2,
INTERSECTION: R1 | | R2,
SET DIFFERENCE: R1 - R2,
CARTESIAN PRODUCT: R1 X R2.
- For U, | |, -, the operand relations R1(A1, A2, ..., An) and
R2(B1, B2, ..., Bn) must have the same number of attributes,
and the domains of corresponding attributes must be
compatible; that is, dom(Ai)=dom(Bi) for i=1, 2, ..., n. This
condition is called union compatibility.
- The resulting relation for U, | |, or - has the same attribute
names as the first operand relation R1 (by convention).
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-24
Relational Data Model
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-25
Relational Data Model
CARTESIAN PRODUCT
R(A1, A2, ..., Am, B1, B2, ..., Bn) <R1(A1, A2, ..., Am) X R2 (B1, B2, ..., Bn)
- A tuple t exists in R for each combination of tuples t1 from
R1 and t2 from R2 such that:
t[A1, A2, ..., Am]=t1 and t[B1, B2, ..., Bn]=t2
- If R1 has n1 tuples and R2 has n2 tuples, then R will have
n1*n2 tuples.
- CARTESIAN PRODUCT is a meaningless operation on its
own. It can combine related tuples from two relations if
followed by the appropriate SELECT operation .
Example: Combine each DEPARTMENT tuple with the
EMPLOYEE tuple of the manager.
DEP_EMP <-DEPARTMENT X EMPLOYEE
DEPT_MANAGER <-MGRSSN=SSN(DEP_EMP)
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-26
Relational Data Model
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-27
Relational Data Model
5.3 JOIN Operations
THETA JOIN: Similar to a CARTESIAN PRODUCT
followed by a SELECT. The condition c is called a join
condition.
R(A1, A2, ..., Am, B1, B2, ..., Bn) <R1(A1, A2, ..., Am) X c R2 (B1, B2, ..., Bn)
EQUIJOIN: The join condition c includes one or more
equality comparisons involving attributes from R1 and R2.
That is, c is of the form:
(Ai=Bj) AND ... AND (Ah=Bk); 1<i,h<m, 1<j,k<n
In the above EQUIJOIN operation:
Ai, ..., Ah are called the join attributes of R1
Bj, ..., Bk are called the join attributes of R2
Example of using EQUIJOIN:
Retrieve each DEPARTMENT's name and its manager's name:
T <-DEPARTMENT XMGRSSN=SSN EMPLOYEE
RESULT <-DNAME,FNAME,LNAME(T)
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-28
Relational Data Model
NATURAL JOIN (*):
In an EQUIJOIN R <- R1 X c R2, the join attribute of R2
appear redundantly in the result relation R. In a NATURAL
JOIN, the redundant join attributes of R2 are eliminated from
R. The equality condition is implied and need not be specified.
R <- R1 *(join attributes of R1),(join attributes of R2) R2
Example: Retrieve each EMPLOYEE's name and the name of
the DEPARTMENT he/she works for:
T<- EMPLOYEE *(DNO),(DNUMBER) DEPARTMENT
RESULT <-FNAME,LNAME,DNAME(T)
If the join attributes have the same names in both relations,
they need not be specified and we can write R <- R1 * R2.
Example: Retrieve each EMPLOYEE's name and the name of
his/her SUPERVISOR:
SUPERVISOR(SUPERSSN,SFN,SLN)<SSN,FNAME,LNAME(EMPLOYEE)
T<-EMPLOYEE * SUPERVISOR
RESULT <-FNAME,LNAME,SFN,SLN(T)
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-29
Relational Data Model
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-30
Relational Data Model
Note: In the original definition of NATURAL JOIN, the join
attributes were required to have the same names in both
relations.
There can be a more than one set of join attributes with a
different meaning between the same two relations. For
example:
JOIN ATTRIBUTES
EMPLOYEE.SSN=
DEPARTMENT.MGRSSN
EMPLOYEE.DNO=
DEPARTMENT.DNUMBER
RELATIONSHIP
EMPLOYEE manages
the DEPARTMENT
EMPLOYEE works for
the DEPARTMENT
Example: Retrieve each EMPLOYEE's name and the name of
the DEPARTMENT he/she works for:
T<-EMPLOYEE XDNO=DNUMBERDEPARTMENT
RESULT <-FNAME,LNAME,DNAME(T)
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-31
Relational Data Model
A relation can have a set of join attributes to join it with itself :
JOIN ATTRIBUTES
EMPLOYEE(1).SUPERSSN=
EMPLOYEE(2).SSN
RELATIONSHIP
EMPLOYEE(2) supervises
EMPLOYEE(1)
- One can think of this as joining two distinct copies of the
relation, although only one relation actually exists
- In this case, renaming can be useful
Example: Retrieve each EMPLOYEE's name and the name of
his/her SUPERVISOR:
SUPERVISOR(SSSN,SFN,SLN)<SSN,FNAME,LNAME(EMPLOYEE)
T<-EMPLOYEE XSUPERSSN=SSSNSUPERVISOR
RESULT <-FNAME,LNAME,SFN,SLN(T)
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-32
Relational Data Model
Complete Set of Relational Algebra Operations:
- All the operations discussed so far can be described as a
sequence of only the operations SELECT, PROJECT,
UNION, SET DIFFERENCE, and CARTESIAN
PRODUCT.
- Hence, the set { , , U, - , X } is called a complete set of
relational algebra operations. Any query language equivalent
to these operations is called relationally complete.
- For database applications, additional operations are needed
that were not part of the original relational algebra. These
include:
1. Aggregate functions and grouping.
2. OUTER JOIN and OUTER UNION.
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-33
Relational Data Model
5.4 Additional Relational Operations
AGGREGATE FUNCTIONS
- Functions such as SUM, COUNT, AVERAGE, MIN, MAX
are often applied to sets of values or sets of tuples in database
applications
<grouping attributes> F<function list> (R)
- The grouping attributes are optional
Example 1: Retrieve the average salary of all employees (no
grouping needed):
R(AVGSAL) <- F AVERAGE SALARY (EMPLOYEE)
Example 2: For each department, retrieve the department
number, the number of employees, and the average salary (in
the department):
R(DNO,NUMEMPS,AVGSAL) <DNO F COUNT SSN, AVERAGE SALARY (EMPLOYEE)
DNO is called the grouping attribute in the above example
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-34
Relational Data Model
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-35
Relational Data Model
OUTER JOIN
- In a regular EQUIJOIN or NATURAL JOIN operation,
tuples in R1 or R2 that do not have matching tuples in the
other relation do not appear in the result
- Some queries require all tuples in R1 (or R2 or both) to
appear in the result
- When no matching tuples are found, nulls are placed for the
missing attributes
- LEFT OUTER JOIN: R1 X R2 lets every tuple in R1
appear in the result
- RIGHT OUTER JOIN: R1 X R2 lets every tuple in R2
appear in the result
- FULL OUTER JOIN: R1 X R2 lets every tuple in R1 or
R2 appear in the result
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-36
Relational Data Model
Fundamentals of Database Systems
© Ramez Elmasri and Shamkant B. Navathe
3-37