Mini-MSDD
Relational
Databases
Thomas P. Sturm
University of St. Thomas
Outline
Data Concepts
Relational Model
Normalization
Logical Data Structures
Copyright © 1971-2002 Thomas P. Sturm Relational Databases
Data Concepts
Why learn about Relational Databases?
Data - Information - Database
Properties of Data Items
Entity, Attribute, Value
Descriptors / Identifiers
Data Base Design
Sample Database
Why learn about Relational Databases?







A way to put end users into direct touch
with the information stored in computers.
A way to increase the productivity of data
processing professionals.
Can obtain high-performance
implementation of relational models
“No surprises” theoretical underpinnings
(no “special rules,” no “that’s a feature, not a
bug”)
Universal acceptance from the smallest to
the largest databases
Readily available design tools
A standardized language for doing queries
(SQL)
Data - Information - Database
INFORMATION:
The meaning that a human assigns to data via the
known conventions used in their representation.
DATA:
A formalized representation of facts, concepts, or
instructions suitable for communication, interpretation, or
processing by human or automatic means.
BASE:
The bottom of anything, considered as its support or
foundation
The fundamental part of a thing
The chief ingredient of anything, viewed as its
fundamental constituent
Base in its most general sense equals bottom, but, more
specifically, implies a broad bottom by which something is
held up or stabilized
DATABASE:
A collection of stored operational data used by the
application systems of some particular enterprise.
A stable foundation to support an information process.
Properties of Data Items


There are things about which data is collected: entities
These entities can optionally have a name or names
(both a class/type name and individual/instance
names)
Entity Type: A category, arbitrarily defined (but agreed to)
so that membership within the category can be
established, at least at a point in time, e.g. a
department
Entity Instance: Occurrence of a member in the category
in the world, e.g. the payroll department




There are certain things that it is desirable to describe
about the entities. The various qualities
(characteristics) of the entity that are to be
described are referred to as attributes
For each of these attributes for each entity there is
potentially a value (taken from a legal set of values
that obey certain constraints or rules)
There is some structure in the data or stored values
(relationships, associations, dependencies)
Most important, the stored data items must have
meaning
Ref. Thomas P. Sturm, Data and File Structures
Entity, Attribute, Value
EXAMPLE:
For the entity “the car I drove on my first
Sabbatical leave”
ATTRIBUTE         VALUE
Manufacturer      Ford
Model             Country Sedan
Body Type         Station Wagon
Model Year        1973
Color             Blue
Owner             Thomas P. Sturm
Class             Passenger Car
License Number    NBGO
Licensing State   Minnesota
Ref. Thomas P. Sturm, Data and File Structures
Descriptors / Identifiers
DESCRIPTOR:
A descriptor for an entity is an attribute/value
pair.
IDENTIFIER:
An identifier is an attribute whose value is
different for each entity.


usually relegated to values necessarily different
where necessary, an identifier can be made up of the
concatenation of two attributes (which should be
thought of as yet another attribute)
RETRIEVAL
can be based on:



identifier (for an identifier, find some descriptors)
descriptor (for a descriptor, find some identifiers of
entities possessing the descriptor)
location (for a particular location, retrieve the data
that is stored there)
- absolute location
- relative location
(This third method of access is not allowable in the
relational model)

Data Base Design




Impossible to model all of reality
Select an appropriate subset of entities, attributes for
those entities, values for those attributes
Select which interrelationships to preserve
Abstract entities and relationships into classes in a
way suitable for




machine representation
human interpretation
Organize, code, and structure the stored data
Create convenient access paths
User
|
Model
|
...
|
Model
|
Disk Pack
suitable for human
interpretation
suitable for machine
representation
- sufficiently abstract to allow minor perturbations
- sufficiently powerful to give some understanding about
how data in the world are related
Ref: Thomas P. Sturm, Data and File Structures
Sample Database - Employees
Overview of Content:
The database contains organization, budget,
and scheduling information for a software
group that is developing an academic
information system
Entities:
Employees - who have








a name
a job title
a manager who, in turn, is an employee
a hire date
an hourly billing rate
(possibly) a dollar annual bonus amount
membership in a department which in turn has a name,
location, and budget
a set of assigned tasks on projects


each task by each employee on each project has a time
estimate in hours
each project has a name, description, budget, and due
date
Sample Database - Departments and
Projects
Entities (continued)
Departments - which have





a department number
a department name
a department location (room number)
an annual dollar budget
employees, who in turn have a name, job description,
manager, hire date, hourly rate annual bonus, and a
set of assigned tasks (as described above)
Projects - which have





a project name
a project description
a project budget
a project due date
a set of tasks, each of which is to be performed by one
or more employees (who in turn have a name, job
description, manager, hire date, ...) with a time
estimate for each employee for each task
Sample Database - Tasks
Entities (continued)
Tasks - each of which have




the name of the employee working on the task (who in
turn has name, job description, ...)
the name of the project that the task is related to
(which in turn has name, description, ...)
the name of the task being performed
the time estimate (in hours) of how long an employee
will work on a particular type of task for a
particular project
Sample Data
(stated in relational form)
Employees - (Table name emp)
Ename
allen
barger
jones
king
martin
olson
pearson
radl
rogers
smith
sturm
thomas
turner
vogel
Job
programmer
supervisor
programmer
clerk
programmer
analyst
programmer
supervisor
programmer
programmer
clerk
analyst
supervisor
consultant
Mgr
barger
turner
radl
barger
barger
radl
radl
turner
barger
barger
radl
barger
turner
Hired
09-jun-1991
23-jan-1993
20-feb-1991
22-feb-1991
09-nov-1991
28-apr-1991
01-may-1991
03-dec-1992
08-sep-1992
17-dec-1990
23-sep-1992
03-dec-1992
02-mar-1991
17-nov-1991
Rate
30.00
65.00
35.00
18.00
25.00
55.00
30.00
65.00
25.00
35.00
18.00
50.00
75.00
80.00
Bonus
550.00
0.00
600.00
0.00
1000.00
DeptNo
402
402
401
402
402
401
401
401
402
402
401
402
400
400
Departments (Table name dept)
DeptNo  Dname        Loc  Dbudget
400     programming  200  150000.00
401     financial    200  275000.00
402     academic     100  390000.00
403     support      300    7000.00
Projects (Table name proj)
Project_id  Description         Pbudget   Due_date
admit       Admissions          15000.00  07-apr-1998
alumni      Alumni development   7500.00  30-jan-1999
billing     Student billing     11000.00  30-jan-1998
budget      Budgeting           12500.00  12-mar-1998
payroll     Payroll              9000.00  15-may-1998
records     Students records     6000.00  11-feb-1998
Tasks (Table name task)
Ename    Project_id  Tname      Hours
allen    admit       debug      25
allen    admit       implement  20
allen    billing     debug      30
allen    billing     implement  20
barger   admit       manage     15
barger   alumni      manage     10
barger   billing     manage      8
barger   records     manage     12
jones    billing     implement  35
jones    budget      implement  70
jones    payroll     debug      40
king     admit       clerical   25
king     alumni      clerical    9
king     records     clerical   15
martin   admit       implement  30
olson    admit       design     75
olson    alumni      design     40
olson    billing     design     20
olson    records     design     45
pearson  budget      debug      40
pearson  budget      implement  60
pearson  payroll     implement  80
radl     billing     design     15
radl     billing     manage     10
radl     budget      manage     15
radl     payroll     manage     20
rogers   records     debug      20
rogers   records     design     30
rogers   records     implement  45
smith    alumni      debug      30
smith    alumni      implement  90
smith    billing     implement  40
sturm    billing     clerical   38
sturm    budget      clerical   20
sturm    budget      debug      20
sturm    payroll     clerical   15
thomas   alumni      design      5
thomas   billing     design     45
thomas   budget      design     40
thomas   payroll     design     70
turner   billing     manage     12
turner   budget      design     45
Relational Model
Relational Database model
Conceptual Idea of a Relation
Translation of Relational Terms
Requirements of a Relation
Advantages of the Relational Model
Differences in the Relational Model
Details of Department Relation
Org. of Relations in Sample D.B.
(Single Table) Relational Operations
Relational Algebra
Two Table Relational Operations
Cartesian Product
Joins
Relational Database Model
“Codd's” Model
E. F. (Ted) Codd, CACM V13 #6 (June, 1970),
pp. 377-87. “A Relational Model of Data
for Large Shared Data Banks”
Developed in mid-1970’s
Based on the mathematical theory of relations
Codd's definition:
Given sets S1, S2, ... , Sn (not necessarily distinct), R is a
relation on these n sets if it is a set of n-tuples each
of which has its first element from S1, its second
element from S2, and so on.
We shall refer to Sj as the jth domain of R.
R is said to have degree n.
If R has m n-tuples (or just tuples), R is said to have
cardinality m.
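Codd's definition can be sketched in a few lines of Python. This is only an illustration, assuming a relation is held as a set of equal-length tuples; the helper names degree and cardinality are made up for the example, and the rows are the dept rows from the sample database.

```python
# A relation modeled as a set of n-tuples (illustrative sketch only).
dept = {
    (400, "programming", 200, 150000.00),
    (401, "financial", 200, 275000.00),
    (402, "academic", 100, 390000.00),
    (403, "support", 300, 7000.00),
}


def degree(relation):
    """Return n, the number of elements in every tuple of the relation."""
    sizes = {len(t) for t in relation}
    if len(sizes) != 1:
        raise ValueError("all tuples of a relation must have the same length")
    return sizes.pop()


def cardinality(relation):
    """Return m, the number of tuples in the relation."""
    return len(relation)


print(degree(dept), cardinality(dept))  # prints: 4 4
```

Because the relation is a Python set, duplicate tuples cannot occur, which matches the mathematical definition.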
Conceptual Idea of a Relation
Conceptual (but not physical) ideas:
- A relation is a table or a flat file
with n columns or fields
and m rows or records
- Column (or field) j represents a set of values (from a
possible set of values, Sj, the “domain”) for a
particular attribute of all the entities
- Each row (or record) represents a set of values for an
entity, one for each attribute (column, field)
- Degree - number of columns (fields, domains)
- Cardinality - number of rows (records, entities, tuples)
Translation of Relational Terms
Relational Term   Loose Equivalent
Relation          Table
Tuple             Row
Degree            # of attributes
Cardinality       # of table entries
Domain            field-level edit criteria and integrity constraints
Requirements of a Relation
All rows of the relation must have the same
attributes in the same order
No repeating groups
Each row must be unique
(No duplicate rows - if there are, they are “cast
out”)
A set of columns that forms an identifier is the
table key
Advantages of the Relational Model
Logical not physical model
- easy to communicate, what not how
Data Independence
- implementation independent
Record interconnections are dynamically
generated based on data value
- (no user-visible navigation links)
Set-at-a-time database operations (relational
operators) locate, permute, join, select,
project, derive, order, format, present
Join - the operator that “connects” tables - is
unrestricted
- it is not necessary to pre-define access paths
Details of Department Relation
            attributes (columns)
            DeptNo    Dname        Loc  Dbudget
entities    400       programming  200  150000    <- tuple (row)
            401       financial    200  275000
            402       academic     100  390000
            403       support      300    7000
            domain 1                    domain 4
Organization of Relations in Sample
Database
Relation         Attributes (Key underlined)
(Entity type)
emp              (Ename, Job, Mgr, Hired, Rate, Bonus, DeptNo)
dept             (DeptNo, Dname, Loc, Dbudget)
task             (Ename, Project_id, Tname, Hours)
proj             (Project_id, Description, Pbudget, Due_date)
(Single Table) Relational Operations
named file, view, or relation     ->  locate relation
boolean selection expression      ->  entity selection
named attributes                  ->  projection
derivation rules                  ->  entry-level derivations
ordering specification            ->  order
set-function specification        ->  file-level derivations
format, edit spec., destination   ->  formatting & presentation
Relational Algebra
Relational operators take one or two relations as
their “operands” or arguments
Result of applying a relational operator to a
relation (or pair of relations) is another
relation
Consequently, relational operators can be used
in sequence to achieve the desired results
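The closure property can be illustrated with a small Python sketch. It assumes a relation is represented as a list of dictionaries; the helper names select and project are illustrative and are not taken from any library. The rows are a subset of the emp sample data.

```python
def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep the named attributes and cast out duplicate rows."""
    result = []
    for row in relation:
        projected = {name: row[name] for name in attributes}
        if projected not in result:
            result.append(projected)
    return result


emp = [
    {"Ename": "allen", "Job": "programmer", "DeptNo": 402},
    {"Ename": "jones", "Job": "programmer", "DeptNo": 401},
    {"Ename": "king", "Job": "clerk", "DeptNo": 402},
    {"Ename": "martin", "Job": "programmer", "DeptNo": 402},
]

# The output of select is itself a relation, so it can be fed to project.
jobs_in_402 = project(select(emp, lambda row: row["DeptNo"] == 402), ["Job"])
print(jobs_in_402)  # prints: [{'Job': 'programmer'}, {'Job': 'clerk'}]
```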
Two Table Relational Operations
Cartesian Product
All rows of the second table appended to all rows of the
first table
No compatibility requirements
Join
A form of parallel table lookup
Both tables must share a domain
Union
All rows of the second table appended to the rows of the
first table
Both tables must have the same domains
Set Difference
All rows of the first table whose keys do not appear as
keys in the second table
Both tables must share the same domains for their keys
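Union and set difference, as described above, might be sketched as follows. The relations are sets of tuples, the key is assumed to be the first column, and the names current, proposed, and difference_by_key are hypothetical.

```python
def union(r1, r2):
    """Rows of both tables; both must be defined over the same domains."""
    return r1 | r2


def difference_by_key(r1, r2, key_index=0):
    """Rows of r1 whose key does not appear as a key in r2."""
    keys_in_r2 = {row[key_index] for row in r2}
    return {row for row in r1 if row[key_index] not in keys_in_r2}


current = {(400, "programming"), (401, "financial"), (402, "academic")}
proposed = {(402, "academic"), (403, "support")}

print(sorted(union(current, proposed)))
print(sorted(difference_by_key(current, proposed)))
# second line prints: [(400, 'programming'), (401, 'financial')]
```

Using sets means the row shared by both tables appears only once in the union.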
Cartesian Product
If R1 and R2 are relations, the Cartesian product is
written R1 × R2 (in relational algebra) or SELECT
* FROM R1, R2; (in SQL)
A new relation is generated that consists of every tuple
in R1 followed by every tuple in R2
relation empl
name   age  dept
able   20   35
baker  40   45
codd   60   45
date   30   25

relation group
dept  loc
35    100
45    200
25    100

Cartesian product empl × group
empl.name  empl.age  empl.dept  group.dept  group.loc
able       20        35         35          100
able       20        35         45          200
able       20        35         25          100
baker      40        45         35          100
baker      40        45         45          200
baker      40        45         25          100
codd       60        45         35          100
codd       60        45         45          200
codd       60        45         25          100
date       30        25         35          100
date       30        25         45          200
date       30        25         25          100
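As a rough sketch, the same Cartesian product can be computed in Python with itertools.product from the standard library. The relations are assumed to be lists of tuples, with column order as in the empl and group relations of this example.

```python
from itertools import product

empl = [("able", 20, 35), ("baker", 40, 45), ("codd", 60, 45), ("date", 30, 25)]
group = [(35, 100), (45, 200), (25, 100)]

# Every tuple of empl followed by every tuple of group.
cartesian = [e + g for e, g in product(empl, group)]

print(len(cartesian))  # prints: 12
print(cartesian[0])    # prints: ('able', 20, 35, 35, 100)
```

The result has cardinality 4 × 3 = 12 and degree 3 + 2 = 5.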
Join Operation
Form the Cartesian product between two
relations
Cast out duplicates (assuming projection is done
also)
Apply join conditions to select a subset of the
Cartesian product (selection)
There are a variety of different join types,
differentiated by



which relations are used
what the join conditions are
what results are desired
Natural Join Operation
(Simple join, inner equijoin)
- Start with two different tables, form the Cartesian
product (e.g. empl × group)
empl.name  empl.age  empl.dept  group.dept  group.loc
able       20        35         35          100
able       20        35         45          200
able       20        35         25          100
baker      40        45         35          100
baker      40        45         45          200
baker      40        45         25          100
codd       60        45         35          100
codd       60        45         45          200
codd       60        45         25          100
date       30        25         35          100
date       30        25         45          200
date       30        25         25          100
- Select rows where values of a pair of fields are equal
(e.g. empl.dept and group.dept)
empl.name  empl.age  empl.dept  group.dept  group.loc
able       20        35         35          100
baker      40        45         45          200
codd       60        45         45          200
date       30        25         25          100
- Project all except the duplicated column
empl.name  empl.age  dept  group.loc
able       20        35    100
baker      40        45    200
codd       60        45    200
date       30        25    100
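The three steps (product, selection, projection) can be sketched in Python as follows. This is a minimal illustration assuming the relations are lists of tuples with empl.dept at index 2 and group.dept at index 3 of the combined row; a real DBMS would not materialize the full product.

```python
from itertools import product

empl = [("able", 20, 35), ("baker", 40, 45), ("codd", 60, 45), ("date", 30, 25)]
group = [(35, 100), (45, 200), (25, 100)]

# Step 1: form the Cartesian product.
cartesian = [e + g for e, g in product(empl, group)]

# Step 2: select rows where empl.dept equals group.dept.
matched = [row for row in cartesian if row[2] == row[3]]

# Step 3: project all columns except the duplicated group.dept.
natural_join = [row[:3] + row[4:] for row in matched]

for row in natural_join:
    print(row)
# ('able', 20, 35, 100)
# ('baker', 40, 45, 200)
# ('codd', 60, 45, 200)
# ('date', 30, 25, 100)
```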
Expressing the Natural Join
The natural join is written:
empl x group
where empl.dept = group.dept
in the relational algebra
The natural join is written:
SELECT * FROM EMPL, GROUP
WHERE EMPL.DEPT = GROUP.DEPT;
in SQL
The natural join performs a “table lookup”
function by “looking up” data from the
second table for a field in the first table
Unfortunately, if no match is found for an item
“looked up” in the first table, that row in the
first table is “lost”
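The "lost row" effect can be demonstrated with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the employee 'eve' in department 55 is a hypothetical row added only to show a failed lookup, and the table name group is quoted because GROUP is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE empl (name TEXT, age INTEGER, dept INTEGER);
    CREATE TABLE "group" (dept INTEGER, loc INTEGER);
    INSERT INTO empl VALUES ('able', 20, 35), ('baker', 40, 45),
                            ('codd', 60, 45), ('date', 30, 25),
                            ('eve', 50, 55);  -- hypothetical: no dept 55 exists
    INSERT INTO "group" VALUES (35, 100), (45, 200), (25, 100);
""")

rows = conn.execute(
    'SELECT empl.name, "group".loc FROM empl, "group" '
    'WHERE empl.dept = "group".dept ORDER BY empl.name'
).fetchall()
names = [name for name, _ in rows]
conn.close()

print(names)  # prints: ['able', 'baker', 'codd', 'date']
```

The row for 'eve' is absent from the result because no row of the second table matches department 55.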
Normalization
Normalization Tools
Attributes for a Relational model
Full Functional Dependence
Full Functional Dependence Examples
Normal Form Overview
Universe of Relations
First vs. Second Normal Form
Second vs. Third Normal Form
Third vs. Boyce-Codd Normal Form
Fourth Normal Form
Converting to 4NF
Fifth Normal Form
Converting to 5NF
Domain/Key Normal Form
Enforcing Domain Integrity in DK/NF
Normalization Tools
Decomposition (this section)
- each document, report, data-flow, etc. is defined to be
a relation
- any relations that violate the required normal form are
divided into 2 or more relations that satisfy the
normal form
- the resulting relational structure has a relation for each
entity
Construction (next section)
- identify entities - objects that have attributes,
identifiers, and relationships
- assign attributes to the right entities
  - attributes apply to all entity-instances
  - attributes are fully functionally dependent on the
  whole identifier
- form “roles” and “intersection entities” where
necessary
Attributes for a Relational model
Each entity instance has exactly one value for
each attribute (within the scope of the data
model)





atomic
repeating groups are not allowed
vectors are not allowed
pointers and other abstract references are not allowed
values for a particular attribute come from a specified
pool of values (called its domain)
An attribute (or a specific set of attributes) forms
an identifier for each entity instance




if the entity instances are different, so is the value (or
set of values) for the attribute (or set of attributes)
an identifier (key) must be found and cannot have a
null value
there may be more than one, especially since a set of
attributes can be an identifier
must be minimal (cannot discard any attribute without
losing uniqueness)
Full Functional Dependence
If each value of an attribute has associated with
it precisely one value for a second attribute,
then that second attribute is functionally
dependent on the first
Example:
In the emp relation:
- Ename is an identifier, and we choose it as the
primary key
- other attributes of the employee will then generally be
functionally dependent on Ename
- so Job is functionally dependent on Ename (or Ename
functionally determines Job)
All attributes in a relation must necessarily be
functionally dependent on the primary key
A functional dependency holds if agreement on the first
value necessarily implies agreement on the
dependent value. But remember that the primary
key can be a set of fields. Full functional
dependence means that the attribute is not functionally
dependent on any proper subset of that set of fields.
Full Functional Dependence Examples
Example A:
Job is functionally dependent on (Ename, Mgr)
BUT, it is also true that
Job is functionally dependent on (Ename)
SO
Job is not fully functionally dependent on (Ename, Mgr)
Example B:
Hours is functionally dependent on (Ename, Project_id,
Tname)
AND
Hours is not functionally dependent on
(Ename),
(Project_id),
(Tname),
(Ename, Project_id), (Ename, Tname),
(Project_id, Tname)
SO
Hours is fully functionally dependent on (Ename,
Project_id, Tname)
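Examples A and B can be checked mechanically against sample rows. The sketch below assumes rows are dictionaries; the function names determines and fully_determines are illustrative. Note that a sample can only refute a dependency: a real functional dependency is a rule about all legal data, not about one set of rows.

```python
from itertools import combinations


def determines(rows, lhs, rhs):
    """True if agreement on the lhs attributes implies agreement on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[name] for name in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def fully_determines(rows, lhs, rhs):
    """True if lhs determines rhs and no proper subset of lhs does."""
    if not determines(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if determines(rows, subset, rhs):
                return False
    return True


columns = ("Ename", "Project_id", "Tname", "Hours")
task = [dict(zip(columns, r)) for r in [
    ("allen", "admit", "debug", 25),
    ("allen", "admit", "implement", 20),
    ("allen", "billing", "debug", 30),
    ("martin", "admit", "implement", 30),
]]
emp = [
    {"Ename": "allen", "Mgr": "barger", "Job": "programmer"},
    {"Ename": "jones", "Mgr": "radl", "Job": "programmer"},
    {"Ename": "king", "Mgr": "barger", "Job": "clerk"},
]

# Example A: dependent on (Ename, Mgr), but not fully.
print(fully_determines(emp, ("Ename", "Mgr"), "Job"))  # prints: False
# Example B: fully dependent on the whole composite key.
print(fully_determines(task, ("Ename", "Project_id", "Tname"), "Hours"))  # prints: True
```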
Normal Form Overview
Universe of All Data
Relations (normalized / unnormalized)
1st Normal Form
2nd Normal Form
3rd Normal Form
Boyce-Codd Normal Form (BCNF)
4th Normal Form
5th Normal Form (PJ/NF)
Domain/Key
Normal Form
(DK/NF)
Universe of Relations
Any flat file is a relation (0th normal form), but not
necessarily “well formed”
Normalization provides a set of criteria to evaluate the
“well formedness” of a relation (but is only one
criterion for determining a “good” form)
In general, a flat file may have repeating groups
Example 1 - suppliers:
part    suppliers
diode   (GE, TRW, Mot)
bulb    (GE, Syl)
Implemented as ?
part    supplier1  supplier2  supplier3
diode   GE         TRW        Mot
bulb    GE         Syl
Eliminate repeating groups by repeating the key to
obtain 1st normal form
Example 1 - suppliers:
part    supplier
diode   GE
diode   TRW
diode   Mot
bulb    GE
bulb    Syl
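Eliminating the repeating group amounts to emitting one row per (part, supplier) pair. A minimal Python sketch, assuming the unnormalized data is held as a dictionary of lists (the variable names are illustrative):

```python
# Unnormalized form: each part carries a repeating group of suppliers.
unnormalized = {
    "diode": ["GE", "TRW", "Mot"],
    "bulb": ["GE", "Syl"],
}

# First normal form: repeat the key once for every member of the group.
first_nf = [
    (part, supplier)
    for part, suppliers in unnormalized.items()
    for supplier in suppliers
]

for row in first_nf:
    print(row)
# ('diode', 'GE')
# ('diode', 'TRW')
# ('diode', 'Mot')
# ('bulb', 'GE')
# ('bulb', 'Syl')
```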
First vs. Second Normal Form
Example 2 - inventory:
part#  warehouse#  wh_address  quantity
100    05          Mpls        200
100    08          StPaul      300
200    05          Mpls        250
200    10          Madison     400
300    08          StPaul      350
Problems occur because this table is not focused on one
primary key - it is “about” two things - warehouses
and parts in warehouses.
Eliminate the multiple focus of a composite key by
breaking into 2 relations using projection to
obtain 2nd normal form
One table about warehouses:
warehouse#  wh_address
05          Mpls
08          StPaul
10          Madison

One table about inventory with a composite key:
part#  warehouse#  quantity
100    05          200
100    08          300
200    05          250
200    10          400
300    08          350
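The decomposition by projection can be sketched in Python, together with a check that joining the two projections gives back the original table. The tuple layout (part, warehouse, address, quantity) and the variable names are assumptions made for this example.

```python
inventory = [
    (100, "05", "Mpls", 200),
    (100, "08", "StPaul", 300),
    (200, "05", "Mpls", 250),
    (200, "10", "Madison", 400),
    (300, "08", "StPaul", 350),
]

# Projection 1: a table about warehouses (duplicates cast out by the set).
warehouses = sorted({(wh, addr) for _, wh, addr, _ in inventory})

# Projection 2: a table about inventory, keyed by (part, warehouse).
stock = [(part, wh, qty) for part, wh, _, qty in inventory]

# Joining on warehouse# reconstructs the original rows.
address_of = dict(warehouses)
rejoined = [(part, wh, address_of[wh], qty) for part, wh, qty in stock]

print(warehouses)             # prints: [('05', 'Mpls'), ('08', 'StPaul'), ('10', 'Madison')]
print(rejoined == inventory)  # prints: True
```

Each warehouse address is now stored once instead of once per part.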
Second vs. Third Normal Form
Example 3 - departments:
name    dept  dept_loc
smith   402   100
jones   401   200
king    402   100
turner  400   200
olson   401   200
Problem: Functional dependency is transitive
The primary key is name
dept is functionally dependent on name
dept_loc is also functionally dependent on name, but it
is transitive because dept functionally determines
dept_loc
Eliminate the transitive dependence by breaking into
2 relations using projection to obtain 3rd normal
form
name    dept
smith   402
jones   401
king    402
turner  400
olson   401
and
dept  dept_loc
400   200
401   200
402   100
Third vs. Boyce-Codd Normal Form
Example 5 - stock:
s#  sname  p#   qty
10  GE     102  1000
10  GE     103   625
10  GE     104  2000
20  TRW    102   500
20  TRW    105  1200
30  Syl    103  1300
technically in 3NF



qty is the only non-key attribute (like example 1)
candidate keys are (s#, p#) and (sname, p#)
didn't require components of an alternate key to be fully
functionally dependent on the primary key
Eliminate the multiple focus by breaking into 2
relations using projection to obtain Boyce-Codd
normal form
s#  sname
10  GE
20  TRW
30  Syl
and
s#  p#   qty
10  102  1000
10  103   625
10  104  2000
20  102   500
20  105  1200
30  103  1300
or [s#, sname] and [sname, p#, qty]
Fourth Normal Form
4NF and 5NF are relevant only when all
attributes in the relation are parts of the key

if in BCNF and have a non-key attribute,
also in 5NF
Example 7 - skills:
Suppose we wish to store employee job skills and
language skills. (An employee may have many of each.)
employee  skill       language
Jones     electrical  French
Jones     electrical  German
Jones     mechanical  French
Jones     mechanical  German
In general:
if    Jones  x  A
and   Jones  y  B
then  Jones  x  B
and   Jones  y  A
The relation is in BCNF - because it is all key ...
but there is redundancy
Converting to 4NF
Ask the following questions:


Could the relation have non-key attributes?
Could any combination be missing?
If both answers are NO, need to break up
relation to achieve 4NF
Example 7 - skills:
(employee, skill, language)
should be broken up into two relations:
employee  skill
Jones     electrical
Jones     mechanical
and
employee  language
Jones     French
Jones     German
if job skill and language are independent
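A short Python sketch shows why the split loses nothing when skills and languages are independent: joining the two smaller relations on employee regenerates every combination. The set-of-tuples representation and the variable names are assumptions for illustration.

```python
skills = {
    ("Jones", "electrical", "French"),
    ("Jones", "electrical", "German"),
    ("Jones", "mechanical", "French"),
    ("Jones", "mechanical", "German"),
}

# Project onto (employee, skill) and (employee, language).
emp_skill = {(e, s) for e, s, _ in skills}
emp_lang = {(e, l) for e, _, l in skills}

# Join the two projections on employee.
rejoined = {
    (e, s, l)
    for e, s in emp_skill
    for e2, l in emp_lang
    if e == e2
}

print(len(emp_skill) + len(emp_lang))  # prints: 4
print(rejoined == skills)              # prints: True
```

With two skills and two languages the saving is nil, but the single relation grows as skills × languages while the two projections grow as skills + languages.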
Fifth Normal Form
PJ/NF or Projection-Join Normal Form
Deals with cases where information can be
reconstructed from smaller pieces of information which can
be maintained with less redundancy
Example 8 - dealerships:
1. Agents represent Companies
2. Companies make Products
3. Agents sell Products
Which Agent sells which Product for which Company?
Agent  Company  Product
smith  ford     car
smith  gm       truck
jones  ford     car
this form is necessary in the general case
BUT if we put a rule into effect that reads:
4. if an agent sells a product, and an agent represents a
company, then the agent must sell the product made
by the company
So, to obey the rule, we must add
smith  ford  truck
smith  gm    car
NOW, with the rule and the new rows, we have
REDUNDANCY
Converting to 5NF
This time, we must break the relation into three
parts (will not break in two)
Example 8 - dealerships:
Agent  Company  Product
smith  ford     car
smith  gm       truck
jones  ford     car
smith  ford     truck
smith  gm       car
BREAK INTO 3
Agent  Company
smith  ford
smith  gm
jones  ford

Agent  Product
smith  car
smith  truck
jones  car

Company  Product
ford     car
ford     truck
gm       car
gm       truck
A relation is already in 5NF if its information
content cannot be reconstructed from several
smaller record types (having different keys)

Only have 5NF problems if there are symmetry
constraints (a pair of rows requires the existence of
one or more additional rows)
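The three-way reconstruction for Example 8 can be sketched in Python. The sketch assumes sets of tuples and uses illustrative names; it also shows that joining only two of the projections (here agent-company with company-product) produces a row that is not in the original relation.

```python
dealerships = {
    ("smith", "ford", "car"),
    ("smith", "gm", "truck"),
    ("jones", "ford", "car"),
    ("smith", "ford", "truck"),
    ("smith", "gm", "car"),
}

agent_company = {(a, c) for a, c, _ in dealerships}
agent_product = {(a, p) for a, _, p in dealerships}
company_product = {(c, p) for _, c, p in dealerships}

# Joining only agent-company with company-product invents a row.
two_way = {
    (a, c, p)
    for a, c in agent_company
    for c2, p in company_product
    if c == c2
}

# Joining all three projections gives back exactly the original rows.
three_way = {
    (a, c, p)
    for a, c in agent_company
    for a2, p in agent_product
    if a == a2 and (c, p) in company_product
}

print(two_way - dealerships)     # prints: {('jones', 'ford', 'truck')}
print(three_way == dealerships)  # prints: True
```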
Domain/Key Normal Form
No insertion/deletion anomalies
Impossible to make an insertion/deletion that
violates a constraint
Constraint types:


domain constraints
key constraints
Example 9 - customers
branch  cust#
west    1234
south   1325
east    1421
south   1511
where valid branches are west, east, south
Enforcing Domain Integrity in DK/NF
Example 9 - customers:
branch  cust#
west    1234
south   1325
east    1421
south   1511
north   1600
If this update is possible, not in DK/NF
One possibility for prohibiting this update is to maintain
a table of legal branches and write code to prohibit
the entry of a branch not in the table
legal branch
west
south
east
Problem: What's to stop someone from placing north in
the legal branch table?
Possible partial solution: Restrict access to the legal
branch table
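One way to approximate the "table of legal branches" idea is to let the DBMS enforce it with a foreign key rather than hand-written code. The sketch below swaps in that technique using Python's built-in sqlite3 module; the table and column names (legal_branch, customer, cust_no) are hypothetical, and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.executescript("""
    CREATE TABLE legal_branch (branch TEXT PRIMARY KEY);
    CREATE TABLE customer (
        cust_no INTEGER PRIMARY KEY,
        branch  TEXT NOT NULL REFERENCES legal_branch (branch)
    );
    INSERT INTO legal_branch VALUES ('west'), ('south'), ('east');
    INSERT INTO customer VALUES (1234, 'west'), (1325, 'south'),
                                (1421, 'east'), (1511, 'south');
""")

try:
    # 'north' is not in legal_branch, so the insert should be refused.
    conn.execute("INSERT INTO customer VALUES (1600, 'north')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
conn.close()

print(rejected, count)  # prints: True 4
```

This does not remove the problem noted above: anyone allowed to insert into legal_branch can still make 'north' legal, so access to that table has to be restricted separately.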
Logical Data
Structures
Logical Database Design
Logical Data Structures (LDS)
Basic LDS Components
Example Relationships
Handling an M-M Relationship
Identifier Representation
Sample Database
LDS for Sample Database
LDS for Example 7 - Skills
Correct LDS for Independence
LDS for Example 8 - Dealerships
Dealerships with Constraints
Modelling Concepts
Map LDS to Well-Formed Relations
Logical Database Design



Constructive approach
Considers semantics
Documents





data dependencies
identifiers
entities
needed relations
“rules”
Logical Data Structures (LDS)
Graphical means of
- naming and
- depicting
the types of data in a database
Simple, yet precise
Useful to
- technically-oriented analysts
- application-oriented users
Easy to read
Supports the design task
- logical structure design is hard
- tool aids the design task
- notation does not get in the way
Basic LDS Components
Entity

any type of thing about which information is
maintained
EXAMPLE
entity_name
student
Attribute

a characteristic of exactly one entity (fully
functionally dependent on the entity)
attribute_name
EXAMPLE: Student attributes
student_name
student_id#
student
soc_sec#
Relationships

an association between a pair of entities (or “roles”),
one-to-one, one-to-many only
or
but
never
Example Relationships
1 - 1 Example: Monogamous marriage
man
woman
Can label relationship
man
wife of man/
woman
husband of woman
1-M Example: Students of a college
college

student
Need not label a relationship if it can be stated as:
college of student / students of college
or
student has college / college has students
Handling an M-M Relationship
M-M Example: Brother - Sister
man_name
woman_name
man
sisters of man/
brothers of woman
woman
Problem: how do you represent the presence of sibling
rivalry?
THIS WON'T WORK
man_name
woman_name
man
woman
rivalry
SOLUTION
man_name
woman_name
man
woman
brother-sister
rivalry
Identifier Representation
Identifier: a set of attributes or relationships that
uniquely identify an instance of an entity
(single field key)
(multiple-field key)
Example:
student_name
college_name
college
student_id#
student
college#
soc_sec#
Sample Database
Employee: (emp)
attributes: Ename, Job, Mgr, Hired, Rate, Bonus
Department: (dept)
attributes: DeptNo, Dname, Loc, Dbudget
Task: (task)
attributes: Tname, Hours
Project: (proj)
attributes: Project_id, Description, Pbudget, Due_date
Relationships



employees are members of a department
employees have a manager who is an employee
employees are assigned to tasks on projects
LDS for Sample Database
DeptNo
Dname
Loc
dept
Hired
Ename
Dbudget
Rate
Job
emp
Mgr
Bonus
Tname
task
Project_id
Hours
Description
proj
Pbudget
Due_date
55
LDS for Example 7 - Skills
Employees can have many skills, and a skill can be had
by many employees; an employee can know many
languages and a language can be known by many
employees.
skill
employee
emp
job_skill
emp/lang/job_skill
This diagram is correct
if all 3 are interdependent
language
lang
skill
employee
job_skill
emp
language
lang
This diagram is almost never correct
(It implies that a skill can be held by only one employee)
Correct LDS for Independence
Assuming job skills and language skills are
independent, they represent two separate
many-to-many relationships
emp/job_skill
skill
employee
emp
job_skill
emp-lang
language
lang
LDS for Example 8 - Dealerships
In the general case, a contract involves one dealer, one
manufacturer, and one product. A dealer can have
many contracts, a manufacturer can have many
contracts, and a product can be mentioned in many
contracts.
company
agent
dealership
manufacturer
contract
This diagram is correct
in the general case
product
vehicle
Dealerships with Constraints
Dealers can deal with many manufacturers, and
manufacturers with many dealers. Dealers can sell
many vehicle types and vehicle types can be sold
by many dealers. Manufacturers can make many
vehicles and vehicles can be made by many
manufacturers. The combinations of who sells what
is determined by symmetry.
dealer-mfgr
company
agent
dealer
manufacturer
dealer-vehicle
mfgr-vehicle
vehicle
product
59
Modelling Concepts
Entities:
“it” must have
 identifier
 attributes
 relationships
“it” must be the focus of the system
need to develop for “it”:

name

description

membership criteria
must examine roles within subsets of “it”
Attributes:
must be non-transitively fully functionally dependent on
the entity it describes
must develop for each attribute:

name

description

domain definition
Modelling Concepts (Continued)
Identifiers:



determine which attributes are part of it
verify uniqueness
establish “not null” requirements
Relationships:
establish degree 1-1 or 1-M
- entity on 1 side must be functionally dependent on
entity on M side
- develop:
  - name
  - definition
- incorporate constraints, rules
- note referential integrity
  - (values of foreign key must exist in key field of
  another relation)
  - (e.g. in the emp relation, if an employee is listed as
  being in department 402, then the dept relation
  must contain a row with a key value of 402)

Map LDS to Well-Formed Relations
LDS                                     Relational Model
entity                                  relation name
attribute descriptor                    attribute
single-valued relationship descriptor   attribute (foreign key)
multi-valued relationship descriptor    nothing
1-1 relationship                        either or both relationship descriptors are attributes
1-M relationship                        relationship descriptor with degree 1 (on the M side) is an attribute
LDS → Relations Examples
Example: College students
student_name
college_name
college
student_id#
student
college#
soc_sec#
college
(college#, college_name)
student
F.K.
(student#, college#, student_name, soc_sec#)
Sample Database Relations
(See page 55 for the Sample Database LDS)
dept
(DeptNo, Dname, Loc, Dbudget)
emp
F.K. in emp
F.K.
(Ename, Job, Mgr, Hired, Rate, Bonus, DeptNo)
proj
(Project_id, Description, Pbudget, Due_date)
task
F.K.
F.K.
(Tname, Ename, Project_id, Hours)
Relations for Example 7 - Skills
emp/job_skill
skill
employee
emp
job_skill
emp-lang
language
lang
emp
(employee)
job_skill
(skill)
lang
(language)
emp/job_skill
F.K.
F.K.
(employee, skill)
emp-lang
F.K.
F.K.
(employee, language)
Relations for Example 8 - Dealerships
(See page 59 for Dealership LDS with symmetry
restrictions)
dealer
(agent)
manufacturer
(company)
vehicle
(product)
dealer-mfgr
F.K.
F.K.
(agent, company)
dealer-vehicle
F.K.
F.K.
(agent, product)
mfgr-vehicle
F.K.
F.K.
(company, product)
References
Carlos, John V. “Logical Data Structures.” Technical
Report 85-23. Computer Sciences Department,
Institute of Technology, University of Minnesota.
1986.
Codd, E. F. “Is Your DBMS Really Relational.”
Computerworld. October 14, 1985.
Codd, E. F. “Relational Database: A Practical Foundation
for Productivity.” Communications of the ACM.
Vol. 25, Number 2 (February, 1982).
Conte, Paul. “Understanding Relational Data Bases.”
Computer Language. Vol. 4, Number 5 (May,
1987).
Date, C. J. and Darwen, Hugh. A Guide to the SQL
Standard. Third Edition. Addison-Wesley. 1993.
Date, C. J. An Introduction to Database Systems. Volume
I, Sixth Edition. Addison-Wesley. 1995.
Date, C. J. An Introduction to Database Systems. Volume
II. Addison-Wesley. 1984.
Date, C. J. Relational Database: Selected Writings.
Addison-Wesley. 1986.
Date, C. J. “Where SQL Falls Short.” Datamation. May
1, 1987.
Harrington, Jan. Relational Database Management for
Microcomputers. Holt, Rinehart, and Winston.
1987.
Kent, William. “A Simple Guide to Five Normal Forms in
Relational Database Theory.” Communications of
the ACM. Vol. 26, Number 2 (February, 1983).
Markel, John. “Is ANSI-Standard SQL An Application
Development Cure-all?” Hardcopy. May, 1987.
Martin, James. Fourth-Generation Languages. Volume I:
Principles. Prentice Hall. 1985.
Nolan, Richard L. “Managing the Computer Resource: A
Storage Hypothesis.” Communications of the ACM.
Vol. 16, Number 7 (July, 1973).
Rob, Peter and Coronel, Carlos. Database Systems:
Design, Implementation, and Management. Third
Edition. Boyd and Fraser. 1997.
Sturm, Thomas P. Data Structures, Direct Access, and
Database Management.