Database Models
E-Commerce Engineer
Student Notes
for
Unit 1: E-Commerce Databases
Element 1: Database Models
Table of Contents
Student Notes
Transparency 1 : Cover
Transparency 2 : Relational Databases
Transparency 3 : Entity Type / Table
Transparency 4 : Primary Keys
Transparency 5 : Relationships
Transparency 6 : Relationship Types
Transparency 7 : Entity Relationship Diagram
Transparency 8 : Redundancy Elimination
Transparency 9 : 1NF to 2NF Example
Transparency 10 : 2NF to 3NF Example
Transparency 11 : 4NF and 5NF
Transparency 12 : Exercise 1 Master Example 1
Transparency 13 : Exercise 2 Master Example 2
Transparency 14 : List of References
Transparency 1 : Cover
Transparency 2 : Relational Databases
What is a Database?
A database management system, or DBMS, gives users access to their data and helps them
transform the data into information. Such database management systems include dBase,
Paradox, IMS, Oracle, and MySQL. These systems allow users to create, update, and
extract information from their databases. Compared to a manual filing system, the biggest
advantages of a computerised database system are speed, accuracy, and accessibility.
A database is a structured collection of data. Data refer to entity types (e.g. customer,
product) and their characteristics (e.g. the name of a customer). Relational databases store each
data item as a separate field corresponding to an attribute of an entity type. For example, a customer's
first name, date of birth, and postal code are each stored in separate fields within the entity
type customer. The name of a field usually reflects its contents. A postal code field might be
named POSTAL-CODE or PSTL_CD. Each DBMS has its own rules for naming data
fields.
A set of fields (e.g. the name, birth date, and address of a customer XYZ) is stored as a record
belonging to one entity type (e.g. customer). A set of records is stored in a table
(the structure referred to above as an entity type).
Transparency 3 : Entity Type / Table
A table consists of a number of records. The attribute names of each record in the table are the
same, although the field values may differ. Every customer record has a surname attribute.
The values in the surname attribute can be different for each customer.
Each attribute occupies one column and each record occupies one row. In each column of the
table, you put a specific category of information for the customers, such as their ID number
(cus_id), first name, etc. Each row in the table contains the information relating to a specific
customer, together as one record. Each record is a unique entry and is independent of any
other record in the table.
During the analysis of the business requirements, the database design team usually defines
the necessary set of tables. Different tables are created for the various groups of information.
For instance, a customer table is created to keep customer data and a product table is created
to keep product data.
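The row and column layout described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: apart from cus_id, the column names and the sample customers are assumptions made for the example, not part of the notes.

```python
import sqlite3

# In-memory database, so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")

# One column per attribute. Column names other than cus_id are hypothetical.
conn.execute(
    "CREATE TABLE customer ("
    " cus_id INTEGER,"
    " first_name TEXT,"
    " surname TEXT,"
    " postal_code TEXT)"
)

# One row per record: every row has the same attributes but its own values.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        (1, "Ada", "Lovelace", "1010"),
        (2, "Alan", "Turing", "8010"),
    ],
)

# The attribute names come from the table definition, not from any one record.
cursor = conn.execute("SELECT * FROM customer ORDER BY cus_id")
column_names = [d[0] for d in cursor.description]
rows = cursor.fetchall()

print(column_names)
for row in rows:
    print(row)
```

Each tuple printed at the end is one record; reading down a single position across the tuples gives one column.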
Transparency 4 : Primary Keys
Primary Keys
Every table has a field or a combination of fields that uniquely identifies each record in the
table. This unique identifier is called the primary key, or simply the key. The primary key
provides the means to distinguish one record from all the others in a table. It allows the user
and the database system to identify, locate, and refer to one particular record in the table.
The database design team determines the best candidate field for the primary key. The
customer's first and last names together could serve as a primary key, but only until a second
customer with the same name is added. Then the key would no longer be unique. Sometimes the design
team has to define a new ID number or code field, just so that a table has a primary key. For
the customer table, the primary key would likely be the customer ID number.
Once a table has been assigned a primary key, the DBMS usually won't allow more than one
record in the table with the same value for the primary key. No two customers can have the
same ID number.
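As a minimal sketch of this behaviour, assuming SQLite as the DBMS and hypothetical column and customer names, the following shows that two customers may share a name but not an ID number:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# cus_id is declared as the primary key; the name columns are hypothetical.
conn.execute(
    "CREATE TABLE customer ("
    " cus_id INTEGER PRIMARY KEY,"
    " first_name TEXT,"
    " surname TEXT)"
)

conn.execute("INSERT INTO customer VALUES (1, 'John', 'Smith')")
# Same name, different ID: accepted, because the name is not the key.
conn.execute("INSERT INTO customer VALUES (2, 'John', 'Smith')")

# Same ID as an existing record: the DBMS refuses the insert.
try:
    conn.execute("INSERT INTO customer VALUES (1, 'Mary', 'Jones')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(duplicate_rejected, customer_count)
```

Had the two name columns been chosen as the key instead, the second insert would have been the one to fail, which is the problem described above.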
Transparency 5 : Relationships
Relationships
After two or more entities are identified and defined with attributes, the participants determine
if a relationship exists between the entities. A relationship is any association, linkage, or
connection between the entities of interest to the business; it is a two-directional, significant
association between two entities, or between an entity and itself. Each relationship has a
name, an optionality (optional or mandatory), and a degree (how many). A relationship is
described in real terms.
Rarely will there be a relationship between every entity and every other entity in an
application. If there are only two or three entities, then perhaps there will be relationships
between them all. In a larger application, there are not always relationships between one
entity and all of the others.
Assigning a name, an optionality, and a degree to a relationship helps confirm the validity of
that relationship. If you cannot give a relationship all these things, then perhaps there really is
no relationship at all. For example, there is a relationship between customer and product: each
customer may order one or more products.
Transparency 6 : Relationship Types
Relationship Types
One to One (1:1): In this case no extra relationship table is needed, just an attribute which
stores the key of the other table. E.g. if an electronic newspaper can have only one editor,
and each editor works on only one newspaper, then it is sufficient to store the editor key with
the electronic newspaper record.
One to Many (1:n), or vice versa: Again no extra relationship table is needed, just an
attribute which stores the key of the other table. If an electronic newspaper can have only
one editor, but each editor can work on many newspapers, then it is sufficient to store the editor key
with the electronic newspaper record.
Many to Many (n:m): This is a typical sign of a complex relationship, which needs an extra
relationship table (see the example sales table above). The relationship table then
has a primary key consisting of the keys of the related tables. See the pair (cus_id, prod_id) in the
above sales table.
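The 1:n and n:m cases can be sketched as follows, again using sqlite3. Only cus_id, prod_id, and the sales table name come from the notes; every other table name, column name, and data value is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request

conn.executescript("""
    -- 1:n: each newspaper stores the key of its one editor.
    CREATE TABLE editor (editor_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE newspaper (
        paper_id  INTEGER PRIMARY KEY,
        title     TEXT,
        editor_id INTEGER REFERENCES editor(editor_id)
    );

    -- n:m: the sales relationship table is keyed on the pair (cus_id, prod_id).
    CREATE TABLE customer (cus_id INTEGER PRIMARY KEY, surname TEXT);
    CREATE TABLE product  (prod_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales (
        cus_id  INTEGER REFERENCES customer(cus_id),
        prod_id INTEGER REFERENCES product(prod_id),
        PRIMARY KEY (cus_id, prod_id)
    );

    INSERT INTO editor VALUES (1, 'Kim');
    INSERT INTO newspaper VALUES (1, 'Daily Web', 1), (2, 'Evening Net', 1);
    INSERT INTO customer VALUES (1, 'Smith'), (2, 'Jones');
    INSERT INTO product VALUES (10, 'Book'), (20, 'Lamp');
    INSERT INTO sales VALUES (1, 10), (1, 20), (2, 10);
""")

# One editor, many newspapers: count the papers that carry each editor's key.
papers_per_editor = conn.execute(
    "SELECT e.name, COUNT(n.paper_id) FROM editor e"
    " JOIN newspaper n ON n.editor_id = e.editor_id"
    " GROUP BY e.editor_id ORDER BY e.editor_id"
).fetchall()

# Many to many, read in both directions through the sales table.
products_of_customer_1 = [r[0] for r in conn.execute(
    "SELECT p.name FROM sales s JOIN product p ON p.prod_id = s.prod_id"
    " WHERE s.cus_id = 1 ORDER BY p.prod_id")]
customers_of_product_10 = [r[0] for r in conn.execute(
    "SELECT c.surname FROM sales s JOIN customer c ON c.cus_id = s.cus_id"
    " WHERE s.prod_id = 10 ORDER BY c.cus_id")]

print(papers_per_editor, products_of_customer_1, customers_of_product_10)
```

For the 1:1 case, one option would be to add a UNIQUE constraint to newspaper.editor_id, so that no editor key can appear in more than one newspaper record.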
Transparency 7 : Entity Relationship Diagram
To visually record the entities and the relationships between them, an entity relationship
diagram, or ERD, is drawn. An ERD is a pictorial representation of the entities and the
relationships between them. It allows the participants in the meeting to easily see the
information structure of the application. Later, the project team uses the ERD to design the
database and tables. Knowing how to read an ERD is very important. If the diagram contains mistakes
or is missing relationships, the application will fail in that respect. Although an ERD can look
somewhat cryptic at first, learning to read one comes quickly.
Transparency 8 : Redundancy Elimination
The goal of relational database design is to generate a set of schemas that allow us to store
information without unnecessary redundancy, and to retrieve information easily (and
accurately).
First Normal Form (1NF): Start with all the items of data in any order in one big table. To
reach 2NF, group the data into separate tables to remove any data that is repeated.
Second Normal Form (2NF): Check whether all the data in each separate table belongs to, or
is uniquely identified by, the key field of that table. To reach 3NF, split the data again so that
each group of data has its own sensible key field.
Third Normal Form (3NF): Check that all fields in the records of the table are really
uniquely identified by the key field AND are independent of one another.
Split the data yet again so that the fields of all records belong to their key field only. This will
likely mean moving some fields to a new table and creating a new key field for them.
Transparency 9 : 1NF to 2NF Example
Second normal form is violated when a non-key attribute depends on a subset of the
current key. It is only relevant when the key is composite, i.e., consists of several fields (e.g.
part and warehouse in the above example).
The key here consists of the PART and WAREHOUSE fields together, but WAREHOUSE-ADDRESS
is a fact about the WAREHOUSE alone. The basic problems with this design are:
The warehouse address is repeated in every record that refers to a part stored in that
warehouse.
If the address of the warehouse changes, every record referring to a part stored in that
warehouse must be updated.
Because of the redundancy, the data might become inconsistent, with different records
showing different addresses for the same warehouse.
If at some point in time there are no parts stored in the warehouse, there may be no record in
which to keep the warehouse's address.
To satisfy second normal form, the record should be decomposed into (replaced by) two
records: one keyed on PART and WAREHOUSE together, and one keyed on WAREHOUSE alone
that holds the WAREHOUSE-ADDRESS.
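The decomposition and the update problem it removes can be sketched like this. The table names (stock_flat, stock, warehouse), the lowercase column names, and all data values are hypothetical; only the PART, WAREHOUSE, and warehouse address fields come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Violates 2NF: the address depends on WAREHOUSE alone, yet is stored
    -- once for every part held in that warehouse.
    CREATE TABLE stock_flat (
        part TEXT,
        warehouse TEXT,
        warehouse_address TEXT,
        PRIMARY KEY (part, warehouse)
    );
    INSERT INTO stock_flat VALUES
        ('bolt', 'W1', '1 Dock Road'),
        ('nut',  'W1', '1 Dock Road'),
        ('bolt', 'W2', '9 Mill Lane');

    -- 2NF: the address moves to a table keyed on the warehouse alone.
    CREATE TABLE stock (
        part TEXT,
        warehouse TEXT,
        PRIMARY KEY (part, warehouse)
    );
    CREATE TABLE warehouse (
        warehouse TEXT PRIMARY KEY,
        warehouse_address TEXT
    );
    INSERT INTO stock SELECT part, warehouse FROM stock_flat;
    INSERT INTO warehouse
        SELECT DISTINCT warehouse, warehouse_address FROM stock_flat;
""")

# Joining the two smaller tables gives back the original information.
flat_rows = conn.execute(
    "SELECT part, warehouse, warehouse_address FROM stock_flat"
    " ORDER BY warehouse, part").fetchall()
joined_rows = conn.execute(
    "SELECT s.part, s.warehouse, w.warehouse_address FROM stock s"
    " JOIN warehouse w ON w.warehouse = s.warehouse"
    " ORDER BY s.warehouse, s.part").fetchall()

# Changing the address of W1: two rows in the flat design, one row after the split.
rows_touched_flat = conn.execute(
    "UPDATE stock_flat SET warehouse_address = '2 Quay Street'"
    " WHERE warehouse = 'W1'").rowcount
rows_touched_normalised = conn.execute(
    "UPDATE warehouse SET warehouse_address = '2 Quay Street'"
    " WHERE warehouse = 'W1'").rowcount

print(flat_rows == joined_rows, rows_touched_flat, rows_touched_normalised)
```

Because the address now lives in exactly one row per warehouse, it can no longer disagree with itself, and a warehouse with no parts can still have a row in the warehouse table.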
Transparency 10 : 2NF to 3NF Example
Third normal form is violated when a non-key field is a fact about another non-key field. The
EMPLOYEE field is the key. If each department is located in one place, then the LOCATION
field is an attribute of the DEPARTMENT -- in addition to being an attribute of the
EMPLOYEE. The problems with this design are the same as those caused by violations of
second normal form:
The department's location is repeated in the record of every employee assigned to that
department.
If the location of the department changes, every such record must be updated.
Because of the redundancy, the data might become inconsistent, with different records
showing different locations for the same department.
If a department has no employees, there may be no record in which to keep the department's
location.
To satisfy third normal form, the record should be decomposed into two records: one keyed
on EMPLOYEE that holds the DEPARTMENT, and one keyed on DEPARTMENT that holds the
LOCATION.
To summarize, a record is in second and third normal forms if every field is either part of the
key or provides a (single-valued) attribute of exactly the whole key and nothing else.
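A small sketch of the third normal form decomposition follows, under the assumption that each department has exactly one location. The table names, lowercase column names, and sample data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Violates 3NF: location is a fact about department, which is not the key.
    CREATE TABLE employee_flat (
        employee TEXT PRIMARY KEY,
        department TEXT,
        location TEXT
    );
    INSERT INTO employee_flat VALUES
        ('ann', 'sales', 'Vienna'),
        ('bob', 'sales', 'Vienna'),
        ('cy',  'lab',   'Graz');

    -- 3NF: every non-key field is a fact about the key of its own table.
    CREATE TABLE employee (employee TEXT PRIMARY KEY, department TEXT);
    CREATE TABLE department (department TEXT PRIMARY KEY, location TEXT);
    INSERT INTO employee SELECT employee, department FROM employee_flat;
    INSERT INTO department
        SELECT DISTINCT department, location FROM employee_flat;

    -- A department with no employees can now be recorded as well.
    INSERT INTO department VALUES ('archive', 'Linz');
""")

department_count = conn.execute(
    "SELECT COUNT(*) FROM department").fetchone()[0]

# An employee's location is still available through the department.
location_of_bob = conn.execute(
    "SELECT d.location FROM employee e"
    " JOIN department d ON d.department = e.department"
    " WHERE e.employee = 'bob'").fetchone()[0]

print(department_count, location_of_bob)
```

In the flat design the 'archive' row would have needed an employee to hang on, since employee is the key there.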
Transparency 11 : 4NF and 5NF
Fourth and fifth normal forms deal with multi-valued facts. The multi-valued fact may
correspond to a many-to-many relationship, as with employees and skills, or to a many-to-one
relationship, as with the children of an employee (assuming only one parent is an employee).
By "many-to-many" we mean that an employee may have several skills, and a skill may
belong to several employees.
Under fourth normal form, a record type should not contain two or more independent
multi-valued facts about an entity. In addition, the record must satisfy third normal form.
The term "independent" will be discussed after considering an example.
Consider employees, skills, and languages, where an employee may have several skills and
several languages. We have here two many-to-many relationships, one between employees
and skills, and one between employees and languages. Under fourth normal form, these two
relationships should not be represented in a single record (see above).
A record type is in fifth normal form when its information content cannot be reconstructed
from several smaller record types, i.e., from record types each having fewer fields than the
original record. The case where all the smaller records have the same key is excluded. Fifth
normal form does not differ from fourth normal form unless there exists a symmetric
constraint. In the absence of such a constraint, a record type in fourth normal form is always
in fifth normal form.
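To make the fourth normal form point concrete, here is a sketch in plain Python. It assumes one way of keeping a single (employee, skill, language) record type consistent, namely storing every skill together with every language; the employee and the values are made up for the example.

```python
from itertools import product

# Two independent multi-valued facts about one hypothetical employee.
skills = {"smith": ["cook", "type"]}
languages = {"smith": ["French", "German", "Greek"]}

# Single record type (employee, skill, language): since the two facts are
# independent, this sketch pairs every skill with every language.
combined = [
    (emp, skill, lang)
    for emp in skills
    for skill, lang in product(skills[emp], languages[emp])
]

# Fourth normal form: one record type per multi-valued fact.
employee_skill = [(emp, s) for emp, ss in skills.items() for s in ss]
employee_language = [(emp, l) for emp, ls in languages.items() for l in ls]

# Joining the two smaller record types on employee rebuilds the combined one.
rejoined = sorted(
    (emp, s, l)
    for emp, s in employee_skill
    for emp2, l in employee_language
    if emp == emp2
)

print(len(combined), len(employee_skill) + len(employee_language))
```

With two skills and three languages the single record type needs six records, while the two separate record types need five between them, and adding a skill to the separate form takes one new record instead of one per language.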
Transparency 12 : Exercise 1 Master Example 1
Transparency 13 : Exercise 2 Master Example 2
Transparency 14 : List of References
E.F. Codd, "A Relational Model of Data for Large Shared Data Banks", Comm. ACM 13 (6), June 1970, pp. 377-387. The original paper introducing the relational data model.
E.F. Codd, "Normalized Data Base Structure: A Brief Tutorial", ACM SIGFIDET Workshop on Data Description,
Access, and Control, Nov. 11-12, 1971, San Diego, California, E.F. Codd and A.L. Dean (eds.). An early tutorial
on the relational model and normalization.
E.F. Codd, "Further Normalization of the Data Base Relational Model", R. Rustin (ed.), Data Base Systems
(Courant Computer Science Symposia 6), Prentice-Hall, 1972. Also IBM Research Report RJ909. The first
formal treatment of second and third normal forms.
C.J. Date, An Introduction to Database Systems (third edition), Addison-Wesley, 1981. An excellent
introduction to database systems, with emphasis on the relational model.
R. Fagin, "Multivalued Dependencies and a New Normal Form for Relational Databases", ACM Transactions on
Database Systems 2 (3), Sept. 1977. Also IBM Research Report RJ1812. The introduction of fourth normal
form.
R. Fagin, "Normal Forms and Relational Database Operators", ACM SIGMOD International Conference on
Management of Data, May 31-June 1, 1979, Boston, Mass. Also IBM Research Report RJ2471, Feb. 1979. The
introduction of fifth normal form.
W. Kent, "A Primer of Normal Forms", IBM Technical Report TR02.600, Dec. 1973. An early, formal tutorial
on first, second, and third normal forms.
T.-W. Ling, F.W. Tompa, and T. Kameda, "An Improved Third Normal Form for Relational Databases", ACM
Transactions on Database Systems, 6(2), June 1981, 329-346. One of the first treatments of inter-relational
dependencies.