Relational databases: normal forms
Practical Course
Integrative Bioinformatics
March 6, 2012
The relational database model was proposed by E. F. Codd in 1970. It was the first model to separate the organization of the data from the physical storage methods, and it has been the standard ever since.
Relational databases (RDB) store datasets as rows in tables. The formal term 'relation' is equivalent to the common name 'table'. A number of important terms for relational databases are listed in Table 1.
Table 1: Relational database terms.
The concept of storing data in tables is natural, as it is quite similar to how one would do it on a sheet of paper. In a database, the way the data is stored is very important for its speed and consistency. Thus, constraints for relational databases, called normal forms, have been developed to ensure speed and consistency. Normal forms should not be seen as rules that must never be broken; they are guidelines for developing fast and consistent databases. Further normal forms exist, but only the first three are commonly used and are explained here.
First normal form (1NF): A data model is in 1NF when the domain of each
attribute is atomic and there are no repeating attributes.
The term 'atomic' means that no field contains an enumeration or composed data of any kind. Enumerations of similar attributes may not be implemented as several columns in the same table. In order to normalize a database, each table should have a primary key. Fig. 1 shows the transformation of a table to 1NF.
Figure 1: Transformation of a table to 1NF.
The upper table contains a non-atomic attribute ('Representative') and an enumeration of clients and work time. 'Representative' has to be decomposed into first and last name to reach 1NF. The repeating columns for time and client are removed and a new row for each client is added. This breaks the primary key, as several rows now have the same primary key value. Thus, a client ID is added and the new primary key is composed of client ID and representative ID. The bottom table is in 1NF. It is easy to see that it now contains a lot of redundancy.
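The 1NF table described above can be sketched with Python's standard sqlite3 module. The table name, column names and sample rows are assumptions made for illustration; they are not taken from Fig. 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical 1NF table: atomic columns only, one row per
# (representative, client) pair, composite primary key.
conn.execute("""
    CREATE TABLE work_1nf (
        rep_id     INTEGER NOT NULL,
        first_name TEXT    NOT NULL,
        last_name  TEXT    NOT NULL,
        client_id  INTEGER NOT NULL,
        client     TEXT    NOT NULL,
        hours      REAL    NOT NULL,
        PRIMARY KEY (rep_id, client_id)
    )
""")

# Invented sample data.
conn.executemany(
    "INSERT INTO work_1nf VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Ada", "Miller", 10, "Acme", 12.0),
        (1, "Ada", "Miller", 11, "Globex", 5.5),
        (2, "Ben", "Stone", 10, "Acme", 3.0),
    ],
)

# The composite key rejects a second row for the same pair.
try:
    conn.execute(
        "INSERT INTO work_1nf VALUES (1, 'Ada', 'Miller', 10, 'Acme', 1.0)"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Redundancy: the name of representative 1 is stored once per client.
name_copies = conn.execute(
    "SELECT COUNT(*) FROM work_1nf WHERE rep_id = 1"
).fetchone()[0]
```

In this sketch the name of a representative is repeated in every row that belongs to that representative, which is the redundancy the text refers to.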
Note: There are relational databases that do not conform to 1NF, as they allow non-atomic attributes (e.g. sets of integers). These databases are called nested databases or non-first-normal-form databases (NFNF or NF²). See [1] for further reading about nested databases.
Second normal form (2NF): A data model is in 2NF if it is in 1NF and there
are no attributes that are dependent on only a part of the primary key.
This normal form applies only to tables whose primary key consists of multiple fields. Tables that have a single-field primary key and are in 1NF are in 2NF as well. In the lower table of Fig. 1, 'First', 'Last' and 'Client' each depend on only one part of the primary key. They have to be moved to separate tables. Fig. 2 shows the transformation of the bottom table in Fig. 1 to 2NF.
Figure 2: The bottom table of Fig. 1 in 2NF.
Now there is a table for employee information, a table for client information and one table that links the two. The working time depends on both the client ID and the representative ID, because two representatives could work with the same client.
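A minimal sketch of such a 2NF layout, again using Python's sqlite3 module, might look as follows. All table and column names as well as the sample rows are hypothetical and only mirror the structure described in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE representative (
        rep_id     INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL
    );
    CREATE TABLE client (
        client_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    -- Linking table: hours depend on the whole composite key.
    CREATE TABLE work (
        rep_id    INTEGER NOT NULL REFERENCES representative(rep_id),
        client_id INTEGER NOT NULL REFERENCES client(client_id),
        hours     REAL    NOT NULL,
        PRIMARY KEY (rep_id, client_id)
    );
""")

# Invented sample data.
conn.executemany("INSERT INTO representative VALUES (?, ?, ?)",
                 [(1, "Ada", "Miller"), (2, "Ben", "Stone")])
conn.executemany("INSERT INTO client VALUES (?, ?)",
                 [(10, "Acme"), (11, "Globex")])
conn.executemany("INSERT INTO work VALUES (?, ?, ?)",
                 [(1, 10, 12.0), (1, 11, 5.5), (2, 10, 3.0)])

# Renaming a client now touches exactly one row.
rows_changed = conn.execute(
    "UPDATE client SET name = 'Acme Corp' WHERE client_id = 10"
).rowcount

# A work entry for an unknown client is rejected by the foreign key.
try:
    conn.execute("INSERT INTO work VALUES (1, 99, 1.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Compared with a single 1NF table, each client name is stored once, so a rename cannot leave inconsistent copies behind.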
Third normal form (3NF): A data model is in 3NF if it is in 2NF and there
are no transitive attribute dependencies. Transitive attribute dependency
means that A is dependent on C via B (i.e. A is dependent on B and B is
dependent on C).
If the company address were added to the client table of Fig. 2, the table would no longer conform to 3NF. The city name depends on the zip code and the zip code depends on the company, hence there is a transitive dependency of the city on the company. Fig. 3 shows the client table with address information and a way to model it in 3NF.
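Whether one attribute depends on another is a property of the modelled domain, but sample data can at least show that an assumed dependency is not violated. The following Python sketch checks the chain company, zip code, city on invented rows; the function name `determines` and the sample values are hypothetical.

```python
def determines(rows, lhs, rhs):
    """Return True if every value of column `lhs` occurs with
    exactly one value of column `rhs` in the given rows."""
    seen = {}
    for row in rows:
        # setdefault stores the first rhs value seen for this lhs value
        # and returns the stored one on later encounters.
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


# Invented sample rows of a client table with address columns.
clients = [
    {"company": "Acme",    "zip": "33501", "city": "Bielefeld"},
    {"company": "Globex",  "zip": "33501", "city": "Bielefeld"},
    {"company": "Initech", "zip": "10115", "city": "Berlin"},
]

company_to_zip = determines(clients, "company", "zip")   # zip depends on company
zip_to_city = determines(clients, "zip", "city")         # city depends on zip
zip_to_company = determines(clients, "zip", "company")   # not a dependency here
```

In this data the first two checks hold, so the city depends on the company only via the zip code, which is the transitive dependency that 3NF removes. The third check fails because two companies share one zip code.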
Figure 3: Transformation from 2NF to 3NF.
One could have used the zip code as the primary key of the city table, as it identifies the city, but the primary key of a table should have no direct meaning [2]. Two cities in different countries might have the same zip code, for example. With an additional key for cities, this causes no problem at all.
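A possible 3NF layout with a meaningless key for cities can be sketched as follows. The table and column names, the `country` column and the sample rows (including the fictional second city) are assumptions for illustration and are not taken from Fig. 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE city (
        city_id INTEGER PRIMARY KEY,  -- surrogate key without meaning
        zip     TEXT NOT NULL,
        name    TEXT NOT NULL,
        country TEXT NOT NULL         -- assumed column
    );
    CREATE TABLE client (
        client_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        street    TEXT NOT NULL,
        city_id   INTEGER NOT NULL REFERENCES city(city_id)
    );
""")

# Invented data: the same zip code in two different countries.
conn.executemany("INSERT INTO city VALUES (?, ?, ?, ?)", [
    (1, "10115", "Berlin", "DE"),
    (2, "10115", "Example Town", "XX"),
])
conn.executemany("INSERT INTO client VALUES (?, ?, ?, ?)", [
    (10, "Acme", "Main St 1", 1),
    (11, "Globex", "High St 2", 2),
])

# Both cities coexist because the zip code is not the primary key.
cities_for_zip = conn.execute(
    "SELECT COUNT(*) FROM city WHERE zip = '10115'"
).fetchone()[0]

# The full address of a client is recovered with a join.
address = conn.execute("""
    SELECT client.name, city.zip, city.name
    FROM client JOIN city ON city.city_id = client.city_id
    WHERE client.client_id = 11
""").fetchone()
```

Had the zip code been the primary key of the city table, the second insert into `city` would have failed.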
A database in 3NF is fast and contains nearly no redundancy, because each fact is stored only once. The drawback of database normalization is that the amount of work needed to display or update data increases considerably. Because the data is stored in fragments, a large number of database operations is required to view or change one object.
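The cost of fragmentation can be illustrated with a small sqlite3 sketch: displaying the work report of one representative already requires joining three tables. The schema and data are hypothetical and kept deliberately small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical normalized schema with invented sample data.
conn.executescript("""
    CREATE TABLE representative (
        rep_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT
    );
    CREATE TABLE client (
        client_id INTEGER PRIMARY KEY, name TEXT
    );
    CREATE TABLE work (
        rep_id INTEGER, client_id INTEGER, hours REAL,
        PRIMARY KEY (rep_id, client_id)
    );
    INSERT INTO representative VALUES (1, 'Ada', 'Miller');
    INSERT INTO client VALUES (10, 'Acme'), (11, 'Globex');
    INSERT INTO work VALUES (1, 10, 12.0), (1, 11, 5.5);
""")

# One "object" (the report of representative 1) is spread over
# three tables and has to be reassembled with two joins.
report = conn.execute("""
    SELECT r.first_name, r.last_name, c.name, w.hours
    FROM work AS w
    JOIN representative AS r ON r.rep_id = w.rep_id
    JOIN client AS c ON c.client_id = w.client_id
    WHERE r.rep_id = 1
    ORDER BY c.name
""").fetchall()
```

A denormalized table would return the same rows from a single table scan, at the price of the redundancy discussed earlier.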
Further details on normal forms and their implications for database performance, as well as the theoretical background of database models, can be found in [3].
References
[1] IBM U2. Nested relational databases. Industry Whitepaper, 2001.
[2] Scott W. Ambler. Mapping objects to relational databases - O/R mapping in detail. AmbySoft Inc., Toronto, Canada, 2006. Available at
http://www.agiledata.org/essays/mappingObjects.html.
[3] Gottfried Vossen. Datenmodelle, Datenbanksprachen und Datenbank-Management-Systeme. Addison-Wesley, 2nd edition, 1994.