Relational databases: normal forms
Practical Course Integrative Bioinformatics
March 6, 2012

The relational database model was proposed by E. F. Codd in 1970. It was the first model to separate the organization of the data from the physical storage methods, and it has been the standard ever since. Relational databases (RDB) store datasets as rows in tables. The formal term 'relation' is equivalent to the common name 'table'. A number of important terms for relational databases are listed in Table 1.

Table 1: Relational database terms.

The concept of storing data in tables is natural, as it is quite similar to how one would do it on a sheet of paper. In a database, the way the data is stored strongly affects speed and consistency. Thus, constraints for relational databases, called normal forms, have been developed to ensure speed and consistency. Normal forms should not be seen as rules that must never be broken; they are guidelines for developing fast and consistent databases. Five normal forms are known, but only the first three are commonly used and explained here.

First normal form (1NF): A data model is in 1NF when the domain of each attribute is atomic and there are no repeating attributes. The term 'atomic' means that no field contains an enumeration or composed data of any kind. Enumerations of like attributes may not be implemented as several columns in the same table. In order to normalize a database, each table should have a primary key. Fig. 1 shows the transformation of a table to 1NF.

Figure 1: Transformation of a table to 1NF.

The upper table contains a non-atomic attribute ('Representative') and an enumeration of clients and work time. 'Representative' has to be decomposed into first and last name to reach 1NF. The repeating columns for time and client are removed and a new row is added for each client. This corrupts the primary key, as there are now several rows with the same primary key value.
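The steps described so far can be sketched in a few lines of Python. The table contents and the column names (rep_id, client1, time1 and so on) are invented for illustration and are not taken from Fig. 1. The sketch only shows that, once the repeating columns are turned into rows, the representative ID alone no longer identifies a row.

```python
# Hypothetical un-normalized table in the spirit of the upper table of Fig. 1.
# All column names and values are invented for illustration.
unnormalized = [
    {"rep_id": 1, "representative": "Anna Schmidt",
     "client1": "Acme", "time1": 5, "client2": "Globex", "time2": 3},
    {"rep_id": 2, "representative": "Ben Meyer",
     "client1": "Acme", "time1": 2, "client2": None, "time2": None},
]


def to_first_normal_form(rows):
    """Split the name into atomic parts and turn repeating columns into rows."""
    result = []
    for row in rows:
        # Decompose the non-atomic attribute into first and last name.
        first, last = row["representative"].split(" ", 1)
        # Replace the repeating client/time columns by one row per client.
        for n in (1, 2):
            client = row[f"client{n}"]
            if client is None:
                continue
            result.append({"rep_id": row["rep_id"], "first": first,
                           "last": last, "client": client,
                           "time": row[f"time{n}"]})
    return result


rows_1nf = to_first_normal_form(unnormalized)

# The old primary key (rep_id) now occurs in several rows.
rep_ids = [r["rep_id"] for r in rows_1nf]
key_is_unique = len(rep_ids) == len(set(rep_ids))
print(key_is_unique)  # False
```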
Thus, a client ID is added and the new primary key is composed of the client ID and the representative ID. The bottom table is in 1NF. It is easy to see that it now contains a lot of redundancy.

Note: There are relational databases that do not conform to 1NF, as they allow non-atomic attributes (e.g. sets of integers). These databases are called nested databases or non-first-normal-form databases (NFNF or NF²). See [1] for further reading about nested databases.

Second normal form (2NF): A data model is in 2NF if it is in 1NF and no attribute depends on only a part of the primary key. This normal form is relevant only for tables whose primary key consists of multiple fields; tables that have a single-field primary key and are in 1NF are in 2NF as well. In the lower table of Fig. 1, 'First', 'Last' and 'Client' depend on only a part of the primary key. They have to be moved to separate tables. Fig. 2 shows the transformation of the bottom table of Fig. 1 to 2NF.

Figure 2: The bottom table of Fig. 1 in 2NF.

Now there is a table for employee information, a table for client information and one table that links the two. The working time depends on both the client ID and the representative ID, as two representatives could work with the same client.

Third normal form (3NF): A data model is in 3NF if it is in 2NF and there are no transitive attribute dependencies. A transitive attribute dependency means that A depends on C via B (i.e. A depends on B and B depends on C). If the company address were added to the client table of Fig. 2, the table would no longer conform to 3NF. The city name depends on the zip code and the zip code depends on the company, hence there is a transitive dependency of the city on the company. Fig. 3 shows the client table with address information and a way to model it in 3NF.

Figure 3: Transformation from 2NF to 3NF.

One could have used the zip code as the primary key of the city table, as it identifies the city, but the primary key of a table should have no direct meaning [2]. Two cities in different countries might have the same zip code, for example. With an additional key for cities, this causes no problem at all.

A database in 3NF is fast and contains almost no redundancy. The drawback of database normalization is that the amount of work needed to display or update data increases considerably: because the data is stored in fragments, a large number of database operations is required to view or change one object. Further details on normal forms and their implications for database performance, as well as the theoretical background of database models, can be found in [3].

References

[1] IBM U2. Nested relational databases. Industry Whitepaper, 2001.
[2] Scott W. Ambler. Mapping objects to relational databases - O/R mapping in detail. AmbySoft Inc., Toronto, Canada, 2006. Available at http://www.agiledata.org/essays/mappingObjects.html.
[3] Gottfried Vossen. Datenmodelle, Datenbanksprachen und Datenbank-Management-Systeme. Addison-Wesley, 2nd edition, 1994.
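As a supplementary sketch, the 2NF and 3NF decompositions described in the text could look roughly as follows in SQLite, driven from Python. The table names, column names and sample values are assumptions made for illustration and are not taken from Figs. 2 and 3. The final query also illustrates the drawback mentioned above: viewing one complete record requires joining all four tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one table per entity, a surrogate key for cities,
# and a link table whose composite key carries the working time.
conn.executescript("""
CREATE TABLE representative (
    rep_id INTEGER PRIMARY KEY, first TEXT, last TEXT);
CREATE TABLE city (
    city_id INTEGER PRIMARY KEY, zip TEXT, name TEXT);
CREATE TABLE client (
    client_id INTEGER PRIMARY KEY, name TEXT,
    city_id INTEGER REFERENCES city(city_id));
CREATE TABLE work (
    rep_id INTEGER REFERENCES representative(rep_id),
    client_id INTEGER REFERENCES client(client_id),
    hours INTEGER,
    PRIMARY KEY (rep_id, client_id));
""")

# Invented sample data.
conn.executemany("INSERT INTO representative VALUES (?, ?, ?)",
                 [(1, "Anna", "Schmidt"), (2, "Ben", "Meyer")])
conn.execute("INSERT INTO city VALUES (1, '48149', 'Muenster')")
conn.executemany("INSERT INTO client VALUES (?, ?, ?)",
                 [(1, "Acme", 1), (2, "Globex", 1)])
conn.executemany("INSERT INTO work VALUES (?, ?, ?)",
                 [(1, 1, 5), (1, 2, 3), (2, 1, 2)])

# Reassembling the original wide rows needs a join over every table.
rows = conn.execute("""
SELECT r.first, r.last, c.name, ci.name, w.hours
FROM work AS w
JOIN representative AS r ON r.rep_id = w.rep_id
JOIN client AS c ON c.client_id = w.client_id
JOIN city AS ci ON ci.city_id = c.city_id
ORDER BY r.rep_id, c.client_id
""").fetchall()

for row in rows:
    print(row)
```

In this layout each city name is stored exactly once, so correcting it means changing a single row of the city table.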