Download Normalization - IMED MCA 10+

Functional Dependence  In a relation including attribute A and B, B is functional dependent on A if, for every valid occurrence, the value A determines the value B  An occurrence can not be used to show that a dependency is true, only that it is false  A and B can be composite  If B is ‘Functional Dependent on’ A, then A ‘is the determinant of B’  All fields are functionally dependent on the primary key – or indeed any candidate key – be definition. [email protected] Functional Dependencies (FD) - Definition Let R be a relation scheme and X, Y be sets of attributes in R. A functional dependency from X to Y exists if and only if:  For every instance of |R| of R, if two tuples in |R| agree on the values of the attributes in X, then they agree on the values of the attributes in Y We write X  Y and say that X determines Y Example on Student (sid, name, supervisor_id, specialization):  {supervisor_id}  {specialization} means  If two student records have the same supervisor (e.g., Dimitris), then their specialization (e.g., Databases) must be the same  On the other hand, if the supervisors of 2 students are different, we do not care about their specializations (they may be the same or different). Sometimes, I omit the brackets for simplicity:  supervisor_id  specialization [email protected] Trivial FDs A functional dependency X  Y is trivial if Y is a subset of X  {name, supervisor_id}  {name}  If two records have the same values on both the name and supervisor_id attributes, then they obviously have the same name.  Trivial dependencies hold for all relation instances A functional dependency X  Y is non-trivial if YX =   {supervisor_id}  {specialization}  Non-trivial FDs are given implicitly in the form of constraints when designing a database.  For instance, the specialization of a students must be the same as that of the supervisor.  They constrain the set of legal relation instances. For instance, if I try to insert two students under the same supervisor with different specializations, the insertion will be rejected by the DBMS [email protected] Functional Dependencies and Keys A FD is a generalization of the notion of a key. For Student (sid, name, supervisor_id, specialization), we write: {sid}  {name, supervisor_id, specialization}  The sid determines all attributes (i.e., the entire record)  If two tuples in the relation student have the same sid, then they must have the same values on all attributes.  In other words they must be the same tuple (since the relational model does not allow duplicate records) [email protected] Superkeys and Candidate Keys A set of attributes that determine the entire tuple is a superkey  {sid, name} is a superkey for the student table.  Also {sid, name, supervisor_id} etc. A minimal set of attributes that determines the entire tuple is a candidate key  {sid, name} is not a candidate key because I can remove the name.  sid is a candidate key – so is HKID (provided that it is stored in the table). If there are multiple candidate keys, the DB designer chooses designates one as the primary key. [email protected] Reasoning about Functional Dependencies  It is sometimes possible to infer new functional dependencies from a set of given functional dependencies  independently from any particular instance of the relation scheme or of any additional knowledge Example: From {sid}  {first_name} and {sid} {last_name} We can infer {sid}  {first_name, last_name} [email protected] Redundancy of FDs Sets of functional dependencies may have redundant dependencies that can be inferred from the others  {A}{C} is redundant in: {{A}{B}, {B}{C},{A} {C}} Parts of a functional dependency may be redundant  Example of extraneous/redundant attribute on RHS: {{A}{B}, {B}{C}, {A}{C,D}} can be simplified to {{A}{B}, {B}{C}, {A}{D}} (because {A}{C} is inferred from {A}  {B}, {B}{C})  Example of extraneous/redundant attribute on LHS: {{A}{B}, {B}{C}, {A,C}{D}} can be simplified to {{A}{B}, {B}{C}, {A}{D}} (because of {A}{C}) [email protected] Example of Bad Design Consider the relation schema: Lending-schema = (branch-name, branch-city, assets, customername, loan-number, amount) where: {branch-name}{branch-city, assets} Bad Design  Wastes space. Data for branch-name, branch-city, assets are repeated for each loan that a branch makes  Complicates updating, introducing possibility of inconsistency of assets value  Difficult to store information about a branch if no loans exist. Can use null values, but they are difficult to handle. [email protected] Usefulness of FDs Use functional dependencies to decide whether a particular relation R is in “good” form. In the case that a relation R is not in “good” form, decompose it into a set of relations {R1, R2, ..., Rn} such that  each relation is in good form  the decomposition is a lossless-join decomposition  if possible, preserve dependencies In our example the problem occurs because there FDs ({branchname}{branch-city, assets}) where the LHS is not a key Solution: decompose the relation schema Lending-schema into:  Branch-schema = (branch-name, branch-city,assets)  Loan-info-schema = (customer-name, loan-number, branch-name, amount) [email protected] Normalization Normalization is the process of restructuring the logical data model of a database to eliminate redundancy, organize data efficiently, reduce repeating data and to reduce the potential for anomalies during data operations. Data normalization also may improve data consistency and simplify future extension of the logical data model. The formal classifications used for describing a relational database's level of normalization are called normal forms. [email protected] Why Normalization? 1. A non-normalized database may store data representing a particular referent in multiple locations. An update to such data in some but not all of those locations results in an update anomaly, yielding inconsistent data. A normalized database prevents such an anomaly by storing such data (i.e. data other than primary keys) in only one location. 2. A non-normalized database may have inappropriate dependencies, i.e. relationships between data with no functional dependencies. Adding data to such a database may require first adding the unrelated dependency. A normalized database prevents such insertion anomalies by ensuring that database relations mirror functional dependencies. 3. Similarly, such dependencies in non-normalized databases can hinder deletion. That is, deleting data from such databases may require deleting data from the inappropriate dependency. A normalized database prevents such deletion anomalies by ensuring that all records are uniquely identifiable and contain no extraneous information. [email protected] 1NF Eliminate Repeating Groups - Make a separate table for each set of related attributes, and give each table a primary key. 2NF Eliminate Redundant Data - If an attribute depends on only part of a multi-valued key, remove it to a separate table. 3NF Eliminate Columns Not Dependent On Key - If attributes do not contribute to a description of the key, remove them to a separate table. BCNF Boyce-Codd Normal Form - If there are non-trivial dependencies between candidate key attributes, separate them out into distinct tables. 4NF Isolate Independent Multiple Relationships - No table may contain two or more 1:n or n:m relationships that are not directly related. 5NF Isolate Semantically Related Multiple Relationships - There may be practical constrains on information that justify separating logically related many-to-many relationships. [email protected]

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Normalization - IMED MCA 10+