Download Normalization - IMED MCA 10+

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Versant Object Database wikipedia , lookup

Data analysis wikipedia , lookup

Data model wikipedia , lookup

Forecasting wikipedia , lookup

Database wikipedia , lookup

Information privacy law wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Business intelligence wikipedia , lookup

Relational algebra wikipedia , lookup

Clusterpoint wikipedia , lookup

Open data in the United Kingdom wikipedia , lookup

Data vault modeling wikipedia , lookup

Relational model wikipedia , lookup

Database model wikipedia , lookup

Transcript
Functional Dependence
 In a relation including attribute A and B, B is functional dependent on A if, for
every valid occurrence, the value A determines the value B
 An occurrence can not be used to show that a dependency is true, only that
it is false
 A and B can be composite
 If B is ‘Functional Dependent on’ A, then A ‘is the determinant of B’
 All fields are functionally dependent on the primary key – or indeed any
candidate key – be definition.
[email protected]
Functional Dependencies (FD) - Definition
Let R be a relation scheme and X, Y be sets of attributes in R.
A functional dependency from X to Y exists if and only if:
 For every instance of |R| of R, if two tuples in |R| agree on the values of the
attributes in X, then they agree on the values of the attributes in Y
We write X  Y and say that X determines Y
Example on Student (sid, name, supervisor_id, specialization):
 {supervisor_id}  {specialization} means
 If two student records have the same supervisor (e.g., Dimitris), then
their specialization (e.g., Databases) must be the same
 On the other hand, if the supervisors of 2 students are different, we do
not care about their specializations (they may be the same or different).
Sometimes, I omit the brackets for simplicity:
 supervisor_id  specialization
[email protected]
Trivial FDs
A functional dependency X  Y is trivial if Y is a subset of X
 {name, supervisor_id}  {name}
 If two records have the same values on both the name and supervisor_id
attributes, then they obviously have the same name.
 Trivial dependencies hold for all relation instances
A functional dependency X  Y is non-trivial if YX = 
 {supervisor_id}  {specialization}
 Non-trivial FDs are given implicitly in the form of constraints when designing a
database.
 For instance, the specialization of a students must be the same as that of the
supervisor.
 They constrain the set of legal relation instances. For instance, if I try to insert
two students under the same supervisor with different specializations, the
insertion will be rejected by the DBMS
[email protected]
Functional Dependencies and Keys
A FD is a generalization of the notion of a key.
For Student (sid, name, supervisor_id, specialization),
we write:
{sid}  {name, supervisor_id, specialization}
 The sid determines all attributes (i.e., the entire record)
 If two tuples in the relation student have the same sid, then they must
have the same values on all attributes.
 In other words they must be the same tuple (since the relational model
does not allow duplicate records)
[email protected]
Superkeys and Candidate Keys
A set of attributes that determine the entire tuple is a superkey
 {sid, name} is a superkey for the student table.
 Also {sid, name, supervisor_id} etc.
A minimal set of attributes that determines the entire tuple is a candidate key
 {sid, name} is not a candidate key because I can remove the name.
 sid is a candidate key – so is HKID (provided that it is stored in the table).
If there are multiple candidate keys, the DB designer chooses designates one as the
primary key.
[email protected]
Reasoning about Functional Dependencies
 It is sometimes possible to infer new functional dependencies from a set of
given functional dependencies

independently from any particular instance of the relation scheme or of any additional
knowledge
Example:
From
{sid}  {first_name} and
{sid} {last_name}
We can infer
{sid}  {first_name, last_name}
[email protected]
Redundancy of FDs
Sets of functional dependencies may have redundant dependencies that
can be inferred from the others
 {A}{C} is redundant in: {{A}{B}, {B}{C},{A} {C}}
Parts of a functional dependency may be redundant
 Example of extraneous/redundant attribute on RHS:
{{A}{B}, {B}{C}, {A}{C,D}} can be simplified to
{{A}{B}, {B}{C}, {A}{D}}
(because {A}{C} is inferred from {A}  {B}, {B}{C})
 Example of extraneous/redundant attribute on LHS:
{{A}{B}, {B}{C}, {A,C}{D}} can be simplified to
{{A}{B}, {B}{C}, {A}{D}}
(because of {A}{C})
[email protected]
Example of Bad Design
Consider the relation schema: Lending-schema = (branch-name, branch-city, assets, customername, loan-number, amount) where:
{branch-name}{branch-city, assets}
Bad Design
 Wastes space. Data for branch-name, branch-city, assets are repeated for each loan that a
branch makes
 Complicates updating, introducing possibility of inconsistency of assets value
 Difficult to store information about a branch if no loans exist. Can use null values, but
they are difficult to handle.
[email protected]
Usefulness of FDs
Use functional dependencies to decide whether a particular relation R is in
“good” form.
In the case that a relation R is not in “good” form, decompose it into a set of
relations {R1, R2, ..., Rn} such that
 each relation is in good form
 the decomposition is a lossless-join decomposition
 if possible, preserve dependencies
In our example the problem occurs because there FDs ({branchname}{branch-city, assets}) where the LHS is not a key
Solution: decompose the relation schema Lending-schema into:
 Branch-schema = (branch-name, branch-city,assets)
 Loan-info-schema = (customer-name, loan-number, branch-name,
amount)
[email protected]
Normalization
Normalization is the process of restructuring the logical data
model of a database to eliminate redundancy, organize data
efficiently, reduce repeating data and to reduce the potential
for anomalies during data operations. Data normalization
also may improve data consistency and simplify future
extension of the logical data model. The formal classifications
used for describing a relational database's level of
normalization are called normal forms.
[email protected]
Why Normalization?
1.
A non-normalized database may store data representing a particular
referent in multiple locations. An update to such data in some but not all of
those locations results in an update anomaly, yielding inconsistent data. A
normalized database prevents such an anomaly by storing such data (i.e.
data other than primary keys) in only one location.
2.
A non-normalized database may have inappropriate dependencies, i.e.
relationships between data with no functional dependencies. Adding data
to such a database may require first adding the unrelated dependency. A
normalized database prevents such insertion anomalies by ensuring that
database relations mirror functional dependencies.
3.
Similarly, such dependencies in non-normalized databases can hinder
deletion. That is, deleting data from such databases may require deleting
data from the inappropriate dependency. A normalized database prevents
such deletion anomalies by ensuring that all records are uniquely
identifiable and contain no extraneous information.
[email protected]
1NF
Eliminate Repeating Groups - Make a separate table for each set of
related attributes, and give each table a primary key.
2NF
Eliminate Redundant Data - If an attribute depends on only part of a
multi-valued key, remove it to a separate table.
3NF
Eliminate Columns Not Dependent On Key - If attributes do not
contribute to a description of the key, remove them to a separate table.
BCNF
Boyce-Codd Normal Form - If there are non-trivial dependencies
between candidate key attributes, separate them out into distinct
tables.
4NF
Isolate Independent Multiple Relationships - No table may contain two
or more 1:n or n:m relationships that are not directly related.
5NF
Isolate Semantically Related Multiple Relationships - There may be
practical constrains on information that justify separating logically
related many-to-many relationships.
[email protected]