Download 2.1 Data Models - KV Institute of Management and Information Studies

Data Base Management System
Unit 2
Modelling and Design Framework
Topics to be Covered – Data Models – Conceptual Design – ER diagram – relationships – Normalization – data management and system integration.
Table of Contents
Unit 2
Modelling and Design Framework
2.1 Data Models
2.1.1 Types of Data Models
2.2 Conceptual Design
2.3 ER Diagram and Relationships
2.4 Normalization
2.5 Data Management and System Integration
2.6 Reference
2.1 Data Models
2.1.1 Types of Data Models
(i) Hierarchical Models









The hierarchical data model organizes data in a tree structure.
There is a hierarchy of parent and child data segments.
This structure implies that a record can have repeating information, generally in the child data segments.
Data is held in a series of records, each of which has a set of field values attached to it.
The model collects all the instances of a specific record together as a record type.
These record types are the equivalent of tables in the relational model, with the individual records being the equivalent of rows.
To create links between these record types, the hierarchical model uses parent-child relationships.
These are 1:N mappings between record types.
The mappings are represented as trees, a structure "borrowed" from mathematics in the same way that the relational model borrows set theory.




For example,
o An organization might store information about an employee, such as name,
employee number, department, salary.
o The organization might also store information about an employee's children, such
as name and date of birth.
o The employee and children data form a hierarchy, where the employee data
represents the parent segment and the children data represents the child segment.
o If an employee has three children, then there would be three child segments
associated with one employee segment.
In a hierarchical database the parent-child relationship is one to many.
This restricts a child segment to having only one parent segment.
Hierarchical DBMSs were popular from the late 1960s, with the introduction of IBM's
Information Management System (IMS) DBMS, through the 1970s.
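The employee/children example can be sketched in a few lines of Python. This is only an illustration of the parent-child rule, not how IMS or any real hierarchical DBMS stores data; the class name `Segment`, its field names and the sample employee and child values are all assumptions made up for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One record (segment) in the tree; names here are illustrative only."""
    record_type: str
    fields: dict
    children: list = field(default_factory=list)
    parent: "Segment | None" = field(default=None, repr=False, compare=False)

    def add_child(self, child: "Segment") -> "Segment":
        # The 1:N rule: a child segment may have only one parent segment.
        if child.parent is not None:
            raise ValueError("a child segment can have only one parent")
        child.parent = self
        self.children.append(child)
        return child


# Hypothetical data: one employee (parent segment) with three child segments.
employee = Segment("EMPLOYEE", {"name": "A. Kumar", "emp_no": 101,
                                "department": "Sales", "salary": 30000})
for name, dob in [("Ravi", "2010-04-01"), ("Meena", "2012-09-15"),
                  ("Arun", "2015-01-20")]:
    employee.add_child(Segment("CHILD", {"name": name, "date_of_birth": dob}))

# Attaching the same child to a second parent is rejected.
other_employee = Segment("EMPLOYEE", {"name": "B. Singh", "emp_no": 102})
try:
    other_employee.add_child(employee.children[0])
    second_parent_allowed = True
except ValueError:
    second_parent_allowed = False
```

Three child segments hang off one employee segment, and the attempt to give a child a second parent fails, which is exactly the restriction the network model later relaxes.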
(ii) Network Models










Some data were more naturally modeled with more than one parent per child.
So, the network model permitted the modeling of many-to-many relationships in data.
In 1971, the Conference on Data Systems Languages (CODASYL) formally defined
the network model.
The basic data modeling construct in the network model is the set construct.
A set consists of an owner record type, a set name, and a member record type.
A member record type can have that role in more than one set, hence the multi parent
concept is supported.
An owner record type can also be a member or owner in another set.
The data model is a simple network; link and intersection record types may exist,
as well as sets between them.
Thus, the complete network of relationships is represented by several pairwise sets; in
each set, one record type is the owner and one or more record types are members.
Usually, a set defines a 1:M relationship, although 1:1 is permitted. The CODASYL
network model is based on mathematical set theory.
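A rough Python sketch of the set construct follows. It is not CODASYL DDL; the class `SetType`, the record types STUDENT, COURSE and ENROLMENT, and the sample values are hypothetical. It shows how a link record type that is a member of two sets yields a many-to-many relationship out of two 1:M sets.

```python
from collections import defaultdict


class SetType:
    """A set: a set name, an owner record type and a member record type."""

    def __init__(self, name, owner_type, member_type):
        self.name = name
        self.owner_type = owner_type
        self.member_type = member_type
        self.members_of = defaultdict(list)  # owner record -> member records
        self.owner_of = {}                   # member record -> its single owner

    def connect(self, owner, member):
        # Within one set, a member occurrence has exactly one owner (1:M).
        if member in self.owner_of:
            raise ValueError(f"{member} already has an owner in set {self.name}")
        self.owner_of[member] = owner
        self.members_of[owner].append(member)


# ENROLMENT is a link record type: a member in two different sets.
student_enrols = SetType("STUDENT_ENROLS", "STUDENT", "ENROLMENT")
course_has = SetType("COURSE_HAS", "COURSE", "ENROLMENT")

for student, course in [("S1", "DB"), ("S1", "OS"), ("S2", "DB")]:
    link = f"{student}-{course}"      # one link record per pairing
    student_enrols.connect(student, link)
    course_has.connect(course, link)

# Navigate owner -> members -> other owner to answer many-to-many questions.
courses_of_s1 = [course_has.owner_of[link]
                 for link in student_enrols.members_of["S1"]]
students_in_db = [student_enrols.owner_of[link]
                  for link in course_has.members_of["DB"]]
```

Each link record has two owners, one per set, which is the multi-parent concept described above.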
(iii) Relational Data Models

RDBMS (relational database management system) – a database management system based on the relational
model developed by E. F. Codd.
• A relational database allows the definition of data structures, storage and retrieval
operations and integrity constraints.
• In such a database, the data and the relations between them are organized in tables.
• A table is a collection of records, and each record in a table contains the same fields.
• Properties of Relational Tables:
o Values Are Atomic
o Each Row is Unique
o Column Values Are of the Same Kind
o The Sequence of Columns is Insignificant
o The Sequence of Rows is Insignificant
o Each Column Has a Unique Name
Certain fields may be designated as keys, which means that searches for specific values
of that field will use indexing to speed them up.
Where fields in two different tables take values from the same set, a join operation can be
performed to select related records in the two tables by matching values in those fields.
[Figure: mapping a Student class (2 attributes: Student Name, Marks) to a table. The object ID is implicit and hidden from the entire world; the primary key is explicit and visible to the entire world. Steps shown: map the class to a table, outline the corresponding table model for this class, write SQL code corresponding to the table model.]

For example,

o An "orders" table might contain (customer-ID, product-code) pairs and a
"products" table might contain (product-code, price) pairs so to calculate a given
customer's bill you would sum the prices of all products ordered by that customer
by joining on the product-code fields of the two tables.
This can be extended to joining multiple tables on multiple fields. Because these
relationships are only specified at retrieval time, relational databases are classed as
dynamic database management systems.
The relational database model is based on relational algebra.
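The orders/products example can be tried with Python's built-in `sqlite3` module. This is a minimal sketch: the column names, the sample rows and the helper `bill_for` are assumptions for illustration, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        product_code TEXT PRIMARY KEY,
        price        REAL NOT NULL
    );
    CREATE TABLE orders (
        customer_id  INTEGER NOT NULL,
        product_code TEXT NOT NULL REFERENCES products(product_code)
    );
    INSERT INTO products VALUES ('P1', 10.0), ('P2', 25.5), ('P3', 4.0);
    INSERT INTO orders VALUES (1, 'P1'), (1, 'P2'), (2, 'P3'), (1, 'P1');
""")


def bill_for(customer_id):
    """Sum the prices of all products ordered by one customer."""
    row = conn.execute(
        """SELECT COALESCE(SUM(p.price), 0)
           FROM orders AS o
           JOIN products AS p ON o.product_code = p.product_code
           WHERE o.customer_id = ?""",
        (customer_id,),
    ).fetchone()
    return row[0]
```

The relationship between the two tables exists only in the `JOIN ... ON` clause of the query, which is what "specified at retrieval time" means here.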

Employee Table (Super Class)
Attributes               Nulls allowed?
Employee ID              N
Employee Name            N
Age                      Y
Grade                    N

Manager Table (Sub Class)
Attributes               Nulls allowed?
Employee ID              N
Bonus                    Y
Number of Subordinates   N

Clerk Table (Sub Class)
Attributes               Nulls allowed?
Employee ID              N
Number of Pending tasks  Y
(iv) Object – Oriented Model





Object DBMSs add database functionality to object programming languages.
They bring much more than persistent storage of programming language objects.
Object DBMSs extend the semantics of the C++, Smalltalk and Java object
programming languages to provide full-featured database programming capability,
while retaining native language compatibility.
A major benefit of this approach is the unification of the application and database
development into a seamless data model and language environment.
As a result, applications require less code, use more natural data modeling, and code
bases are easier to maintain.
[Figure: C++ objects and Java objects stored in an OODBMS]





Object developers can write complete database applications with a modest amount of
additional effort.
According to Rao (1994), "The object-oriented database (OODB) paradigm is the
combination of object-oriented programming language (OOPL) systems and
persistent systems. The power of the OODB comes from the seamless treatment of
both persistent data, as found in databases, and transient data, as found in executing
programs."
In contrast to a relational DBMS, where a complex data structure must be flattened out
to fit into tables or joined together from those tables to form the in-memory structure,
an object DBMS stores and retrieves a web or hierarchy of interrelated objects directly,
without that translation overhead.
This one-to-one mapping of object programming language objects to database objects
has two benefits over other storage approaches: it provides higher performance
management of objects, and it enables better management of the complex
interrelationships between objects.
This makes object DBMSs better suited to support applications such as financial
portfolio risk analysis systems, telecommunications service applications, World Wide
Web document structures, design and manufacturing systems, and hospital patient
record systems, which have complex relationships between data.
(v) Entity – Attribute – Value Model (EAV)


The best way to understand the rationale of EAV design is to understand row
modeling of which EAV is a generalized form.
Consider a supermarket database that must manage thousands of products and brands,
many of which have a transitory existence.






Here, it is intuitively obvious that product names should not be hard-coded as names
of columns in tables.
Instead, one stores product descriptions in a Products table: purchases/sales of
individual items are recorded in other tables as separate rows with a product ID
referencing this table.
Conceptually an EAV design involves a single table with three columns, an entity, an
attribute and a value for the attribute.
In EAV design, one row stores a single fact.
In a conventional table that has one column per attribute, by contrast, one row stores a
set of facts.
EAV design is appropriate when the number of parameters that potentially apply to
an entity is vastly more than those that actually apply to an individual entity.
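A minimal sketch of the three-column EAV table, again using Python's `sqlite3`. The table name `product_facts`, the product identifiers and the attribute names are invented for the example; a production EAV schema would normally add attribute metadata and typed value columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row stores a single fact: (entity, attribute, value).
conn.execute(
    "CREATE TABLE product_facts ("
    " entity TEXT, attribute TEXT, value TEXT,"
    " PRIMARY KEY (entity, attribute))"
)
conn.executemany(
    "INSERT INTO product_facts VALUES (?, ?, ?)",
    [("soap-01", "brand", "Acme"), ("soap-01", "weight_g", "100"),
     ("tea-07", "brand", "Leaf"), ("tea-07", "flavour", "mint")],
)


def facts_for(entity):
    """Collect the facts recorded for one entity into a dictionary."""
    cursor = conn.execute(
        "SELECT attribute, value FROM product_facts"
        " WHERE entity = ? ORDER BY attribute",
        (entity,),
    )
    return dict(cursor.fetchall())
```

Only the attributes that actually apply to a product occupy rows: the soap has no `flavour` row and the tea has no `weight_g` row, and a new attribute needs no change to the table definition.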
2.2 Conceptual Design


The DBMS architecture describes how data in the database is viewed by the users.
It is not concerned with how the data is handled and processed by the DBMS.





The database users are provided with an abstract view of the data by hiding certain details
of how data is physically stored.
This enables the users to manipulate the data without worrying about where it is located
or how it is actually stored.
In this architecture, the overall database description can be defined at three levels,
namely, internal, conceptual, and external levels and thus, named three-level DBMS
architecture.
This architecture was proposed by ANSI/SPARC (American National Standards
Institute/Standards Planning and Requirements Committee) and hence is also known as
the ANSI/SPARC architecture.
The three levels are discussed here.



Internal level:
o It is the lowest level of data abstraction that deals with the physical
representation of the database on the computer and thus, is also known as
physical level.
o It describes how the data is physically stored and organized on the storage
medium.
o At this level, various aspects are considered to achieve optimal runtime
performance and storage space utilization.
o These aspects include storage space allocation techniques for data and
indexes, access paths such as indexes, data compression and encryption
techniques, and record placement.
Conceptual level:
o This level of abstraction deals with the logical structure of the entire
database and thus, is also known as logical level.
o It describes what data is stored in the database, the relationships among
the data and complete view of the user’s requirements without any
concern for the physical implementation.
o That is, it hides the complexity of physical storage structures.
o The conceptual view is the overall view of the database and it includes all
the information that is going to be represented in the database.
External level:
o It is the highest level of abstraction that deals with the user’s view of the
database and thus, is also known as view level.
o In general, most of the users and application programs do not require the
entire data stored in the database.
o The external level describes a part of the database for a particular group of
users.
o It permits users to access data in a way that is customized according to
their needs, so that the same data can be seen by different users in
different ways, at the same time.
o In this way, it provides a powerful and flexible security mechanism by
hiding the parts of the database from certain users, as the user is not aware
of existence of any attributes that are missing from the view.






These three levels are used to describe the schema of the database at various levels.
Thus, the three-level architecture is also known as three-schema architecture.
The internal level has an internal schema, which describes the physical storage structure
of the database.
The conceptual level has a conceptual schema, which describes the structure of entire
database.
The external level has external schemas or user views, which describe the part of the
database according to a particular user’s requirements, and hide the rest of the database
from that user.
The physical level is managed by the operating system under the direction of the DBMS.
[Figure: Three-schema architecture]





To understand the three-schema architecture, consider the three levels of the BOOK file
in Online Book database. Two views (view 1 and view 2) of the BOOK file have been
defined at the external level.
Different database users can see these views. The details of the data types are hidden
from the users.
At the conceptual level, the BOOK records are described by a type definition.
The application programmers and the DBA generally work at this level of abstraction.
At the internal level, the BOOK records are described as a block of consecutive storage
locations such as words or bytes.

The database users and the application programmers are not aware of these details;
however, the DBA may be aware of certain details of the physical organization of the
data.
Three levels of Online Book database (BOOK file)







In three-schema architecture, each user group refers only to its own external view.
Whenever a user specifies a request to generate a new external view, the DBMS must
transform the request specified at external level into a request at conceptual level, and
then into a request at physical level.
If the user requests data retrieval, the data extracted from the database must be
presented according to the needs of the user.
This process of transforming the requests and results between various levels of DBMS
architecture is known as mapping.
The main advantage of three-schema architecture is that it provides data independence.
Data independence is the ability to change the schema at one level of the database system
without having to change the schema at the other levels.
Data independence is of two types, namely, logical data independence and physical data
independence.
o Logical data independence:




It is the ability to change the conceptual schema without affecting the
external schemas or application programs.
The conceptual schema may be changed due to change in constraints or
addition of new data item or removal of existing data item, etc., from the
database.
The separation of the external level from the conceptual level enables the
users to make changes at the conceptual level without affecting the
external level or the application programs.
For example, if a new data item, say Edition is added to the BOOK file,
the two views are not affected.
o Physical data independence:



It is the ability to change the internal schema without affecting the conceptual
or external schema.
An internal schema may be changed due to several reasons such as for
creating additional access structure, changing the storage structure, etc.
The separation of internal schema from the conceptual schema facilitates
physical data independence.
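Both kinds of data independence can be tried out in a small way with SQL views. The sketch below uses Python's `sqlite3`; the BOOK columns, the definitions of `view1` and `view2` and the sample row are assumptions, since the notes do not list them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual level: the BOOK records (hypothetical columns).
    CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT,
                       price REAL, publisher TEXT);
    INSERT INTO book VALUES ('111', 'Databases', 40.0, 'Alpha Press');
    -- External level: two user views of the same data.
    CREATE VIEW view1 AS SELECT isbn, title FROM book;
    CREATE VIEW view2 AS SELECT title, price FROM book;
""")
before = (conn.execute("SELECT * FROM view1").fetchall(),
          conn.execute("SELECT * FROM view2").fetchall())

# Logical data independence: add the new data item Edition to BOOK.
conn.execute("ALTER TABLE book ADD COLUMN edition INTEGER")
after = (conn.execute("SELECT * FROM view1").fetchall(),
         conn.execute("SELECT * FROM view2").fetchall())

# Physical data independence: add an access structure (an index).
conn.execute("CREATE INDEX idx_book_title ON book (title)")
after_index = conn.execute("SELECT * FROM view1").fetchall()
```

The two views return the same rows before and after the Edition column is added, and creating the index changes only how the data may be reached, not what any query returns.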
2.3 ER Diagram and Relationships



The E-R model is a high-level conceptual data model developed by Chen in 1976 to
facilitate database design.
A conceptual database model is a set of concepts that describe the structure of a
database and the associated retrieval and update transactions on the database.
The basic concepts of the E-R model include entity types, relationship types and
attributes.
Components of the E-R Model: Entities, Attributes, Relationships
(a) Entities
• The functional item in any data model.
• Each entity type is identified by a name and a list of properties.
• A database has many entity types. Ex – a Book, a Publisher or a Person is an entity.
• An entity is atomic – it can't be broken into smaller pieces.
• A book can have qualities that describe it, like ISBN, Title, Author, and Publisher.
(b) Attributes
• An entity is composed of additional information, which describes the entity.
• The components of an entity, or the qualifiers that describe it, are called attributes.
• Ex – ISBN, Title, Author, Publisher, Price, Year of publication.
• These are additional information for a book entity.
• The entity is shown in upper case and the attributes in lower case.
• Ex – BOOK (entity) – ISBN, title, publisher (attributes)
• Attributes can have the same name in different entities, but the same entity can't be duplicated.
• Each attribute is associated with a set of values called its domain.
• Ex – a student's age is between 14 and 17.
• A domain may be divided into sub-domains.
• Ex – DOB – Date, Month, Year.
Attributes are classified into 5 types: Simple, Composite, Single-valued, Multi-valued, Derived
(1) Simple
A single component with an independent existence.
Ex – Gender, Age, Salary
(2) Composite
Multiple components, each with an independent existence.
Ex – Address – Street name, Area, City, Pin code …
(3) Single-valued
Holds a single value for a single entity.
Ex – a classroom entity has a single value for the room number attribute, so the room number attribute is referred to as being single-valued.
(4) Multi-valued
Holds multiple values for a single entity.
Ex – a student entity can have several hobbies – reading, music, movies …
(5) Derived
Represents a value that is derived from the value of a related attribute.
Ex – the Age attribute is derived from DOB.
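The attribute types map naturally onto a small class definition. The Python sketch below is only an illustration; the class names, field names and the sample student are assumptions, loosely following the Student entity used in these notes.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Address:
    """Composite attribute: several components, each with its own meaning."""
    street: str
    city: str
    state: str
    pin_code: str


@dataclass
class Student:
    roll_number: int                  # simple, single-valued
    first_name: str
    last_name: str
    date_of_birth: date
    address: Address                  # composite
    hobbies: list = field(default_factory=list)   # multi-valued

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month, self.date_of_birth.day)
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)


# Hypothetical sample entity.
student = Student(7, "Asha", "Rao", date(2008, 6, 15),
                  Address("12 Lake Road", "Chennai", "Tamil Nadu", "600001"),
                  hobbies=["reading", "music", "movies"])
```

Keeping age as a computed value rather than a stored one means it can never disagree with the date of birth.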
(c) Relationship
Entity-attribute definitions capture only the static meaning of real-world
items; relationships describe how those items are associated.
Ex – a book is published by a particular publisher, an employee may work for a
manager, a person has a child and the child has a cousin …
[Figure: ERD for the Student entity, with attributes First Name, Last Name, Class, Date of Birth, Age, Hobbies, Roll number and Address (Street, City, State, Pin Code)]
Some terms related to entities and relationships –
(a) Degree
(b) Connectivity
(c) Cardinality
(d) Dependency
(e) Participation
(a) Degree
Degree is the number of entity types associated in a relationship.
[Figure: Ex – Unary relationship (single entity): Student, Requires]
[Figure: Ternary relationship: Teacher, Subject and Student connected by Teaches]
(b) Connectivity
Relationships can be classified as one to one, one to many and many to many.
[Figure: one to one – Manager (1) Manages Department (1); one to many – Department (1) Has Employees (N); many to many – Employee (N) Joins Courses (N)]
(c) Cardinality
The specific number of entity occurrences associated with one
occurrence of the related entity.
[Figure: Department (1) Has Employee (N), with cardinalities (1,1) and (0,100); Employee (N) Joins Course (N), with cardinalities (0,2) and (0,10)]
Company policy does not allow more than 100 employees in one department; at the
same time, an employee can't opt for more than 2 of the 10 courses the company is offering.
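Cardinality limits such as (0,100) and (0,2) are business rules that an application or database has to enforce. A hedged Python sketch of such checks follows; the function names, the dictionaries and the sample data are hypothetical.

```python
MAX_EMPLOYEES_PER_DEPARTMENT = 100   # upper bound of (0,100)
MAX_COURSES_PER_EMPLOYEE = 2         # upper bound of (0,2)


def can_join_course(courses_by_employee, employee, course):
    """True if the enrolment keeps the employee within the (0,2) limit."""
    current = courses_by_employee.get(employee, set())
    return course in current or len(current) < MAX_COURSES_PER_EMPLOYEE


def can_add_employee(employees_by_department, department):
    """True if the department is still below its (0,100) upper bound."""
    current = employees_by_department.get(department, ())
    return len(current) < MAX_EMPLOYEES_PER_DEPARTMENT


# Hypothetical state: E1 already takes two courses, Sales already has 100 staff.
enrolments = {"E1": {"SQL", "Java"}, "E2": {"SQL"}}
staff = {"Sales": [f"E{i}" for i in range(100)]}
```

The lower bound of 0 in both pairs means participation is optional, so an employee with no courses and a department with no employees are both valid states.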
(d) Dependency
Entities are classified as being of strong or weak entity type.
An entity type that is existence-dependent on some other entity is called a weak
entity; an entity type that does not depend on the existence of another entity is
called a strong entity.
Ex – Weak entity – child, dependent or subordinate
Strong entity – parent, owner, dominant …
[Figure: Company (strong) Employs Employee (strong); Employee Has Patent (weak)]
(e) Participation
There are two ways an entity can participate in a relationship: totally or partially,
also called mandatory or optional.
[Figure: Employee (N) Joins Course (N), with cardinalities (0,2) and (0,10)]
Employees can take up to 2 courses.
"Employee joins course", "Employee chooses not to join a course"
Employee – Mandatory
Course – Optional
Ex – One to one relationship
[Figure: Employer (Name, Age, Gender, Emp_no, Salary) ISA Consultant (Emp_no, Designation, Client)]
Ex – One to many
[Figure: Employer (Name, Age, Gender, Emp_no, Salary) and Department (Dept_id, Name)]
Ex – Many to many
[Figure: many-to-many diagram; labels recovered: Author, Publisher, Editor, Name, Age, Gender, Address, Phone no]
E – R Diagram










Book club has members.
Book club sells books to its members.
Members place orders for books.
Each order contains one or more books.
Books are written by authors.
A publisher publishes the book.
An author can write more than one book and a book can have more than one author.
A book is published by one publisher, but a publisher publishes many books.
A member can place more than one order.
A member can also choose not to place an order.
[Figure: E-R diagram – Book Club. Entities: Member, Book club, Order, Book, Author, Publisher. Relationships: Enrolls in, Places, Fulfills, Sells, Writes, Publishes. Connectivities are marked 1 and N; cardinalities are marked (1,1) and (1,N).]
2.4 Normalization

Normalization is the process of building a database structure to store data.
The process of normalization was first developed by E. F. Codd.
Normalization is a formal process of developing data structures in a manner that
eliminates redundancy and promotes integrity.

Stages of Normal Forms
(i) First Normal Form (1NF)
(ii) Second Normal Form (2NF)
(iii) Third Normal Form (3NF)
(iv) Boyce–Codd Normal Form (BCNF)
(v) Fourth Normal Form (4NF)
(vi) Fifth Normal Form (5NF)

Keys –
Primary key – column of a table whose purpose is to uniquely identify records
from the same table
Foreign key – column in a table that uniquely identifies the records from a
different table

Relationship –
(a) One to one – rarely used
(b) One to many – commonly used
(c) Many to many – problematic
(a) First Normal Form (1NF)
A relation in which the intersection of each row and column contains one and only one
value.
Repeating groups are eliminated or removed from the table.
Ex – Create table Contacts
(Contact_id      Integer Not null,
 L_Name          Varchar (20) Not null,
 F_Name          Varchar (20),
 Contact_Date1   Date,
 Contact_Desc1   Varchar (50),
 Contact_Date2   Date,
 Contact_Desc2   Varchar (50));
The above data structure has a repeating group: the date and description of 2
conversations. To eliminate the repeating group, the group is moved to another table.
[Figure: resulting tables with keys marked P, F, P]
P – Primary key, F – Foreign key
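One possible result of moving the repeating group into its own table is sketched below with Python's `sqlite3`. The second table's name (`conversations`), its column names and the sample rows are assumptions; the notes only say that the group is moved to another table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (
        contact_id INTEGER NOT NULL PRIMARY KEY,
        l_name     VARCHAR(20) NOT NULL,
        f_name     VARCHAR(20)
    );
    -- The repeating (date, description) group: one row per conversation.
    CREATE TABLE conversations (
        contact_id   INTEGER NOT NULL REFERENCES contacts(contact_id),
        contact_date DATE NOT NULL,
        contact_desc VARCHAR(50),
        PRIMARY KEY (contact_id, contact_date)
    );
    INSERT INTO contacts VALUES (1, 'Rao', 'Asha');
    INSERT INTO conversations VALUES
        (1, '2024-01-05', 'Intro call'),
        (1, '2024-01-12', 'Follow-up'),
        (1, '2024-02-01', 'Third call');
""")
conversation_count = conn.execute(
    "SELECT COUNT(*) FROM conversations WHERE contact_id = 1"
).fetchone()[0]
```

A third conversation, which had no column in the original two-slot design, is now just one more row.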
(b) Second Normal Form (2NF)
Every non-key attribute must be fully functionally dependent on the primary key.
Ex – Create table Employee
(Emp_id        Integer Not null,
 L_Name        Varchar (20) Not null,
 F_Name        Varchar (20),
 Dept_Code     Integer,
 Description   Varchar (50));
Here Description describes the department, not the employee, so it is moved to a separate Department table:
Ex – Create table Employee
(Emp_id        Integer Not null,
 L_Name        Varchar (20) Not null,
 F_Name        Varchar (20),
 Dept_Code     Integer);
Create table Department
(Dept_Code     Integer,
 Description   Varchar (50));
[Figure: resulting tables with keys marked P, F, P]
P – Primary Key, F – Foreign Key
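The Employee/Department split can be exercised with Python's `sqlite3`. The key declarations and the sample rows are assumptions added for the sketch; the column names follow the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        dept_code   INTEGER PRIMARY KEY,
        description VARCHAR(50)
    );
    CREATE TABLE employee (
        emp_id    INTEGER NOT NULL PRIMARY KEY,
        l_name    VARCHAR(20) NOT NULL,
        f_name    VARCHAR(20),
        dept_code INTEGER REFERENCES department(dept_code)
    );
    INSERT INTO department VALUES (10, 'Accounts'), (20, 'Sales');
    INSERT INTO employee VALUES
        (1, 'Rao', 'Asha', 10), (2, 'Iyer', 'Ravi', 10), (3, 'Khan', 'Sara', 20);
""")

# The description is stored once, so one UPDATE changes it for every employee.
conn.execute("UPDATE department SET description = 'Finance' WHERE dept_code = 10")
rows = conn.execute("""
    SELECT e.emp_id, d.description
    FROM employee AS e
    JOIN department AS d ON e.dept_code = d.dept_code
    ORDER BY e.emp_id
""").fetchall()
```

In the single-table design the same change would have to be repeated on every employee row of that department, which is where modification anomalies come from.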
(c) Third Normal Form (3NF)
A relation in 3NF has no transitive dependencies. A transitive dependency is a
functional dependency between two or more non-key attributes.
Ex – Create table Contacts
(Contact_id         Integer Not null,
 L_Name             Varchar (20) Not null,
 F_Name             Varchar (20),
 Company_Name       Integer,
 Company_Location   Varchar (50));
Contact_id is the primary key, so all the remaining attributes are functionally
dependent on this attribute.
There is a transitive dependency: Company_Location is dependent on Company_Name,
and Company_Name is dependent on Contact_id. Unless the location of the company
differs on an individual basis, the Company_Location column is not dependent on the key value.
o Anomaly (insertion, deletion, modification)
[Figure: resulting tables with keys marked P, F, P]
(d) Boyce–Codd Normal Form (BCNF)
Database relations are designed so that they have neither partial dependencies nor
transitive dependencies.
Ex – 'A', 'B': B is dependent on A
A            B
Contact_id   L_Name
Contact_id   F_Name
Contact_id   Company_Id
• Transitive dependency
– A, B, C
– A --> B, B --> C
– C is transitively dependent on A via B
Ex – Employee – Emp_id, Name, Address, Position
Department – Dept_id, Name, Manager
Emp_id --> Dept_id and Dept_id --> Dept_Name
Emp_id --> Dept_Name via Dept_id
3NF allows a dependency A --> B even when A is not a candidate key (provided B is
part of a candidate key), whereas BCNF requires A to be a candidate key in every such dependency.
Candidate_id   Int_date   Int_time   Intvr_id   Room_no
C001           24 May     10.30      E001       1
C002           24 May     11.30      E001       1
C003           24 May     10.30      E002       2
C004           26 May     11.30      E003       2
(Candidate_id, Int_date) --> (Int_time, Intvr_id, Room_no) – primary key.
(Intvr_id, Int_date) --> Room_no – causes a problem for the relation.
Interview Table
Candidate_id   Int_date   Int_time   Intvr_id
C001           24 May     10.30      E001
C002           24 May     11.30      E001
C003           24 May     10.30      E002
C004           26 May     11.30      E003
Room Table
Intvr_id   Int_date   Room_no
E001       24 May     1
E002       24 May     2
E003       26 May     2
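The two dependencies stated for the interview relation can be checked against the sample rows with a short Python function. The helper `fd_holds` is an illustrative name, not part of any library. A check like this can only show that the sample data does not contradict a dependency; whether the dependency really holds is a matter of the business rules.

```python
def fd_holds(rows, lhs, rhs):
    """True if each combination of lhs values maps to one combination of rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


columns = ("Candidate_id", "Int_date", "Int_time", "Intvr_id", "Room_no")
interviews = [dict(zip(columns, r)) for r in [
    ("C001", "24 May", "10.30", "E001", 1),
    ("C002", "24 May", "11.30", "E001", 1),
    ("C003", "24 May", "10.30", "E002", 2),
    ("C004", "26 May", "11.30", "E003", 2),
]]

key_fd = fd_holds(interviews, ["Candidate_id", "Int_date"],
                  ["Int_time", "Intvr_id", "Room_no"])
room_fd = fd_holds(interviews, ["Intvr_id", "Int_date"], ["Room_no"])

# (Intvr_id, Int_date) determines Room_no but is not unique per row,
# so it is a determinant that is not a candidate key.
determinants = [(r["Intvr_id"], r["Int_date"]) for r in interviews]
determinant_is_unique = len(set(determinants)) == len(determinants)
```

Both dependencies are consistent with the data, yet (E001, 24 May) appears in two rows, so the room for that interviewer and date is stored twice. That repetition is the problem the split into separate tables removes.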
(e) Fourth Normal Form (4NF)
• "No table should contain two or more one to many or many to many relationship that are
not directly related to the key" – multi-valued dependencies
• An employee can work on more than one project and can have more than one hobby.
The employee's projects and hobbies are independent of one another. To keep the relation
consistent, we must have a separate tuple to represent every combination of an
employee's project and an employee's hobbies.
Employee Table
Name      Project     Hobbies
Alexis    Microsoft   Reading
Alexis    Oracle      Music
Alexis    Microsoft   Music
Alexis    Oracle      Reading
Mathews   Intel       Movies
Mathews   Sybase      Riding
Mathews   Intel       Riding
Mathews   Sybase      Movies
To remove this redundancy, the table is split into a Project table and a Hobby table.
Project Table
Name      Project
Alexis    Microsoft
Alexis    Oracle
Mathews   Intel
Mathews   Sybase
Hobby Table
Name      Hobbies
Alexis    Reading
Alexis    Music
Mathews   Movies
Mathews   Riding
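That the split loses no information can be verified by projecting the Employee table and joining the two projections back together. The Python sketch below uses plain sets of tuples and the rows from the tables above; the variable names are illustrative.

```python
employee_rows = {
    ("Alexis", "Microsoft", "Reading"), ("Alexis", "Oracle", "Music"),
    ("Alexis", "Microsoft", "Music"), ("Alexis", "Oracle", "Reading"),
    ("Mathews", "Intel", "Movies"), ("Mathews", "Sybase", "Riding"),
    ("Mathews", "Intel", "Riding"), ("Mathews", "Sybase", "Movies"),
}

# Project each independent multi-valued fact into its own table.
project_table = {(name, project) for name, project, _ in employee_rows}
hobby_table = {(name, hobby) for name, _, hobby in employee_rows}

# A natural join on Name rebuilds every (project, hobby) combination.
rejoined = {(name, project, hobby)
            for name, project in project_table
            for hobby_name, hobby in hobby_table
            if name == hobby_name}
```

Eight rows shrink to four plus four, and the join returns exactly the original eight. Adding a third hobby for Alexis would need one new row in the Hobby table instead of two in the combined table.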
(f) Fifth Normal Form (5NF)
It is rarely used because "it requires semantically related multiple relationship".
Ex – Create table Lab_Product_Company
(Lab_Id       Integer Not null,
 Product_id   Integer Not null,
 Company_Id   Integer Not null);
Split into 2 tables: Lab_Product and Lab_Company
Advantages of a Database
Reduced data redundancy – single centralized database.
Consistency of data – reduced redundancy minimizes the presence of the same data in different files.
Flexibility of data – the database is designed using a bottom-up approach, so the end users can have all the reports they need.
Enhanced data sharing – integrated centralized database; the same file can be used in different applications.
Better enforcement of standards – the database is designed to common standards (field names, widths, types …).
Reduced program maintenance – handled by the database administrator.
Increased programmer productivity – a measure of the time taken to develop an application.
2.5 Data Management and System Integration





Integrated data management lets you grow your business without growing your
infrastructure.
Ex –
o Let’s say you’ve just completed an acquisition, and you need to bring three new
manufacturing facilities on board.
o Application consolidation and retirement is common in this scenario.
o Data archiving capabilities help to minimize cost and accelerate completion of
such scenarios:
• You can minimize the data that you migrate from the legacy system to the
consolidated system, accomplishing the consolidation faster and
minimizing the hardware and software requirements to support additional
load.
• You retain archived information for as long as needed on a lower-cost tier,
immutable if needed, while providing flexible access to it for e-discovery.
Time to market is a critical imperative in today’s environment.
IDM (Integrated Data Management) helps you produce enterprise-ready applications
faster.
For example
o The pureQuery technology is built to give developers the productivity they need,
while helping them adopt best practices for data access.






o It provides a collaborative environment for developers and DBAs to work
together to optimize data access performance, maximize database security, and
improve manageability.
o It makes service enabling vetted database assets as simple as a drag-and-drop
gesture.
o The test data management capabilities let testers leverage production-like data
while safeguarding client privacy and corporate confidentiality.
o Then we help to jump-start development and testing efforts based on accelerators
for packaged applications, industry models, and compliance initiatives.
For production systems, the challenge is to meet increasingly strict and challenging
service level targets yet still free up staff time for value creation activities.
We’re focused on learning more about the environment, enabling the tools to automate
and simplify operations.
This includes aggregating and contextualizing information across the solution stack so that
administrators have the information they need to identify emergent problems, view
relevant information about the problem, isolate the problem to its source, and get expert
advice on resolution.
We want to help business and IT work together towards business objectives such that
there is a common understanding and transparent and consistent execution across the
business.
We are building out a capability to define business policies and semantics early in the
design cycle and then share them via models and other data artifacts across the lifecycle.
We call this idea model-driven governance. Not only will this improve
alignment and governance, but also organizational productivity and effectiveness, by
facilitating seamless collaboration from analysts to architects to developers to
administrators, and from design to delivery to management.
2.6 Reference

Database Management System – Alexis Leon and Mathews Leon