PRINCIPLES OF COMPUTER SCIENCE
MODULE 22
E-R MODELS
The objectives of this module are:
1. To understand the Entity-Relationship model.
2. To learn the different symbols used to represent an E-R model in an E-R diagram.
3. To gain an overview of the various features of the E-R model.
4. To learn about the constraints to be enforced on the database.
5. To be introduced to data mining.
22.1. Introduction:
In the past, large collections of data were processed manually, and the relationships within them often went undiscovered because the data were scattered across large collections. Today's technologies make it easy to collect huge amounts of data and to combine or compare different collections of data.
The Entity-Relationship data model (E-R model) provides a means of identifying the entities to be represented in the database and a way to show the relationships between those entities.
Creating a database application is a complex task that requires designing the database schema, the programs that access and update the data, and the security scheme.
The E-R data model was developed to facilitate database design by allowing the specification of a schema that represents the overall logical structure of the database. It is one of the semantic data models, which represent the meaning (semantics) of the data. It is especially useful for mapping the meanings and interactions of real-world entities onto a conceptual schema.
22.2. The Entity-Relationship (E-R) Model:
The E-R model grew out of the practice of using commercially available DBMSs, which were based on the hierarchical and network models, to model databases. Simply stated, the E-R model is a generalization of these models. The three basic concepts used to describe an E-R model are entity sets, relationship sets and attributes.
22.2.1. Entity Sets:
An entity is a real-world thing or object that is distinguishable from all other objects. For example, each student in a school is an entity. An entity has a set of properties, and the values of some of these properties may be used to identify the entity uniquely. For example, a student may have an admn_no property whose value uniquely identifies that student. An entity can be tangible, such as a person or a book, or abstract, such as a loan, a holiday or a concept.
A set of entities of the same type that share the same properties is called an entity set. For example, the entity set student represents all students of a particular school. The individual entities that constitute a set are said to be the extension of the entity set.
An entity is described by a set of attributes. Attributes are descriptive properties possessed by every member of an entity set: declaring an attribute expresses that the database stores similar information about each entity in the entity set, although each entity may have its own value for each attribute. The attributes of a student entity may include student_id, student_name, house_no, street and city.
Each entity has a value for each of its attributes. For example, a student entity may have the value 53276 for student_id, Karan for student_name, 258 for house_no, Second for street and Mumbai for city. The student_id attribute is used to identify students uniquely, because there may be more than one student with the same name, street and city.
A database therefore includes a collection of entity sets, each containing any number of entities of the same type.
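As a rough illustration (not part of the original module), the student entity set can be sketched in Python. Each entity is an immutable record whose fields are the attributes, and the entity set is a dictionary indexed by student_id, so that no two entities share that identifying value. The function name add_student and the second student's values are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Student:
    """One student entity; each field is an attribute of the entity set."""
    student_id: int
    student_name: str
    house_no: int
    street: str
    city: str

# The entity set student, indexed by the identifying attribute student_id.
student_set: dict[int, Student] = {}

def add_student(s: Student) -> None:
    """Insert an entity, rejecting a second entity with the same student_id."""
    if s.student_id in student_set:
        raise ValueError(f"student_id {s.student_id} is already in use")
    student_set[s.student_id] = s

add_student(Student(53276, "Karan", 258, "Second", "Mumbai"))
# Same name, street and city, but a different student_id: a distinct entity.
add_student(Student(53281, "Karan", 31, "Second", "Mumbai"))
print(len(student_set))  # 2
```

Indexing by student_id mirrors the text's point that name, street and city alone may not tell two students apart.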
22.2.2. Relationship Sets:
A relationship is an association among several entities. A relationship set is a set of relationships of the same type. If $E_1, E_2, \ldots, E_n$ are entity sets, then a relationship set R is a subset of
$\{(e_1, e_2, \ldots, e_n) \mid e_1 \in E_1, e_2 \in E_2, \ldots, e_n \in E_n\}$
where $(e_1, e_2, \ldots, e_n)$ is a relationship.
The association between entity sets is referred to as participation; that is, in the above specification, the entity sets $E_1, E_2, \ldots, E_n$ participate in the relationship set R. A relationship instance in an E-R schema represents an association between the named entities in the real world.
The function that an entity plays in a relationship is called that entity's role. Roles are useful when the meaning of a relationship needs further clarification. If the same entity set participates in a relationship set more than once, in different roles, the relationship set is said to be recursive. In that case, explicit role names are necessary to specify how an entity participates in a relationship instance.
A relationship can also have attributes, called descriptive attributes. However, a relationship instance in a relationship set must be uniquely identifiable from its participating entities, without using the descriptive attributes. The number of entity sets that participate in a relationship set is called the degree of the relationship set. For example, a binary relationship set has degree 2, a ternary relationship set has degree 3, and so on.
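The set-theoretic definition of a relationship set can be checked directly in code. The following minimal sketch assumes hypothetical customer and loan key values, models borrower as a set of (customer_id, loan_no) tuples, verifies that it is a subset of the Cartesian product of the participating entity sets, and stores the descriptive attribute access_date separately, keyed by the relationship tuple. The recursive works_for relationship set, with roles manager and worker, is likewise a hypothetical example.

```python
from datetime import date
from itertools import product
from typing import NamedTuple

# Hypothetical entity sets, represented here only by their key values.
customers = {"C1", "C2", "C3"}
loans = {"L17", "L23"}

# Relationship set borrower: each relationship is a (customer_id, loan_no) tuple.
borrower = {("C1", "L17"), ("C2", "L17"), ("C3", "L23")}

# R must be a subset of E1 x E2 (every tuple pairs existing entities).
assert borrower <= set(product(customers, loans))

# Degree: the number of participating entity sets, i.e. the tuple length.
degree = len(next(iter(borrower)))

# Descriptive attribute: looked up by the relationship tuple, so each
# relationship instance is identified by its participants alone.
access_date = {
    ("C1", "L17"): date(2024, 1, 5),
    ("C2", "L17"): date(2024, 2, 9),
    ("C3", "L23"): date(2024, 3, 1),
}

# Recursive relationship set: employee participates twice, so each
# position carries an explicit role name.
class WorksFor(NamedTuple):
    manager: str  # role played by the first employee
    worker: str   # role played by the second employee

works_for = {WorksFor(manager="E1", worker="E2"), WorksFor(manager="E1", worker="E3")}

print(degree)  # 2
print(sorted(w.worker for w in works_for if w.manager == "E1"))  # ['E2', 'E3']
```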
22.2.3. Attributes:
An attribute of an entity set is a function that maps from the entity set into a domain. The set of permitted values for an attribute is called the domain, or value set, of that attribute. An entity can be described by a set of (attribute, data value) pairs, and each entity set may have several attributes. The attribute values describing the entities make up a large portion of the data stored in the database. If an entity has no value for a particular attribute, the attribute takes the value null, indicating that the value is not applicable or does not exist. Various types of attributes are used in the E-R model: simple, composite, single-valued, multivalued and derived.
Simple:
An attribute that cannot be divided into subparts is said to be simple.
Composite:
An attribute that can be divided into subparts, that is, into other attributes, is called a composite attribute. Composite attributes are useful when the user wishes to refer to the entire attribute on some occasions and to only a component of it on others. They also group related attributes together, making the model clearer. For example, the address of a student entity can be a composite attribute consisting of street, city, state and postal code.
Single-valued:
An attribute that has only one value for a particular entity is called a single-valued attribute. For example, the admn_no attribute of a student entity holds exactly one admission number.
Multivalued:
Sometimes an attribute has a set of values for a specific entity. Such an attribute is said to be multivalued. The phone number of a student or employee is an example of a multivalued attribute, because a particular student or employee may have zero, one or more phone numbers.
Derived:
If the value of an attribute can be derived from the values of other related attributes or entities, the attribute is said to be derived. For example, the attribute age can be calculated from the date of birth and the current date, so age is a derived attribute.
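The attribute types can be mirrored in a small Python sketch, offered as an analogy rather than a prescribed mapping. Address is a composite attribute built from the components named in the text, phone_numbers is a multivalued attribute that may hold zero or more values, and age is derived on demand from date_of_birth rather than stored. The class layout, the method name age, its today parameter and the sample values are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Address:
    """Composite attribute: the whole address, or any component, can be used."""
    street: str
    city: str
    state: str
    postal_code: str

@dataclass
class Student:
    admn_no: int            # simple, single-valued
    address: Address        # composite
    date_of_birth: date     # simple, single-valued; basis of the derived age
    phone_numbers: list[str] = field(default_factory=list)  # multivalued: zero or more

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth and the current date."""
        birthday_passed = (today.month, today.day) >= (
            self.date_of_birth.month, self.date_of_birth.day)
        return today.year - self.date_of_birth.year - (0 if birthday_passed else 1)

s = Student(1001, Address("Second", "Mumbai", "Maharashtra", "400001"), date(2005, 8, 20))
print(s.address.city)             # Mumbai (one component of the composite)
print(len(s.phone_numbers))       # 0 (a multivalued attribute may be empty)
print(s.age(date(2024, 8, 19)))   # 18
print(s.age(date(2024, 8, 20)))   # 19
```

Computing age instead of storing it avoids the stored value going stale as the current date changes.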
22.3. Constraints:
Constraints within a database are rules that control the values allowed in columns and enforce integrity between columns and tables. An E-R schema may define constraints to which the contents of a database must conform. Some commonly used constraints are keys, mapping cardinalities and participation constraints.
22.3.1. Keys:
Keys provide a way to distinguish entities, and relationships, from each other. The attribute values of an entity must therefore be such that they identify the entity uniquely.
A super key is a set of one or more attributes that, taken collectively, allow an entity in the entity set to be identified uniquely. If K is a super key, then any superset of K is also a super key. Of particular interest are super keys for which no proper subset is a super key; such minimal super keys are called candidate keys.
Suppose that the combination of student_name and student_street is sufficient to distinguish among members of the student entity set. Then both { student_id } and { student_name, student_street } are candidate keys. Although the attributes student_id and student_name together can distinguish student entities, their combination does not form a candidate key, since the attribute student_id alone is a candidate key.
The primary key is the candidate key that is chosen as the principal means of identifying entities within an entity set. Primary, candidate and super keys are properties of the entity set rather than of individual entities: no two entities in the set are allowed to have the same values on the key attributes at the same time. Hence a key represents a constraint in the real world being modeled.
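Whether a set of attributes is a key is ultimately a real-world constraint, but the super key property can be tested against a particular collection of entities. The following sketch (function names and sample values are hypothetical) checks that property on a small sample and enumerates minimal super keys in order of increasing size. On this sample it reproduces the two candidate keys of the student_name and student_street example; a key found this way is only consistent with the sample, not guaranteed for future data.

```python
from itertools import combinations

def is_superkey(entities, attrs):
    """True if no two entities in the sample agree on every attribute in attrs."""
    seen = set()
    for e in entities:
        value = tuple(e[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(entities, all_attrs):
    """Minimal super keys, found by trying attribute sets in increasing size."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            # Skip supersets of keys already found: they cannot be minimal.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_superkey(entities, attrs):
                keys.append(attrs)
    return keys

students = [
    {"student_id": 53276, "student_name": "Karan", "student_street": "Second"},
    {"student_id": 53281, "student_name": "Karan", "student_street": "First"},
    {"student_id": 53290, "student_name": "Asha", "student_street": "Second"},
]
print(candidate_keys(students, ("student_id", "student_name", "student_street")))
# [('student_id',), ('student_name', 'student_street')]
```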
Like entities, the various relationships of a relationship set must also be
identified or distinguished.
If the relationship set R has no attributes associated with it, then the set of attributes
$\text{primary\_key}(E_1) \cup \text{primary\_key}(E_2) \cup \cdots \cup \text{primary\_key}(E_n)$
describes an individual relationship in set R.
If the relationship set R has attributes $a_1, a_2, \ldots, a_m$ associated with it, then the set of attributes
$\text{primary\_key}(E_1) \cup \text{primary\_key}(E_2) \cup \cdots \cup \text{primary\_key}(E_n) \cup \{a_1, a_2, \ldots, a_m\}$
describes an individual relationship in set R.
In both of these cases, the set of attributes
$\text{primary\_key}(E_1) \cup \text{primary\_key}(E_2) \cup \cdots \cup \text{primary\_key}(E_n)$
forms a super key for the relationship set.
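The union formulas for a relationship set translate directly into set operations. In this minimal sketch, the helper name relationship_key_attributes is hypothetical, and the example uses the customer, loan and borrower attributes (with access_date as the descriptive attribute) from the customer-loan diagram in this module.

```python
def relationship_key_attributes(primary_keys, descriptive=()):
    """Return (super key, describing attributes) for a relationship set R.

    primary_keys: one set of attribute names per participating entity set Ei.
    descriptive:  the descriptive attributes a1, ..., am of R (may be empty).
    """
    super_key = set().union(*primary_keys)     # primary_key(E1) U ... U primary_key(En)
    describing = super_key | set(descriptive)  # ... U {a1, ..., am}
    return super_key, describing

sk, attrs = relationship_key_attributes([{"customer_id"}, {"loan_no"}], ["access_date"])
print(sorted(sk))     # ['customer_id', 'loan_no']
print(sorted(attrs))  # ['access_date', 'customer_id', 'loan_no']
```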
The structure of the primary key for the relationship set depends on the
mapping cardinalities of the relationship set.
22.3.2. Mapping Cardinalities:
Mapping cardinalities, or cardinality ratios, express the number of entities to which another entity can be associated through a relationship set. They are most useful in describing binary relationship sets, although they can also be used for relationship sets that involve more than two entity sets.
For a binary relationship set R between entity sets A and B, the mapping cardinality must be one of the following: one-to-one, one-to-many, many-to-one or many-to-many.
One-to-One: An entity in A is associated with at most one entity in B, and an entity in B is associated with at most one entity in A.
Figure: One-to-One (entities a1, a2, a3, a4 of A; entities b1, b2, b3, b4 of B)
One-to-Many: An entity in A is associated with any number of entities in B, but an entity in B can be associated with at most one entity in A.
Figure: One-to-Many
Many-to-One: An entity in A is associated with at most one entity in B, but an entity in B can be associated with any number of entities in A.
Figure: Many-to-One
Many-to-Many: An entity in A is associated with any number of entities in B, and an entity in B is associated with any number of entities in A.
Figure: Many-to-Many
The appropriate mapping cardinality for a relationship set is determined during design by the real-world situation that the relationship set models.
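The four definitions can be expressed as checks on how many partners each entity has. The sketch below (function name hypothetical) classifies one instance of a binary relationship set given as (a, b) pairs. Because cardinality is a design constraint, this only reports what a particular instance is consistent with, not what the relationship set must be.

```python
from collections import Counter

def cardinality(pairs):
    """Classify a binary relationship instance, given as (a, b) pairs."""
    a_fanout = Counter(a for a, _ in pairs)   # number of b's linked to each a
    b_fanout = Counter(b for _, b in pairs)   # number of a's linked to each b
    a_at_most_one = all(n <= 1 for n in a_fanout.values())
    b_at_most_one = all(n <= 1 for n in b_fanout.values())
    if a_at_most_one and b_at_most_one:
        return "one-to-one"
    if b_at_most_one:          # an a may have many b's; each b has at most one a
        return "one-to-many"
    if a_at_most_one:          # each a has at most one b; a b may have many a's
        return "many-to-one"
    return "many-to-many"

print(cardinality({("a1", "b1"), ("a2", "b2")}))                # one-to-one
print(cardinality({("a1", "b1"), ("a1", "b2"), ("a2", "b3")}))  # one-to-many
print(cardinality({("a1", "b1"), ("a2", "b1")}))                # many-to-one
print(cardinality({("a1", "b1"), ("a1", "b2"), ("a2", "b1")}))  # many-to-many
```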
22.4. Entity Relationship Diagram:
The logical structure of an E-R model is represented graphically by an Entity-Relationship diagram, or E-R diagram. The important components of an E-R diagram are:
- Rectangles, which represent entity sets.
- Ellipses, which represent attributes.
- Diamonds, which represent relationship sets.
- Lines, which link attributes to entity sets and entity sets to relationship sets.
- Double ellipses, which represent multivalued attributes.
- Dashed ellipses, which represent derived attributes.
- Double lines, which indicate total participation of an entity set in a relationship set.
- Double rectangles, which represent weak entity sets.
These symbols, known as Chen's notation, are the ones generally used in E-R diagrams. They are summarized in the following table.
Table: Symbols used in E-R Diagram
Symbol | Meaning
Rectangle | Entity set
Double rectangle | Weak entity set
Ellipse | Attribute
Double ellipse | Multivalued attribute
Dashed ellipse | Derived attribute
Diamond | Relationship set
Double diamond | Identifying relationship set for a weak entity set
Double line | Total participation of an entity set in a relationship set
Underlined attribute name | Primary key
Attribute name underlined with a dashed line | Discriminating attribute of a weak entity set
Diamond joined to both entity sets by plain lines | Many-to-many relationship
Diamond with an arrow pointing to the "one" side | Many-to-one relationship
Diamond with arrows pointing to both entity sets | One-to-one relationship
Since there is no universal standard for E-R diagram notation, some texts use an alternative notation. In it, an entity set may be represented as a box with its name outside and its attributes listed one below the other inside the box.
Figure: Alternative notation for an entity set E with attributes A1, A2, A3
This means that E is an entity set with attributes A1, A2 and A3, where A1 is the primary key.
The mapping cardinalities can be represented in an alternative notation called crow's-foot notation, as given in the following figure.
Figure: Crow's-foot notation for Many-to-Many, Many-to-One and One-to-One relationships
In this notation, relationship sets are shown as lines between entity sets, without diamonds, so only binary relationships can be modeled.
For example, consider the following E-R diagram.
Figure: E-R diagram for customer loan relation (entity set Customer with attributes customer_id, customer_name, customer_street, customer_city; entity set Loan with attributes loan_no, amount; relationship set borrower)
There are two entity sets, customer and loan, connected through a binary relationship set borrower. Customer has the attributes customer_id, customer_name, customer_street and customer_city. Loan has the attributes loan_no and amount.
If a relationship set has attributes associated with it, these attributes are linked to that relationship set, as shown here.
Figure: Modified E-R diagram for customer loan relation (the descriptive attribute access_date is linked to the relationship set borrower)
The customer_name attribute can be a composite attribute consisting of first_name, middle_name and last_name. Street can also be a composite attribute, consisting of street_no and street_name. The following is a more precise representation of the customer loan E-R diagram, showing composite, multivalued and derived attributes.
Figure: E-R diagram showing composite, multivalued and derived attributes (Customer: customer_id; customer_name composed of first_name, middle_name, last_name; address composed of street (street_no, street_name), city and pin_code; multivalued phone_no; date_of_birth; derived age. Loan: loan_no, balance. borrower: access_date)
22.5. Weak Entity Sets:
Some entity sets may not have sufficient attributes to form a primary key. Such entity sets are termed weak entity sets. An entity set that has a primary key is called a strong entity set.
Consider the entity set payment, which has three attributes: payment_no, date and amount. Payment numbers are generated sequentially, starting from 1, for each loan. Thus, although each payment entity is distinct, payments for different loans may share the same payment number. This entity set therefore does not have a primary key and is a weak entity set.
To be meaningful, a weak entity set must be associated with another entity set
called the identifying or owner entity set. The weak entity set is said to be existence
dependent on the identifying entity set. The relationship between the weak entity set
and the identifying entity set is called the identifying relationship. The identifying
relationship is many-to-one from the weak entity set to the identifying entity set and
the participation of the weak entity set in the relationship is total.
The discriminator of a weak entity set is a set of attributes that allows the weak entities that depend on one particular strong entity to be distinguished from each other. The discriminator for payment is payment_no, because for each loan, a payment number uniquely identifies one single payment for that loan. The discriminator is also called the partial key of the weak entity set.
The primary key of a weak entity set is formed by combining the primary key of the identifying entity set with the discriminator of the weak entity set. So the primary key of payment is { loan_no, payment_no }, where loan_no is the primary key of the identifying entity set loan and payment_no is the discriminator for payment.
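A minimal Python sketch of the payment weak entity set follows, assuming hypothetical loan numbers and a hypothetical PaymentLedger class. It shows why payment_no alone cannot be a primary key (it restarts at 1 for every loan), how combining it with the owner's loan_no yields a unique key, and how existence dependence can be enforced by refusing payments for unknown loans.

```python
from collections import defaultdict
from datetime import date

class PaymentLedger:
    """Holds the weak entity set payment, owned by the strong entity set loan."""

    def __init__(self, loan_nos):
        self.loan_nos = set(loan_nos)          # primary keys of the owner entity set
        self.next_no = defaultdict(lambda: 1)  # discriminator counter, one per loan
        self.payments = {}                     # (loan_no, payment_no) -> other attributes

    def add_payment(self, loan_no, pay_date, amount):
        # Existence dependence: a payment cannot exist without its loan.
        if loan_no not in self.loan_nos:
            raise ValueError(f"unknown loan {loan_no}")
        payment_no = self.next_no[loan_no]     # partial key, restarts at 1 per loan
        self.next_no[loan_no] += 1
        key = (loan_no, payment_no)            # owner's key + discriminator
        self.payments[key] = {"date": pay_date, "amount": amount}
        return key

ledger = PaymentLedger(["L17", "L23"])
print(ledger.add_payment("L17", date(2024, 1, 10), 500))  # ('L17', 1)
print(ledger.add_payment("L17", date(2024, 2, 10), 500))  # ('L17', 2)
print(ledger.add_payment("L23", date(2024, 1, 15), 750))  # ('L23', 1)
```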
The following E-R diagram represents the relationship between loan and payment.
Figure: E-R Diagram with weak entity set (entity set loan with loan_no and amount; weak entity set payment with payment_no, date and amount; identifying relationship set loan-payment)
The doubly outlined box represents a weak entity set, and the doubly outlined diamond indicates the identifying relationship. Double lines show the total participation of the weak entity set payment in the relationship set loan-payment, which means that every payment must be related through loan-payment to some loan. The arrow from loan-payment to loan indicates that each payment is for a single loan. The discriminator of the weak entity set is underlined with a dashed line.
22.6. E-R Features:
The basic E-R concepts can model most database features, but some situations require extensions to the basic E-R model. Three such extensions, specialization, generalization and aggregation, are discussed here.
The process of designating subgroupings within an entity set is called specialization. Generalization is a containment relationship that exists between a higher-level entity set and one or more lower-level entity sets. Higher-level and lower-level entity sets may also be called superclass and subclass, respectively. For example, the person entity set is the superclass of the customer and employee subclasses. In practice, generalization is simply the inverse of specialization.
Figure: Generalization and Specialization
In most database designs, specialization and generalization are applied in combination. The two approaches differ in their starting point and overall goal. Specialization starts from a single entity set and emphasizes differences among its entities by creating distinct lower-level entity sets. Generalization proceeds from the recognition that a number of entity sets share some common features, such as being described by the same attributes and participating in the same relationship sets, and combines them into a higher-level entity set.
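Class inheritance in Python offers a loose analogy for the person, customer and employee example. The superclass holds the shared attributes, and each subclass adds its own. The attributes person_id, name, city, credit_rating and salary are hypothetical; the module does not list attributes for these entity sets.

```python
from dataclasses import dataclass

@dataclass
class Person:
    """Higher-level entity set (superclass): attributes shared by all persons."""
    person_id: int
    name: str
    city: str

@dataclass
class Customer(Person):
    """Lower-level entity set: inherits person_id, name, city."""
    credit_rating: int

@dataclass
class Employee(Person):
    """Lower-level entity set with its own specific attribute."""
    salary: float

people = [Customer(1, "Karan", "Mumbai", 7), Employee(2, "Asha", "Pune", 52000.0)]

# Code written for the superclass works on every member of the subclasses.
print([p.name for p in people])                    # ['Karan', 'Asha']
print(all(isinstance(p, Person) for p in people))  # True
```

One limit of the analogy: a Python object belongs to exactly one class, so this models a specialization in which each person is either a customer or an employee, not both. An E-R design that allows overlapping subclasses would need a different representation.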
Another important feature is aggregation, an abstraction in which relationship sets, together with their associated entity sets, are treated as higher-level entity sets that can participate in relationships. An example of aggregation is shown in the following figure.
Figure: Aggregation
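Since the aggregation figure is a diagram, a small code sketch can make the idea concrete. The scenario is hypothetical: employees work on projects through a relationship set works_on, and a manager monitors each individual (employee, project) assignment. Aggregation is modeled by letting whole works_on tuples appear as participants in a second relationship set, manages. All names and values here are assumptions.

```python
# Relationship set works_on between employee and project (hypothetical data).
works_on = {("E1", "P1"), ("E2", "P1"), ("E1", "P2")}

# Aggregation: each works_on relationship is treated as a higher-level
# entity, so it can participate in the relationship set manages,
# together with a manager entity.
manages = {
    (("E1", "P1"), "M1"),
    (("E2", "P1"), "M1"),
    (("E1", "P2"), "M2"),
}

# Every manages tuple must refer to an existing works_on relationship.
assert all(assignment in works_on for assignment, _ in manages)

def managers_of(employee, project):
    """Managers that monitor one particular works_on relationship."""
    return {m for assignment, m in manages if assignment == (employee, project)}

print(managers_of("E1", "P1"))  # {'M1'}
print(managers_of("E1", "P2"))  # {'M2'}
```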
22.7. Data Mining:
Data mining is a rapidly expanding subject that is closely associated with database technology. It comprises techniques for discovering patterns in collections of data and is used mainly in marketing, inventory management, quality control, fraud detection and investment analysis.
Data mining differs from traditional database querying: data mining seeks to identify previously unknown patterns, whereas database queries ask for the retrieval of stored data.
Data mining is usually carried out on static data collections called data warehouses, rather than on operational databases that are frequently updated. These warehouses are generally snapshots of databases.
Two common forms of data mining are class description and class discrimination. Class description deals with identifying properties that characterize a given group of data items, whereas class discrimination deals with identifying properties that distinguish between two groups.
Another form of data mining is cluster analysis, which tries to discover classes. It seeks properties of data items that lead to the discovery of groupings.
A different form is association analysis, which tries to find links between data groups.
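One common starting point for association analysis, used here as an illustration rather than as the module's own method, is counting how often items occur together. The sketch below computes the support of each item pair (the fraction of transactions containing both items) over hypothetical shopping baskets and keeps the pairs at or above a chosen threshold. The function name frequent_pairs and the data are assumptions.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Item pairs whose support (fraction of transactions containing both
    items) is at least min_support."""
    counts = Counter()
    for items in transactions:
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
    {"bread", "butter", "tea"},
]
print(frequent_pairs(baskets, 0.5))  # {('bread', 'butter'): 0.75}
```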
Yet another form is outlier analysis, in which data entries that do not comply with the norm are identified. Outlier analysis can be used to identify errors in data collections, or to recognize credit card theft by detecting sudden deviations from a customer's normal purchase patterns.
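For the credit card example, a simple and widely used outlier test compares a new purchase with the customer's history in units of standard deviation (a z-score). The following is only a sketch; the threshold of 3 standard deviations, the function name is_outlier and the purchase amounts are assumptions, and practical fraud detection uses far richer models.

```python
from statistics import mean, stdev

def is_outlier(history, amount, threshold=3.0):
    """Flag amount if it lies more than `threshold` standard deviations
    from the mean of the customer's past purchase amounts."""
    mu = mean(history)
    sigma = stdev(history)          # sample standard deviation; needs >= 2 values
    if sigma == 0:
        return amount != mu         # all past purchases identical
    return abs(amount - mu) / sigma > threshold

past = [42.0, 38.5, 55.0, 47.0, 40.0, 51.5]
print(is_outlier(past, 50.0))   # False
print(is_outlier(past, 900.0))  # True
```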
Sequential pattern analysis is another form of data mining; it tries to identify patterns of behaviour over time. Such patterns are used to reveal trends in economic systems or in environmental systems such as climate conditions.
Hence the results from data mining can be used to predict future behaviour, and the potential scope of data mining applications is enormous.