PRINCIPLES OF COMPUTER SCIENCE
MODULE 22 - E-R MODELS

The objectives of this module are:
1. To understand the Entity-Relationship model.
2. To learn the different symbols for representing an E-R model and E-R diagram.
3. To have an idea about the various features of the E-R model.
4. To have a knowledge of the constraints to be enforced on the database.
5. To have an introduction to data mining.

22.1. Introduction:
In the past, large collections of data were processed manually, and relationships within the data went undiscovered because the data were scattered throughout large collections. Today's technologies make it easy to collect huge amounts of data and to combine or compare different collections of data. The Entity-Relationship data model (E-R model) provides a means of identifying the entities to be represented in the database and a way to show the relationships between those entities.

Creating a database application is a complex task that requires designing the database schema, the programs that access and update the data, and the security scheme. The E-R data model was developed to facilitate database design by allowing specification of a schema that represents the overall logical structure of the database. It is one of the semantic data models, in which the meaning of the data is represented. It is especially useful for mapping the meanings and interactions of real-world entities onto a conceptual schema.

22.2. The Entity-Relationship (E-R) Model:
The E-R model grew out of the practice of using commercially available DBMSs to model DBMSs based on the hierarchical and network models. Simply stated, the E-R model is a generalization of these models. The three features used to describe E-R models are entity sets, relationship sets and attributes.

22.2.1. Entity Sets:
An entity is a real-world thing or object that is distinguishable from every other object. For example, each student in a school is an entity.
An entity is associated with a set of properties, and the values of some subset of those properties may uniquely identify the entity. A student may have an admn_no property whose value uniquely identifies that student. An entity can be tangible, such as a person or a book, or abstract, such as a loan, a holiday or a concept.

A set of entities of the same type that share the same properties is called an entity set. The entity set student represents all students of a particular school. The individual entities that constitute a set are said to be the extension of the entity set.

An entity is described using a set of attributes. Attributes are descriptive properties; they express that the database stores similar information about each entity in the entity set, although the value of each attribute may differ from entity to entity. The attributes of a student entity may include student_id, student_name, house_no, street, city etc. Each entity has a value for each of its attributes. For example, a student entity may have the value 53276 for student_id, the value Karan for student_name, 258 for house_no, Second for street and Mumbai for city. The student_id attribute is used to identify students uniquely, because there may be more than one student with the same name, street and city. A database hence includes a collection of entity sets, each consisting of a number of entities of the same type.

22.2.2. Relationship Sets:
A relationship represents an association among several entities. A relationship set is a set of relationships of the same type. If $E_1, E_2, \ldots, E_n$ are entity sets, then a relationship set $R$ is a subset of

$$\{(e_1, e_2, \ldots, e_n) \mid e_1 \in E_1,\ e_2 \in E_2,\ \ldots,\ e_n \in E_n\}$$

where $(e_1, e_2, \ldots, e_n)$ is a relationship. The association between entity sets is referred to as participation; i.e., in the above specification, the entity sets $E_1, E_2, \ldots, E_n$ participate in the relationship set $R$.
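The definition of a relationship set as a subset of a Cartesian product can be sketched in a few lines of Python. This is only an illustration: the entity sets `students` and `courses`, the relationship set `enrolled` and the helper `is_relationship_set` are hypothetical names invented for this example, and each entity is represented simply by an identifying value.

```python
from itertools import product

# Hypothetical entity sets; each entity is represented by an identifying value.
students = {"s1", "s2", "s3"}
courses = {"c1", "c2"}

# Every possible (student, course) tuple: the Cartesian product of the entity sets.
all_pairs = set(product(students, courses))

# A relationship set is any subset of that product; each tuple is one relationship.
enrolled = {("s1", "c1"), ("s2", "c1"), ("s2", "c2")}


def is_relationship_set(r, *entity_sets):
    """Return True if each tuple in r takes its i-th component from the i-th entity set."""
    return all(
        len(t) == len(entity_sets)
        and all(e in es for e, es in zip(t, entity_sets))
        for t in r
    )


print(is_relationship_set(enrolled, students, courses))
print(enrolled <= all_pairs)
```

Both checks express the same condition: every relationship in `enrolled` combines one entity from each participating entity set.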
An association between named entities in the real world is represented in an E-R schema by a relationship instance. The function that an entity plays in a relationship is called that entity's role. Role names are useful when the meaning of a relationship needs further clarification. If the same entity set participates in a relationship set more than once, in different roles, the relationship set is said to be recursive; explicit role names are then necessary to specify how an entity participates in a relationship instance.

A relationship can also have attributes, called descriptive attributes. However, a relationship instance in a relationship set must be uniquely identifiable from its participating entities, without using the descriptive attributes.

The number of entity sets that participate in a relationship set is called the degree of the relationship set. For example, a binary relationship set has degree 2 and a ternary relationship set has degree 3.

22.2.3. Attributes:
An attribute of an entity set is a function that maps from the entity set into a domain. The set of permitted values for an attribute is called the domain or value set of that attribute. An entity can be described by a set of (attribute, data value) pairs, and each entity set may have several attributes. The attribute values describing entities constitute a large portion of the data stored in the database. If an entity has no value for a particular attribute, the attribute takes the value null, indicating that the value is not applicable or does not exist.

Various types of attributes are used in the E-R model: simple, composite, single-valued, multi-valued and derived.

Simple: If an attribute cannot be divided into subparts, it is said to be simple.

Composite: If an attribute can be divided into subparts, that is, into other attributes, it is called a composite attribute.
Composite attributes are useful when the user wishes to refer to the entire attribute on some occasions and to only a component of it on others. They help to group related attributes together, making the modeling clearer. For example, the address of a student entity can be a composite consisting of street, city, state and postal code.

Single-valued: An attribute that has only one value for a particular entity is called a single-valued attribute. For example, the admn_no of a student entity refers to only one admission number.

Multi-valued: Sometimes an attribute has a set of values for a specific entity. Such an attribute is said to be multi-valued. The phone number of a student or employee is an example of a multi-valued attribute, because a particular student or employee may have zero, one or more phone numbers.

Derived: If the value of an attribute can be derived from the values of other related attributes or entities, the attribute is said to be derived. For example, the attribute age can be calculated from the date of birth and the current date; hence age is a derived attribute.

22.3. Constraints:
Constraints within a database are rules that control the values allowed in columns and enforce integrity between columns and tables. Each E-R schema may define constraints to which the contents of the database must conform. Commonly used constraints are keys, mapping cardinalities and participation constraints.

22.3.1. Keys:
Keys provide a way to distinguish entities from each other and relationships from each other. The attribute values of an entity must be such that they uniquely identify the entity. A super key is a set of one or more attributes that, taken collectively, allows an entity in the entity set to be identified uniquely. If K is a super key, then any superset of K is also a super key. Of particular interest are super keys for which no proper subset is a super key.
Such minimal super keys are called candidate keys. Assume that a combination of student_name and student_street is sufficient to identify the members of the student entity set. Then {student_id} and {student_name, student_street} are both candidate keys. Although the attributes student_id and student_name together can distinguish student entities, their combination does not form a candidate key, since the attribute student_id alone is a candidate key.

The primary key is the candidate key that is selected as the principal means of identifying entities within an entity set. A key (primary, candidate or super) is a property of the entity set rather than of an individual entity: no two entities in the set are allowed to have the same values on all the key attributes at the same time. Hence a key represents a constraint in the real world.

Like entities, the relationships of a relationship set must also be distinguishable. If the relationship set $R$ has no attributes associated with it, then the set of attributes

$$\mathrm{primary\_key}(E_1) \cup \mathrm{primary\_key}(E_2) \cup \cdots \cup \mathrm{primary\_key}(E_n)$$

describes an individual relationship in $R$. If the relationship set $R$ has attributes $a_1, a_2, \ldots, a_m$ associated with it, then the set of attributes

$$\mathrm{primary\_key}(E_1) \cup \mathrm{primary\_key}(E_2) \cup \cdots \cup \mathrm{primary\_key}(E_n) \cup \{a_1, a_2, \ldots, a_m\}$$

describes an individual relationship in $R$. In both cases, the set of attributes

$$\mathrm{primary\_key}(E_1) \cup \mathrm{primary\_key}(E_2) \cup \cdots \cup \mathrm{primary\_key}(E_n)$$

forms a super key for the relationship set. The structure of the primary key of the relationship set depends on the mapping cardinality of the relationship set.

22.3.2. Mapping Cardinalities:
The mapping cardinality, or cardinality ratio, expresses the number of entities to which another entity can be associated through a relationship set. Mapping cardinalities are most useful for describing binary relationship sets, although they can also describe relationship sets that involve more than two entity sets.
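As a rough illustration, the cardinality ratio that a given collection of relationship instances is consistent with can be read off by counting partners on each side. The sketch below is an assumption-laden example, not part of the E-R model itself: the function name `mapping_cardinality` and the sample identifiers are hypothetical, and the function reports only the tightest ratio satisfied by the instances supplied, whereas the actual constraint is a design decision.

```python
from collections import defaultdict


def mapping_cardinality(pairs):
    """Classify a binary relationship set, given as (a, b) pairs, by cardinality ratio."""
    b_per_a = defaultdict(set)  # entities of B associated with each entity of A
    a_per_b = defaultdict(set)  # entities of A associated with each entity of B
    for a, b in pairs:
        b_per_a[a].add(b)
        a_per_b[b].add(a)

    # True when every entity in A has at most one partner in B.
    a_has_at_most_one = all(len(bs) <= 1 for bs in b_per_a.values())
    # True when every entity in B has at most one partner in A.
    b_has_at_most_one = all(len(as_) <= 1 for as_ in a_per_b.values())

    if a_has_at_most_one and b_has_at_most_one:
        return "one-to-one"
    if b_has_at_most_one:
        return "one-to-many"   # an A entity may have many B partners
    if a_has_at_most_one:
        return "many-to-one"   # a B entity may have many A partners
    return "many-to-many"


print(mapping_cardinality([("a1", "b1"), ("a1", "b2")]))
```

In the sample call, a1 is associated with two entities of B while each entity of B has a single partner in A, so the instances fit the one-to-many case.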
A binary relationship set R between entity sets A and B can be one-to-one, one-to-many, many-to-one or many-to-many.

One-to-One: An entity in A is associated with at most one entity in B, and an entity in B is associated with at most one entity in A.

Figure: One-to-One

One-to-Many: An entity in A is associated with any number of entities in B, but an entity in B can be associated with at most one entity in A.

Figure: One-to-Many

Many-to-One: An entity in A is associated with at most one entity in B, but an entity in B can be associated with any number of entities in A.

Figure: Many-to-One

Many-to-Many: An entity in A is associated with any number of entities in B, and an entity in B is associated with any number of entities in A.

Figure: Many-to-Many

The mapping cardinality appropriate for a relationship set is determined during design by the real-world situation that the relationship set models.

22.4. Entity Relationship Diagram:
The logical structure of an E-R model is represented graphically using the Entity-Relationship diagram, or E-R diagram. The important components of an E-R diagram are:
Rectangles to represent entity sets.
Ellipses to represent attributes.
Diamonds to represent relationship sets.
Links to connect attributes to entity sets and entity sets to relationship sets.
Double ellipses to represent multi-valued attributes.
Dashed ellipses to represent derived attributes.
Double lines to represent total participation of an entity in a relationship set.
Double rectangles to represent weak entity sets.

These are summarized in the following table, which gives the commonly used symbols of the E-R notation known as Chen's notation.
Table: Symbols used in E-R Diagram
Meanings listed in the table: entities; weak entities; attributes; multi-valued attribute; derived attribute; relationship set; identifying relationship set for a weak entity set; total participation of entity set in relationship; primary key; discriminating attribute of weak entity sets; many-to-many relationship; many-to-one relationship; one-to-one relationship.

Since there is no universal standard for E-R diagram notation, some may use an alternative notation. Here an entity set may be represented as a box with its name outside and its attributes listed one below the other within the box. For example, a box named E listing A1, A2 and A3 means that E is an entity set with attributes A1, A2, A3, where A1 is the primary key.

The mapping cardinalities can be represented in an alternative notation called crow's-foot notation, as given in the following figure.

Figure: Many-to-Many, Many-to-One, One-to-One

Here the relationship sets are shown by lines between entity sets, without diamonds, and only binary relationships can be modeled in this way.

For example, consider the E-R diagram given here.

Figure: E-R diagram for customer loan relation

There are two entity sets, customer and loan, connected through a binary relationship set borrower. Customer has the attributes customer_id, customer_name, customer_street and customer_city. Loan has the attributes loan_no and amount.

If the relationship set has attributes associated with it, these attributes are linked to the relationship set, as given here.

Figure: Modified E-R diagram for customer loan relation (access_date linked to borrower)

The customer_name can be a composite attribute consisting of first_name, middle_name and last_name. Street can also be a composite of street_number and street_name.
A more precise representation of the E-R diagram given above, showing composite, multi-valued and derived attributes, is as follows.

Figure: E-R diagram showing composite, multi valued and derived attributes

22.5. Weak Entity Sets:
An entity set may not have sufficient attributes to form a primary key. Such an entity set is termed a weak entity set. An entity set that has a primary key is called a strong entity set.

Consider the entity set payment, which has three attributes: payment_no, date and amount. Payment numbers are sequential numbers, starting from 1, generated separately for each loan. Thus, although each payment entity is distinct, payments for different loans may share the same payment number. This entity set therefore does not have a primary key and is a weak entity set.

To be meaningful, a weak entity set must be associated with another entity set, called the identifying or owner entity set. The weak entity set is said to be existence dependent on the identifying entity set. The relationship between the weak entity set and the identifying entity set is called the identifying relationship. The identifying relationship is many-to-one from the weak entity set to the identifying entity set, and the participation of the weak entity set in the relationship is total.

The discriminator of a weak entity set is a set of attributes that distinguishes among the entities of the weak entity set that depend on the same identifying entity. The discriminator of payment is payment_no, because for each loan a payment number uniquely identifies a single payment for that loan. The discriminator is also called the partial key of the entity set. The primary key of a weak entity set is formed by combining the primary key of the identifying entity set with the discriminator of the weak entity set.
So the primary key of payment is {loan_no, payment_no}, where loan_no is the primary key of the identifying entity set loan and payment_no is the discriminator of payment. The following E-R diagram represents the relationship between loan and payment.

Figure: E-R Diagram with weak entity set

The doubly outlined box represents a weak entity set, and the doubly outlined diamond indicates the identifying relationship. Double lines show the total participation of the weak entity set payment in the relationship loan-payment, which means that every payment must be related through loan-payment to some loan. The arrow from loan-payment to loan indicates that each payment is for a single loan. The discriminator of the weak entity set is underlined with a dashed line.

22.6. E-R Features:
The basic E-R concepts can model most database features, but some situations require extensions to the basic E-R model. Three such extensions, specialization, generalization and aggregation, are discussed here.

The process of designating subgroups within an entity set is called specialization. Generalization is a containment relationship that exists between a higher-level entity set and one or more lower-level entity sets. Higher- and lower-level entity sets may also be designated by the terms super class and subclass, respectively. For example, the person entity set is the super class of the customer and employee subclasses. In practice, generalization is the inverse of specialization.

Figure: Generalization and Specialization

In most database applications, specialization and generalization are applied in combination. The two approaches differ in their starting point and overall goal.
Generalization proceeds from the recognition that a number of entity sets share common features, namely that they are described by the same attributes and participate in the same relationship sets.

Another important feature is aggregation, an abstraction in which relationship sets, along with their associated entity sets, are treated as higher-level entity sets that can themselves participate in relationships. An example of aggregation is shown in the following figure.

Figure: Aggregation

22.7. Data Mining:
Data mining is a rapidly expanding subject closely associated with database technology. It comprises techniques for discovering patterns in collections of data. It is mainly used in marketing, inventory management, quality control, fraud detection and investment analysis.

Data mining differs from traditional database activity: data mining seeks to identify previously unknown patterns, whereas database inquiries ask for the retrieval of stored data. Data mining is carried out on static data collections called data warehouses, so that the analysis is not disturbed by frequent updates. These warehouses are generally snapshots of databases.

Two common forms of data mining are class description and class discrimination. Class description deals with identifying properties that characterize a given group of data items, whereas class discrimination deals with identifying properties that divide two groups. Another form is cluster analysis, which tries to discover classes: it seeks properties of data items that lead to the discovery of groupings. A different form is association analysis, which tries to find links between data groups. Yet another form is outlier analysis, in which data entries that do not comply with the norm are identified. Outlier analysis can be used to identify errors in data collections and to recognize credit card theft by detecting sudden deviations from a customer's normal purchase patterns.
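One simple way to picture outlier analysis is a deviation test against a customer's past purchases. The sketch below is a minimal illustration under stated assumptions: it uses a z-score rule (distance from the mean measured in standard deviations), which is one common technique chosen here for the example and not a method prescribed by this module. The function name `flag_outliers`, the threshold of 3.0 and the purchase amounts are all hypothetical.

```python
from statistics import mean, stdev


def flag_outliers(history, new_amounts, threshold=3.0):
    """Return the new amounts lying more than `threshold` standard deviations
    from the mean of `history` (which needs at least two values)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past amounts are identical, so any different amount is a deviation.
        return [x for x in new_amounts if x != mu]
    return [x for x in new_amounts if abs(x - mu) / sigma > threshold]


# Hypothetical purchase amounts for one customer.
history = [42.0, 38.5, 51.0, 45.2, 40.3, 47.8, 44.1, 39.9]
new_purchases = [43.0, 950.0, 49.5]

print(flag_outliers(history, new_purchases))
```

With these sample values only the purchase of 950.0 is flagged, since the other two amounts lie well within the customer's usual range.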
Sequential pattern analysis is another form of data mining, which tries to identify patterns of behaviour over time. Such patterns are used to reveal trends in economic systems or in environmental systems such as climate conditions. Hence the results from data mining can be used to predict future behaviour, and the scope of data mining applications is potentially enormous.