Data Base Management System
Unit 2: Modelling and Design Frame Work

Topics to be covered: Data Models, Conceptual Design, ER Diagram, Relationships, Normalization, Data Management and System Integration.

Table of Contents
Unit 2: Modelling and Design Frame Work
2.1 Data Models
2.1.1 Types of Data Models
2.2 Conceptual Design
2.3 ER Diagram and Relationships
2.4 Normalization
2.5 Data Management and System Integration
2.6 Reference

2.1 Data Models

2.1.1 Types of Data Models

(i) Hierarchical Models

The hierarchical data model organizes data in a tree structure, with a hierarchy of parent and child data segments. This structure implies that a record can have repeating information, generally in the child data segments. Data is held in a series of records, each of which has a set of field values attached to it.
It collects all the instances of a specific record together as a record type. These record types are the equivalent of tables in the relational model, with the individual records being the equivalent of rows. To create links between these record types, the hierarchical model uses parent-child relationships, which are 1:N mappings between record types. This is done by using trees, which, like the set theory used in the relational model, are "borrowed" from mathematics.

For example,
o An organization might store information about an employee, such as name, employee number, department and salary.
o The organization might also store information about an employee's children, such as name and date of birth.
o The employee and children data form a hierarchy, where the employee data represents the parent segment and the children data represents the child segment.
o If an employee has three children, then there would be three child segments associated with one employee segment.

In a hierarchical database the parent-child relationship is one-to-many. This restricts a child segment to having only one parent segment. Hierarchical DBMSs were popular from the late 1960s, with the introduction of IBM's Information Management System (IMS) DBMS, through the 1970s.

(ii) Network Models

Some data were more naturally modeled with more than one parent per child, so the network model permitted the modeling of many-to-many relationships in data. In 1971, the Conference on Data Systems Languages (CODASYL) formally defined the network model. The basic data modeling construct in the network model is the set construct. A set consists of an owner record type, a set name, and a member record type. A member record type can have that role in more than one set; hence the multi-parent concept is supported. An owner record type can also be a member or owner in another set. The data model is a simple network, and link and intersection record types may exist, as well as sets between them.
Thus, the complete network of relationships is represented by several pairwise sets; in each set, one record type is the owner and one or more record types are members. Usually, a set defines a 1:M relationship, although 1:1 is permitted. The CODASYL network model is based on mathematical set theory.

(iii) Relational Data Models

RDBMS (relational database management system): a database based on the relational model developed by E. F. Codd. A relational database allows the definition of data structures, storage and retrieval operations, and integrity constraints. In such a database, the data and the relations between them are organized in tables. A table is a collection of records, and each record in a table contains the same fields.

Properties of Relational Tables:
o Values are atomic
o Each row is unique
o Column values are of the same kind
o The sequence of columns is insignificant
o The sequence of rows is insignificant
o Each column has a unique name

[Figure: mapping a Student class (attributes: Student Name, Marks) to an approximate table. The object ID is implicit and hidden from the entire world; the primary key is explicit and visible to the entire world. Exercise: outline the corresponding table model for this class and write the SQL code corresponding to the table model.]

Certain fields may be designated as keys, which means that searches for specific values of that field will use indexing to speed them up. Where fields in two different tables take values from the same set, a join operation can be performed to select related records in the two tables by matching values in those fields.

For example,
o An "orders" table might contain (customer-ID, product-code) pairs and a "products" table might contain (product-code, price) pairs. To calculate a given customer's bill, you would sum the prices of all products ordered by that customer by joining on the product-code fields of the two tables.
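The orders/products join described above can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module with an in-memory database; the table layout follows the example, while the column names, the sample rows, the integer prices and the helper name customer_bill are assumptions made for the sketch.

```python
import sqlite3


def customer_bill(conn, customer_id):
    """Sum the prices of all products ordered by one customer."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(p.price), 0)
        FROM orders AS o
        JOIN products AS p ON p.product_code = o.product_code
        WHERE o.customer_id = ?
        """,
        (customer_id,),
    ).fetchone()
    return row[0]


# Hypothetical sample data: prices are whole currency units.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (product_code TEXT PRIMARY KEY, price INTEGER NOT NULL);
    CREATE TABLE orders (customer_id TEXT NOT NULL, product_code TEXT NOT NULL);
    INSERT INTO products VALUES ('P1', 250), ('P2', 100), ('P3', 75);
    INSERT INTO orders VALUES ('C1', 'P1'), ('C1', 'P2'), ('C2', 'P3'), ('C1', 'P2');
    """
)

print(customer_bill(conn, "C1"))  # 250 + 100 + 100 = 450
```

The relationship between the two tables is not stored anywhere; it exists only in the ON clause of the query, which is evaluated at retrieval time.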
This can be extended to joining multiple tables on multiple fields. Because these relationships are only specified at retrieval time, relational databases are classed as dynamic database management systems. The relational database model is based on relational algebra.

Employee Table (super class)
Attribute                 Nulls allowed?
Employee ID               N
Employee Name             N
Age                       Y
Grade                     N

Manager Table (sub class)
Attribute                 Nulls allowed?
Employee ID               N
Bonus                     Y
Number of Subordinates    N

Clerk Table (sub class)
Attribute                 Nulls allowed?
Employee ID               N
Number of Pending tasks   Y

(iv) Object-Oriented Model

Object DBMSs add database functionality to object programming languages. They bring much more than persistent storage of programming language objects. Object DBMSs extend the semantics of the C++, Smalltalk and Java object programming languages to provide full-featured database programming capability, while retaining native language compatibility. A major benefit of this approach is the unification of the application and database development into a seamless data model and language environment. As a result, applications require less code, use more natural data modeling, and code bases are easier to maintain.

[Figure: C++ objects and Java objects stored in an OODBMS]

Object developers can write complete database applications with a modest amount of additional effort. According to Rao (1994), "The object-oriented database (OODB) paradigm is the combination of object-oriented programming language (OOPL) systems and persistent systems. The power of the OODB comes from the seamless treatment of both persistent data, as found in databases, and transient data, as found in executing programs." In contrast to a relational DBMS, where a complex data structure must be flattened out to fit into tables or joined together from those tables to form the in-memory structure, object DBMSs have no performance overhead to store or retrieve a web or hierarchy of interrelated objects.
This one-to-one mapping of object programming language objects to database objects has two benefits over other storage approaches: it provides higher-performance management of objects, and it enables better management of the complex interrelationships between objects. This makes object DBMSs better suited to support applications that have complex relationships between data, such as financial portfolio risk analysis systems, telecommunications service applications, World Wide Web document structures, design and manufacturing systems, and hospital patient record systems.

(v) Entity-Attribute-Value Model (EAV)

The best way to understand the rationale of EAV design is to understand row modeling, of which EAV is a generalized form. Consider a supermarket database that must manage thousands of products and brands, many of which have a transitory existence. Here, it is intuitively obvious that product names should not be hard-coded as names of columns in tables. Instead, one stores product descriptions in a Products table; purchases and sales of individual items are recorded in other tables as separate rows with a product ID referencing this table.

Conceptually, an EAV design involves a single table with three columns: an entity, an attribute, and a value for the attribute. In EAV design, one row stores a single fact. In a conventional table that has one column per attribute, by contrast, one row stores a set of facts. EAV design is appropriate when the number of parameters that potentially apply to an entity is vastly more than those that actually apply to an individual entity.

2.2 Conceptual Design

The DBMS architecture describes how data in the database is viewed by the users. It is not concerned with how the data is handled and processed by the DBMS. The database users are provided with an abstract view of the data by hiding certain details of how data is physically stored.
This enables the users to manipulate the data without worrying about where it is located or how it is actually stored. In this architecture, the overall database description can be defined at three levels, namely the internal, conceptual, and external levels; hence it is named the three-level DBMS architecture. This architecture was proposed by ANSI/SPARC (American National Standards Institute/Standards Planning and Requirements Committee) and hence is also known as the ANSI/SPARC architecture. The three levels are discussed here.

Internal level:
o It is the lowest level of data abstraction. It deals with the physical representation of the database on the computer and thus is also known as the physical level.
o It describes how the data is physically stored and organized on the storage medium.
o At this level, various aspects are considered to achieve optimal runtime performance and storage space utilization.
o These aspects include storage space allocation techniques for data and indexes, access paths such as indexes, data compression and encryption techniques, and record placement.

Conceptual level:
o This level of abstraction deals with the logical structure of the entire database and thus is also known as the logical level.
o It describes what data is stored in the database, the relationships among the data, and a complete view of the user's requirements, without any concern for the physical implementation.
o That is, it hides the complexity of physical storage structures.
o The conceptual view is the overall view of the database, and it includes all the information that is going to be represented in the database.

External level:
o It is the highest level of abstraction. It deals with the user's view of the database and thus is also known as the view level.
o In general, most of the users and application programs do not require the entire data stored in the database.
o The external level describes a part of the database for a particular group of users.
o It permits users to access data in a way that is customized according to their needs, so that the same data can be seen by different users in different ways at the same time.
o In this way, it provides a powerful and flexible security mechanism by hiding parts of the database from certain users, as the user is not aware of the existence of any attributes that are missing from the view.

These three levels are used to describe the schema of the database at various levels. Thus, the three-level architecture is also known as the three-schema architecture. The internal level has an internal schema, which describes the physical storage structure of the database. The conceptual level has a conceptual schema, which describes the structure of the entire database. The external level has external schemas or user views, which describe the part of the database that meets a particular user's requirements and hide the rest of the database from that user. The physical level is managed by the operating system under the direction of the DBMS.

[Figure: Three-schema architecture]

To understand the three-schema architecture, consider the three levels of the BOOK file in the Online Book database. Two views (view 1 and view 2) of the BOOK file have been defined at the external level. Different database users can see these views. The details of the data types are hidden from the users. At the conceptual level, the BOOK records are described by a type definition. The application programmers and the DBA generally work at this level of abstraction. At the internal level, the BOOK records are described as a block of consecutive storage locations such as words or bytes. The database users and the application programmers are not aware of these details; however, the DBA may be aware of certain details of the physical organization of the data.

[Figure: Three levels of Online Book database (BOOK file)]

In the three-schema architecture, each user group refers only to its own external view.
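As a rough illustration of the external and conceptual levels of the BOOK file, the sketch below defines one table and two views over it with Python's built-in sqlite3 module. The column names, the sample row, the view names view1 and view2, and the helper columns are assumptions made for the example; the original figure's exact fields are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Conceptual level: one type definition for all BOOK records
    -- (hypothetical columns).
    CREATE TABLE BOOK (
        ISBN   TEXT PRIMARY KEY,
        Title  TEXT NOT NULL,
        Author TEXT,
        Price  INTEGER,
        Year   INTEGER
    );
    INSERT INTO BOOK VALUES ('111', 'Databases', 'A. Writer', 500, 2009);

    -- External level: each user group sees only its own view.
    CREATE VIEW view1 AS SELECT ISBN, Title, Author FROM BOOK;
    CREATE VIEW view2 AS SELECT Title, Price FROM BOOK;
    """
)


def columns(conn, name):
    """Return the column names visible through a table or view."""
    cur = conn.execute(f"SELECT * FROM {name}")
    return [d[0] for d in cur.description]


print(columns(conn, "view1"))  # ['ISBN', 'Title', 'Author']
print(columns(conn, "view2"))  # ['Title', 'Price']
```

A user working through view2 has no way to tell that Author or Year exist, which is the hiding effect described for the external level. The internal level (how SQLite lays out pages on disk) is not visible from either the table or the views.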
Whenever a user specifies a request to generate a new external view, the DBMS must transform the request specified at the external level into a request at the conceptual level, and then into a request at the physical level. If the user requests data retrieval, the data extracted from the database must be presented according to the needs of the user. This process of transforming the requests and results between the various levels of the DBMS architecture is known as mapping.

The main advantage of the three-schema architecture is that it provides data independence. Data independence is the ability to change the schema at one level of the database system without having to change the schema at the other levels. Data independence is of two types, namely logical data independence and physical data independence.

o Logical data independence: It is the ability to change the conceptual schema without affecting the external schemas or application programs. The conceptual schema may be changed due to a change in constraints, or the addition of a new data item to, or removal of an existing data item from, the database. The separation of the external level from the conceptual level enables the users to make changes at the conceptual level without affecting the external level or the application programs. For example, if a new data item, say Edition, is added to the BOOK file, the two views are not affected.

o Physical data independence: It is the ability to change the internal schema without affecting the conceptual or external schemas. An internal schema may be changed for several reasons, such as creating an additional access structure or changing the storage structure. The separation of the internal schema from the conceptual schema facilitates physical data independence.

2.3 ER Diagram and Relationships

The E-R model is a high-level conceptual data model developed by Chen in 1976 to facilitate database design.
A conceptual database model is a set of concepts that describe the structure of a database and the associated retrieval and update transactions on the database. The basic concepts of the ER model include entity types, relationship types and attributes.

Components of E-R Model:
(a) Entities
(b) Attributes
(c) Relationship

(a) Entities
o Functional item in any data model
o Identify each entity type by a name and a list of properties
o A database has many entity types
o Ex: a Book, a Publisher or a Person is an entity
o It is an atomic entity: it can't be broken into smaller pieces.
o A book can have qualities that describe it, like ISBN, Title, Author, and Publisher.

(b) Attributes
o An entity is composed of additional information, which describes the entity.
o The components of an entity, or the qualifiers that describe it, are called attributes.
o Ex: ISBN, Title, Author, Publisher, Price, Year of publication. These are additional information for a book entity.
o The entity name is written in upper case and the attributes in lower case. Ex: BOOK (entity) - ISBN, title, publisher (attributes)
o Attributes can have the same name in different entities, but the same entity can't be duplicated.
o Each attribute is associated with a set of values called its domain. Ex: a student's age is between 14 and 17.
o Domains may themselves have sub-domains. Ex: DOB - Date, Month, Year.

Attributes are classified into 5 types: (1) Simple, (2) Composite, (3) Single-valued, (4) Multi-valued, (5) Derived.

(1) Simple
Single component with an independent existence.
Ex: Gender, Age, Salary

(2) Composite
Multiple components, each with an independent existence.
Ex: Address - Street name, Area, City, Pin code ...

(3) Single-Valued
Single value for a single entity.
Ex: a classroom entity has a single value for the room number attribute, so the room number attribute is referred to as being single-valued.
(4) Multi-valued Attribute
One that holds multiple values for a single entity.
Ex: a student entity can have a hobby attribute with values reading, music, movies ...

(5) Derived Attributes
One that represents a value that is derived from the value of a related attribute.
Ex: the Age attribute is derived from DOB.

(c) Relationship
Entity-attribute definitions capture only the static meaning of real-world items; relationships describe how entities are associated.
Ex: a book is published by a particular publisher, an employee may work for a manager, a person has a child and the child has a cousin ...

[Figure: ERD for the Student entity, with attributes First Name, Last Name, Class, Date of Birth, Age, Hobbies, Roll number, and Address (Street, City, State, Pin Code)]

Some terms related to entities and relationships:
(a) Degree
(b) Connectivity
(c) Cardinality
(d) Dependency
(e) Participation

(a) Degree
Degree is the number of associated entities.
[Figure: a unary relationship (single entity): Student, Requires; a ternary relationship: Teacher, Subject and Student joined by Teaches]

(b) Connectivity
A relationship can be classified as one-to-one, one-to-many or many-to-many.
1:1   Manager (1) Manages Department (1)
1:N   Department (1) Has Employee (N)
N:N   Employee (N) Joins Courses (N)

(c) Cardinality
The specific number of entity occurrences associated with one occurrence of the related entity.
1:N   Department (1,1) Has Employee (0,100)
N:N   Employee (0,2) Joins Course (0,10)
Company policy does not allow more than 100 employees in one department at the same time, and an employee can't opt for more than 2 of the 10 courses that the company offers.

(d) Dependency
Entities are classified as being of strong or weak entity type. An entity type that is existence-dependent on some other entity is called a weak entity; an entity that does not depend on the existence of another entity is a strong entity.
Ex: Weak entity - child, dependent or subordinate
Strong entity - parent, owner, dominant ...
[Figure: Company (strong) Employs Employee (strong); Patent (weak) attached through the relationship Has]

(e) Participation
There are two ways an entity can participate in a relationship: totally or partially, also called mandatory or optional.
[Figure: N:N relationship, Employee (0,2) Joins Course (0,10)]
Employees can take up to 2 courses: "Employee joins course", "Employee need not join a course".
Employee - Mandatory
Course - Optional

[Figure: Ex, one-to-one relationship. Entities Employer and Consultant linked by ISA; attributes shown: Name, Age, Gender, Emp_no, Salary, Designation, Client]

[Figure: Ex, one-to-many relationship. Entities Employer and Department; attributes shown: Name, Age, Gender, Emp_no, Salary, Dept_id, Name]

[Figure: Ex, many-to-many relationship. Entities Author and Publisher; labels shown: Name, Age, Gender, Address, Phone no, Name, Editor, Address]

E-R Diagram
o A book club has members.
o The book club sells books to its members.
o Members place orders for books.
o Each order contains one or more books.
o Books are written by authors.
o A publisher publishes the book.
o An author can write more than one book, and a book can have more than one author.
o A book is published by one publisher, but a publisher publishes many books.
o A member can place more than one order.
o A member can also choose not to place an order.

[Figure: E-R diagram, Book Club. Entities: Member, Book club, Order, Book, Author, Publisher. Relationships: Enrolls in, Places, Fulfills, Sells, Writes, Publishes, annotated with 1/N connectivity and (1,1)/(1,N) cardinalities]

2.4 Normalization

o Normalization is the process of building a database structure to store data.
o The process of normalization was first developed by E. F. Codd.
o Normalization is a formal process of developing data structures in a manner that eliminates redundancy and promotes integrity.
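To make "eliminates redundancy and promotes integrity" concrete, here is a small sketch using Python's built-in sqlite3 module. It compares a redundant single-table design with a split design and shows how a partial update leaves the redundant one inconsistent. All table names, column names and sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Redundant design: the department description is repeated per employee.
    CREATE TABLE employee_flat (
        emp_id INTEGER PRIMARY KEY, name TEXT,
        dept_code INTEGER, description TEXT);
    INSERT INTO employee_flat VALUES
        (1, 'Asha', 10, 'Sales'), (2, 'Ravi', 10, 'Sales');

    -- Split design: the description is stored once, in its own table.
    CREATE TABLE department (dept_code INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY, name TEXT,
        dept_code INTEGER REFERENCES department(dept_code));
    INSERT INTO department VALUES (10, 'Sales');
    INSERT INTO employee VALUES (1, 'Asha', 10), (2, 'Ravi', 10);
    """
)

# Updating only one row of the flat table leaves two descriptions for dept 10.
conn.execute("UPDATE employee_flat SET description = 'Marketing' WHERE emp_id = 1")
flat = conn.execute(
    "SELECT COUNT(DISTINCT description) FROM employee_flat WHERE dept_code = 10"
).fetchone()[0]

# A single update in the split design is seen by every employee of dept 10.
conn.execute("UPDATE department SET description = 'Marketing' WHERE dept_code = 10")
normalized = conn.execute(
    """
    SELECT COUNT(DISTINCT d.description)
    FROM employee AS e JOIN department AS d ON d.dept_code = e.dept_code
    WHERE e.dept_code = 10
    """
).fetchone()[0]

print(flat, normalized)  # 2 1
```

The flat table ends up with two different descriptions for the same department (a modification anomaly), while the split design keeps exactly one.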
Stages of Normal Forms
(i) First Normal Form (1NF)
(ii) Second Normal Form (2NF)
(iii) Third Normal Form (3NF)
(iv) Boyce-Codd Normal Form (BCNF)
(v) Fourth Normal Form (4NF)
(vi) Fifth Normal Form (5NF)

Keys
o Primary key: column of a table whose purpose is to uniquely identify records from the same table
o Foreign key: column in a table that uniquely identifies the records from a different table

Relationship
(a) One to one: rarely used
(b) One to many: commonly used
(c) Many to many: problematic

(a) First Normal Form (1NF)
A relation in which the intersection of each row and column contains one and only one value. Repeating groups are eliminated or removed from the table.

Ex:
    CREATE TABLE Contacts (
        Contact_id    INTEGER     NOT NULL,
        L_Name        VARCHAR(20) NOT NULL,
        F_Name        VARCHAR(20),
        Contact_Date1 DATE,
        Contact_Desc1 VARCHAR(50),
        Contact_Date2 DATE,
        Contact_Desc2 VARCHAR(50)
    );

The above data structure has a repeating group: the date and description of 2 conversations. To eliminate the repeating group, the group is moved to another table.
[Figure: the resulting tables with their keys marked. P - Primary key, F - Foreign key]

(b) Second Normal Form (2NF)
A relation in 1NF in which every non-key attribute is fully functionally dependent on the primary key.

Ex (before):
    CREATE TABLE Employee (
        Emp_id      INTEGER     NOT NULL,
        L_Name      VARCHAR(20) NOT NULL,
        F_Name      VARCHAR(20),
        Dept_Code   INTEGER,
        Description VARCHAR(50)
    );

Ex (after):
    CREATE TABLE Employee (
        Emp_id    INTEGER     NOT NULL,   -- P
        L_Name    VARCHAR(20) NOT NULL,
        F_Name    VARCHAR(20),
        Dept_Code INTEGER                 -- F
    );

    CREATE TABLE Department (
        Dept_Code   INTEGER,              -- P
        Description VARCHAR(50)
    );

P - Primary key, F - Foreign key

(c) Third Normal Form (3NF)
A relation in 2NF that has no transitive dependencies. A transitive dependency arises in a relation when there is a functional dependency between two or more non-key attributes.

Ex:
    CREATE TABLE Contacts (
        Contact_id       INTEGER     NOT NULL,
        L_Name           VARCHAR(20) NOT NULL,
        F_Name           VARCHAR(20),
        Company_Name     INTEGER,
        Company_Location VARCHAR(50)
    );

Contact_id is the primary key, so all the remaining attributes are functionally dependent on this attribute.
There is a transitive dependency: Company_Location is dependent on Company_Name, and Company_Name is dependent on Contact_id. Unless the location of the company differs on an individual basis, the Company_Location column is not dependent on the key value.
o Transitive dependencies lead to anomalies (insertion, deletion, modification).
[Figure: the resulting tables with their keys marked P, F, P]

(d) Boyce-Codd Normal Form (BCNF)
Database relations are designed so that they have neither partial dependencies nor transitive dependencies.

Functional dependency: for attributes A and B, B is dependent on A, written A -> B.
Ex:
    Contact_id -> L_Name
    Contact_id -> F_Name
    Contact_id -> Company_Id

Transitive dependency: for attributes A, B and C, if A -> B and B -> C, then C is transitively dependent on A via B.
Ex:
    Employee - Emp_id, Name, Address, Position
    Department - Dept_id, Name, Manager
    Emp_id -> Dept_id and Dept_id -> Dept_Name
    Emp_id -> Dept_Name via Dept_id

3NF allows a dependency A -> B in which A is not a candidate key, provided B is part of a candidate key; BCNF requires the determinant A of every dependency to be a candidate key.

Candidate_id   Int_date   Int_time   Intvr_id   Room_no
C001           24 May     10.30      E001       1
C002           24 May     11.30      E001       1
C003           24 May     10.30      E002       2
C004           26 May     11.30      E003       2

o (Candidate_id, Int_date) -> (Int_time, Intvr_id, Room_no): primary key.
o (Intvr_id, Int_date) -> Room_no: causes a problem for the relation, because its determinant is not a candidate key.

Interview Table
Candidate_id   Int_date   Int_time   Intvr_id
C001           24 May     10.30      E001
C002           24 May     11.30      E001
C003           24 May     10.30      E002
C004           26 May     11.30      E003

Room Table
Intvr_id   Int_date   Room_no
E001       24 May     1
E002       24 May     2
E003       26 May     2

(e) Fourth Normal Form (4NF)
"No table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key" (multi-valued dependencies).

An employee can work on more than one project and can have more than one hobby. The employee's projects and hobbies are independent of one another.
To keep the relation consistent, we must have a separate tuple to represent every combination of an employee's projects and an employee's hobbies.

Employee Table
Name      Project     Hobbies
Alexis    Microsoft   Reading
Alexis    Oracle      Music
Alexis    Microsoft   Music
Alexis    Oracle      Reading
Mathews   Intel       Movies
Mathews   Sybase      Riding
Mathews   Intel       Riding
Mathews   Sybase      Movies

The relation is therefore split into a Project table and a Hobby table.

Project Table
Name      Project
Alexis    Microsoft
Alexis    Oracle
Mathews   Intel
Mathews   Sybase

Hobby Table
Name      Hobbies
Alexis    Reading
Alexis    Music
Mathews   Movies
Mathews   Riding

(f) Fifth Normal Form (5NF)
It is rarely used because "it requires semantically related multiple relationships".

Ex:
    CREATE TABLE Lab_Product_Company (
        Lab_Id     INTEGER NOT NULL,
        Product_id INTEGER NOT NULL,
        Company_Id INTEGER NOT NULL
    );

Split into 2 tables: Lab_Product and Lab_Company.

Advantages of Database
o Reduced data redundancy: a single centralized database.
o Consistency of data: reducing redundancy minimizes the presence of the same data in different files.
o Flexibility of data: the database is designed using a bottom-up approach, and the end users have all the reports.
o Enhanced data sharing: with an integrated centralized database, the same file can be used in different applications.
o Better enforcement of standards: the database is designed with standards (field names, widths, types ...) that are easier to enforce.
o Reduced program maintenance: handled by the database administrator.
o Increased programmer productivity: a measure of the time taken to develop an application.

2.5 Data Management and System Integration

Integrated data management lets you grow your business without growing your infrastructure.

Ex:
o Let's say you've just completed an acquisition, and you need to bring three new manufacturing facilities on board.
o Application consolidation and retirement is common in this scenario.
o Data archiving capabilities help to minimize cost and accelerate completion of such scenarios. You can minimize the data that you migrate from the legacy system to the consolidated system, accomplishing the consolidation faster and minimizing the hardware and software required to support the additional load. You retain archived information for as long as needed on a lower-cost storage tier, immutable if needed, while providing flexible access to it for e-discovery.

Time to market is a critical imperative in today's environment. IDM (Integrated Data Management) helps you produce enterprise-ready applications faster.

For example,
o The pureQuery technology is built to give developers the productivity they need, while helping them adopt best practices for data access.
o It provides a collaborative environment for developers and DBAs to work together to optimize data access performance, maximize database security, and improve manageability.
o It makes service-enabling vetted database assets as simple as a drag-and-drop gesture.
o The test data management capabilities let testers leverage production-like data while safeguarding client privacy and corporate confidentiality.
o Accelerators for packaged applications, industry models, and compliance initiatives help to jump-start development and testing efforts.

For production systems, the challenge is to meet increasingly strict and challenging service level targets yet still free up staff time for value creation activities. We're focused on learning more about the environment, enabling the tools to automate and simplify operations, and aggregating and contextualizing information across the solution stack so that administrators have the information they need to identify emergent problems, view relevant information about the problem, isolate the problem to its source, and get expert advice on resolution.
We want to help business and IT work together towards business objectives, so that there is a common understanding and transparent, consistent execution across the business. We are building out a capability to define business policies and semantics early in the design cycle and then share them via models and other data artifacts across the lifecycle. We call this idea model-driven governance. Not only will this improve alignment and governance, but also organizational productivity and effectiveness, by facilitating seamless collaboration: from analysts to architects to developers to administrators, and from design to delivery to management.

2.6 Reference

Database Management System, Alexis Leon and Mathews Leon