Unit I DATABASE SYSTEMS
By Prof. R.R.Karwa

Data and Information
• Data: Facts concerning things such as people, objects and events, e.g. students, the courses they are taking and their grades.
• Information: Data that have been processed and presented in a form suitable for human interpretation, e.g. percentage enrolment in various courses, top rankers.

Database Systems
A Database Management System (DBMS) consists of:
1. A collection of interrelated data (usually referred to as the database).
2. A set of application programs used to access, update and manage that data.
Together, the data and the programs form the database system (DBS).
The goal of a DBS is to provide an environment that is both convenient and efficient to use in:
• Retrieving information from the database
• Storing information into the database

Database System Applications
Databases are widely used in:
• Banking: all transactions
• Airlines: reservations, schedules
• Universities: course registrations, grades
• Finance: holdings, stocks
• Sales: customers, products, purchases
• Online retailers: order tracking, customized recommendations
• Manufacturing: production, inventory, orders, supply chain
• Human resources: employee records, salaries, tax deductions

Database System Vs File System
• Old way: One way to keep information on a computer is to store it in OS files (file system).
• Drawbacks of using file systems to store data:
– Data redundancy and inconsistency
  • Multiple file formats, duplication of information in different files
– Difficulty in accessing data
  • Need to write a new program to carry out each new task
– Data isolation
  • Data are scattered across multiple files and formats, so it is difficult to write new programs to retrieve them
– Integrity problems
  • Data values stored in the database must satisfy certain types of consistency constraints (e.g. account balance > $25)
  • Difficult to add new constraints or change existing ones

Database System Vs File System (Cont.)
• Drawbacks of using file systems (cont.)
– Atomicity of updates
  • Failures may leave database in an inconsistent state with partial updates carried out
  • Example: Transfer of funds from one account to another should either complete or not happen at all
– Concurrent access by multiple users
  • Concurrent access needed for performance
  • Uncontrolled concurrent accesses can lead to inconsistencies
  • Example: Two people reading a balance and updating it at the same time
– Security problems
  • Hard to provide user access to some, but not all, data
• Database systems offer solutions to all the above problems

Why Use a DBS?
• Advantages of Database Systems:
– Data independence and efficient access
  • Programs are independent of the details of data and storage.
  • Provides efficient storage and retrieval mechanism (index structure).
– Reduced application development time
  • Provides functions required by applications (concurrency control).
– Data integrity and security
  • Provides access control mechanism by view & authorization facilities
– Uniform data administration
  • Provides common umbrella for large collection of data shared by user
– Concurrent access, recovery from crashes
  • If there is a system crash, it can restore the database to a transaction consistent state.

View of Data / Data Abstraction / Levels of Data Abstraction
• Major purpose of Database system is to provide users an abstract view of data (i.e. system hides certain details of how the data are stored and maintained.)
• Since many database system users are not computer trained, developers hide complexity from users through several levels of abstraction, to simplify users' interactions with the system.

Levels of Abstraction
• Physical level: describes how a record (e.g., customer) is stored.
• Logical level: describes what data is stored in database, and the relationships among the data.
    type customer = record
        customer_id : string;
        customer_name : string;
        customer_street : string;
        customer_city : string;
    end;
• View level: application programs hide details of data types. Views can also hide information (such as an employee’s salary) for security purposes.

Instances and Schemas
• Databases change over time as information is inserted and deleted.
• Instance – the actual content of the database at a particular point in time
– Analogous to the value of a variable in a program
• Schema – the logical structure of the database
– Example: The database consists of information about a set of customers and accounts and the relationship between them
– Analogous to the type definition (declaration) of a variable in a program
– Physical schema: database design at the physical level
– Logical schema: database design at the logical level
– View schema: a database may have several schemas at the view level, sometimes called sub-schemas, that describe different views of the database.

Data Independence
• Definition: The ability to modify the schema at one level without affecting the schema at the next higher level, so that application programs need not be rewritten when a schema changes.
• Logical Data Independence: Changes to the logical schema should not require changes to the view-level schemas or to application programs.
• Physical Data Independence: Changes to the physical schema should not require changes to the logical schema.

Data Models
• A collection of tools for describing
– Data, data relationships, data semantics and data constraints
• Provides a way to describe the design of a database at the physical, logical & view level.
• Data Models can be classified as:
– Entity-Relationship data model (collection of entities : E-R Diagram)
– Relational data model (collection of tables : widely used)
– Object-based data models (based on object-oriented and object-relational models)
– Semi-structured data model (XML)
– Other older models: Network model and Hierarchical model

Database Languages
• A database system provides a Data Definition Language (DDL) to specify the database schema and a Data Manipulation Language (DML) to express database queries and updates.
• In fact, DDL and DML are not two separate languages; they simply form parts of a single database language, such as the widely used SQL.

Data Definition Language
• Specification notation for defining the database schema
  Example:
    create table account (
        account_number char(10),
        branch_name char(10),
        balance integer);
• DDL compiler generates a set of tables stored in a data dictionary
• Data dictionary contains metadata (i.e., data about data)
– Database schema
– Data storage and definition language
  • Specifies the storage structure and access methods used
– Integrity constraints
  • Domain constraints (datatype etc.)
  • Referential integrity (e.g. branch_name must correspond to a valid branch in the branch table)
  • Assertions (conditions that the database must always satisfy)
– Authorization
  • Read, insert, update, delete

Data Definition Language
• Example:
– CREATE: creates objects in the database.
– ALTER: alters the structure of the database.
– DROP: deletes objects from the database.
– TRUNCATE: removes all records from a table and releases the space allocated for those records.
– RENAME: renames an object.
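The DDL statements and the data dictionary described above can be tried out with a small sketch. It uses Python's standard `sqlite3` module with an in-memory SQLite database as a stand-in DBMS, which is an assumption of this example and not something the slides prescribe. The `branch` table, the `CHECK` rule on `balance`, the `opened_on` column and the sample values are hypothetical additions for illustration; `sqlite_master` is SQLite's own catalog and plays the role of the data dictionary here.

```python
import sqlite3

# In-memory SQLite database, used only as a convenient stand-in DBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# DDL: define the schema, with a domain constraint and a referential-integrity constraint.
conn.execute("CREATE TABLE branch (branch_name CHAR(10) PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE account (
        account_number CHAR(10) PRIMARY KEY,
        branch_name    CHAR(10) REFERENCES branch(branch_name),
        balance        INTEGER CHECK (balance >= 0)  -- hypothetical rule
    )
    """
)

# The catalog (SQLite's data dictionary) now holds metadata about both tables.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)

# ALTER changes the schema, and the metadata reflects the change.
conn.execute("ALTER TABLE account ADD COLUMN opened_on TEXT")  # hypothetical column
columns = [row[1] for row in conn.execute("PRAGMA table_info(account)")]
print(columns)

# Referential integrity: an account must name a branch that exists.
conn.execute("INSERT INTO branch VALUES ('Downtown')")
conn.execute("INSERT INTO account VALUES ('A-101', 'Downtown', 500, NULL)")
try:
    conn.execute("INSERT INTO account VALUES ('A-102', 'Nowhere', 100, NULL)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True  # the DBMS refused the dangling branch_name
print(fk_rejected)
```

Other database systems expose their data dictionary under different names, so the catalog query in particular is specific to SQLite.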
Data Manipulation Language
• Language for accessing and manipulating the data organized by the appropriate data model
– DML is also known as query language
• Two classes of languages
– Procedural: user specifies what data is required and how to get those data
– Declarative (nonprocedural): user specifies what data is required without specifying how to get those data
• SQL is the most widely used query language
• Example:
– SELECT: retrieves data from a database.
– INSERT: inserts data into a table.
– UPDATE: updates existing data within a table.
– DELETE: deletes records from a table (all records if no condition is given); the space allocated for the records remains.
– LOCK TABLE: controls concurrency.

Definition
• Data Dictionary: It is considered to be a special type of table, which can only be accessed and updated by the database system itself (not a regular user). It is also called a data directory. A database system consults the data dictionary before reading or modifying data.
• Metadata: A data dictionary contains metadata, i.e. data about data. E.g. the schema of a table.
Q) Define (a) Data Dictionary (b) Metadata (4M)

Database Users and Administrators
• People who work with a database can be categorized as database users or database administrators.
• Database users and user interfaces: Database users fall into several categories according to the way they expect to interact with the system.
– Application Programmers (write their own application programs, e.g. with RAD tools)
– Sophisticated Users (interact without writing programs, using a query language)
– Specialized Users (write specialized applications, e.g. CAD tools, expert systems)
– Naive Users (invoke previously written application programs, e.g. through forms)
• Database Administrator: A person who has central control of both the data and the programs that process those data.
Functions of DBA are:
– Schema definition
– Storage structure and access method definition
– Schema and physical organization modification
– Granting of authorization for data access
– Routine maintenance

Transaction Management
• A transaction is a collection of operations that performs a single logical function in a database application. It is a unit of both atomicity and consistency.
• Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.
• Failure recovery detects system failures and restores the database to the state that existed prior to the occurrence of the failure.
• Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database.

Transaction Management
• It should satisfy ACID properties:
– Example: Fund transfer, in which one account (say A) is debited and another account (say B) is credited.
• Atomicity: In the example, it is essential that either both the credit & debit occur, or that neither occurs (i.e. the transfer must happen in its entirety or not at all). This all-or-none requirement is called atomicity.
• Consistency: In the example, it is essential that execution of the fund transfer preserves the consistency of the database (i.e. the value of the sum A + B must be preserved). This correctness requirement is called consistency.
• Isolation: In the example, it is essential that even if several transactions are executed concurrently, each one behaves as if it were running alone. Without this, the database could end up in an inconsistent state (i.e. if a second concurrent transaction reads A & B at the intermediate point and computes A+B, it will observe an inconsistent value). This requirement is called isolation.
• Durability: In the example, after successful execution, the new values of A & B must persist, despite system failure. This persistence requirement is called durability.
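The fund-transfer example lends itself to a short sketch of atomicity and consistency. It again assumes Python's `sqlite3` with an in-memory database. The table layout, the starting balances, the non-negative balance rule and the helper names `transfer` and `total` are hypothetical choices for illustration. Isolation and durability are not demonstrated, since the sketch uses a single connection and nothing is written to disk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the CHECK rule stands in for a bank's "no overdraft" constraint.
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def total(conn):
    """Return the sum A + B, which every transfer must preserve (consistency)."""
    return conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True if it committed."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            # Credit first, so that a failing debit leaves a partial update to undo.
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the credit was rolled back together with the failed debit


ok_small = transfer(conn, "A", "B", 30)   # succeeds: A has enough funds
ok_large = transfer(conn, "A", "B", 500)  # fails: A would go negative
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok_small, ok_large, balances, total(conn))
```

The second call fails after B has already been credited, yet B's balance is unchanged afterwards. That rollback of the partial update is the all-or-none behaviour the slide calls atomicity.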
Database System Structure
• A database system is partitioned into modules that deal with each of the responsibilities of the overall system.
• The functional components can be broadly divided into two parts:
– Storage Manager
– Query Processor
• Storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.
• The storage manager is responsible for the following tasks:
– Interaction with the file manager
– Efficient storing, retrieving and updating of data

Database System Structure (Cont.)
• Storage manager components include:
– Authorization and integrity manager, which tests for the satisfaction of integrity constraints and checks the authority of users to access data.
– Transaction Manager, which ensures that the database remains in a consistent state despite system failures and that concurrent transaction executions proceed without conflicting.
– File Manager, which manages the allocation of space on disk storage and the data structures used to represent information stored on disk.
– Buffer Manager, which is responsible for fetching data from disk storage into main memory and deciding what data to cache in main memory.
• Storage manager implements several data structures:
– Data files, which store the database itself
– Data Dictionary, which stores metadata
– Indices, which provide fast access to data items that hold particular values

Database System Structure (Cont.)
• Query Processor is important because it helps the database system simplify and facilitate access to data.
• Query processor components include:
– DDL Interpreter, which interprets DDL statements and records the definitions in the data dictionary.
– DML Compiler, which translates DML statements in a query language into an evaluation plan consisting of low-level instructions that the query evaluation engine understands.
– Query Optimizer, which picks the lowest cost evaluation plan from among the alternative evaluation plans that all give the same result.
– Query Evaluation Engine, which executes low-level instructions generated by the DML compiler.

[Figure: Overall System Structure]

Database Application Architecture
• The architecture is influenced by the computer system on which the database system runs.
• A DBS can be centralized or client-server, where one server machine executes work on behalf of multiple client machines.
• Most users today are not present at the site of the database system, but connect to it through a network.
• We can therefore differentiate between client machines, on which remote database users work, and server machines, on which the database system runs.
• Database applications are partitioned into two or three parts as shown in the figure:

[Figure: Database Application Architecture (panels: Old, Modern)]

Database Application Architecture (Cont.)
• Two-Tier Architecture
– The application is partitioned into a component that resides at the client machine, which invokes database system functionality at the server machine through query language statements.
– Application program interface standards like ODBC and JDBC are used for interaction between client & server.
• Three-Tier Architecture
– The client machine acts as a front end and does not contain direct database calls. Instead, the client end communicates with an application server through a form interface. The application server in turn communicates with the database system to access data.
– Three-tier applications are more appropriate for large applications that run on the World Wide Web.

Database Application Architecture (Cont.)
• Two-Tier Architecture (Advantages)
– Understanding & maintenance is easier.
• Two-Tier Architecture (Disadvantages)
– Performance will be reduced when there are more users.
• Three-Tier Architecture (Advantages)
– Easy to modify without affecting other modules.
– Fast communication.
– Performance will be good.
E-R Model

Contents
• Basic Concepts
• Design Constraints
• Keys
• Design Issues
• E-R Diagram
• Weak Entity Sets
• Extended E-R Features
• Design of an E-R Database Schema
• Reduction of an E-R Schema to Tables

Basic Concepts
• A database can be modeled as:
– a collection of entities,
– relationships among entities.
• An entity is an object that exists and is distinguishable from other objects.
– Example: specific person, company, event, plant
• Entities have attributes
– Example: people have names and addresses
• An entity set is a set of entities of the same type that share the same properties.
– Example: set of all persons, companies, trees, holidays

[Figure: Entity Sets (Customer and Loan); customer columns: ID, Name, Street, City; loan columns: Loan no., Amt.]

Attributes
• An entity is represented by a set of attributes, that is, descriptive properties possessed by all members of an entity set.
  Example:
    customer = (customer-id, customer-name, customer-street, customer-city)
    loan = (loan-number, amount)
• Domain – the set of permitted values for each attribute
• Attribute types:
– Simple and composite attributes.
– Single-valued and multi-valued attributes
  • E.g. single-valued attribute: loan_no; multi-valued attribute: phone-numbers
– Derived attributes
  • Can be computed from other attributes
  • E.g. age, given date of birth

[Figure: Composite Attributes]
[Figure: Single-valued, Multi-valued and Derived Attributes]
[Figure: E-R Diagram with Composite, Multivalued and Derived Attributes]

Relationship Sets
• A relationship is an association among several entities
– Example: Hayes (customer entity) is related to A-102 (account entity) through depositor (relationship set)
• A relationship set is a set of relationships of the same type. It is a mathematical relation among n ≥ 2 entities, each taken from entity sets:
    {(e1, e2, …, en) | e1 ∈ E1, e2 ∈ E2, …, en ∈ En}
  where (e1, e2, …, en) is a relationship
– Example: (Hayes, A-102) ∈ depositor

[Figure: Relationship Sets with Attributes]

Relationship Sets
• An attribute can also be a property of a relationship set.
• For instance, the depositor relationship set between entity sets customer and account may have the attribute access-date.

Design Constraints
• The mapping cardinalities and participation constraints are two of the most important types of constraints.
Q) Explain all design constraints while designing enterprise. (6M)
Q) Explain with diagrams all the mapping cardinalities. (8M)

1. Mapping Cardinalities
• Express the number of entities to which another entity can be associated via a relationship set.
• Most useful in describing binary relationship sets.
• For a binary relationship set the mapping cardinality must be one of the following types:
– One to one
– One to many
– Many to one
– Many to many
• Cardinality Constraints: We express cardinality constraints by drawing either a directed line (→) signifying “one,” or an undirected line (–) signifying “many,” between the relationship set and the entity set.

Mapping Cardinalities (Cont.)
[Figure: One to one; One to many]
Note: Some elements in A and B may not be mapped to any elements in the other set

Mapping Cardinalities (Cont.)
[Figure: Many to one; Many to many]
Note: Some elements in A and B may not be mapped to any elements in the other set

One-To-One Relationship
• A customer is associated with at most one loan via the relationship borrower
• A loan is associated with at most one customer via borrower

One-To-Many Relationship
• In a one-to-many relationship, a loan is associated with at most one customer via borrower, and a customer is associated with several (including 0) loans via borrower

Many-To-One Relationship
• In a many-to-one relationship, a loan is associated with several (including 0) customers via borrower, and a customer is associated with at most one loan via borrower

Many-To-Many Relationship
• A customer is associated with several (possibly 0) loans via borrower
• A loan is associated with several (possibly 0) customers via borrower

2.
Participation Constraints
• Total participation (indicated by double line): every entity in the entity set participates in at least one relationship in the relationship set
– E.g. participation of loan in borrower is total
  • every loan must have a customer associated with it via borrower
• Partial participation: some entities may not participate in any relationship in the relationship set
– E.g. participation of customer in borrower is partial

Keys
• A key is a set of attributes that is sufficient to distinguish entities from each other.
• A key also helps uniquely identify relationships and thus distinguish relationships from each other.
• Entity Sets: A key is a property of the entity set rather than of the individual entities.
• For an entity set we have three types of keys:
– Super Key
– Primary Key
– Candidate Key

Keys (Cont.)
• A super key is a set of one or more attributes whose values uniquely determine each entity in an entity set.
– E.g.: {cust_id}, {cust_name, cust_id}
• Candidate keys are those super keys for which no proper subset is a super key. They are the minimal super keys.
– E.g.: {cust-id} and {cust_nm, cust_street} are candidate keys of customer
– E.g.: account-number is a candidate key of account
• A primary key is a candidate key chosen such that its attributes are never or very rarely changed. Although several candidate keys may exist, one of the candidate keys is selected to be the primary key.
– E.g.: Election_Id

Example
• Relation: Book (Book_Id, Book_Name, Author)
• Super Key: (Book_Id), (Book_Id, Book_Name), (Book_Id, Book_Name, Author), (Book_Id, Author), (Book_Name, Author)
• Candidate Key: (Book_Id), (Book_Name, Author)
• Primary Key: Any one of the above candidate keys can be the primary key; the other one will be known as the alternate key.

Keys (Cont.)
• A Foreign Key is an attribute (or set of attributes) in one table that refers to the primary key of another table.
– If branch_nm in the branch table acts as a primary key, and that same branch_nm resides in the account table as an attribute, then branch_nm in account is a foreign key referencing branch.
• A Surrogate Key is a unique, system-supplied identifier used as the primary key of a relation. The values of a surrogate key have no meaning to the users and are usually hidden on forms and reports.
– The DBS will not allow the value of a surrogate key to be changed.

Foreign Key
• A relation schema may have an attribute that corresponds to the primary key of another relation. The attribute is called a foreign key.
– E.g. customer_name and account_number attributes of depositor are foreign keys to customer and account respectively.
– Only values occurring in the primary key attribute of the referenced relation may occur in the foreign key attribute of the referencing relation.
• [Figure: Schema Diagram]

Entity-Relationship Diagram
• Rectangles represent entity sets.
• Diamonds represent relationship sets.
• Lines link attributes to entity sets and entity sets to relationship sets.
• Ellipses represent attributes
– Double ellipses represent multivalued attributes.
– Dashed ellipses denote derived attributes.
• Underline indicates primary key attributes

[Figure: Summary of Symbols used in E-R Notation]

Roles
• Roles are indicated in E-R diagrams by labeling the lines that connect diamonds to rectangles.
• The labels “manager” and “worker” are called roles; they specify how employee entities interact via the works-for relationship set.
• Role labels are optional, and are used to clarify the semantics of the relationship

Ternary Relationship
• Non-binary (ternary) relationship sets can also be specified in an E-R diagram.
• The following figure consists of three entity sets employee, job and branch related through the relationship set works-on.

Weak Entity Sets
• An entity set that does not have sufficient attributes to form a primary key is referred to as a weak entity set.
• The relationship associating the weak entity set with the identifying entity set is called the identifying relationship.
• The existence of a weak entity set depends on the existence of an identifying or owner entity set
– it must relate to the identifying entity set via a total, one-to-many relationship set from the identifying to the weak entity set
– Identifying relationship depicted using a double diamond
• The discriminator (or partial key) of a weak entity set is the set of attributes that distinguishes among all the entities of the weak entity set that depend on one particular strong entity.

Weak Entity Sets (Cont.)
• Double outlined box – weak entity set.
• Double outlined diamond – identifying relationship.
• Double lines – total participation, meaning that every payment must be related to a loan via loan-payment.
• Dashed underline – discriminator (partial key).
• Weak entity set payment depends on strong entity set loan via the relationship set loan-payment as shown in figure:

Extended E-R Features
• Extended E-R features are Specialization, Generalization, Aggregation, Higher and Lower Level Entity Sets, Attribute Inheritance.
• Specialization: The process of designating sub-groupings within an entity set is called specialization. It proceeds in a top-down manner.

Extended E-R Features (Cont.)
• Generalization: It is the result of taking the union of two or more disjoint (lower-level) entity sets to produce a higher-level entity set. It proceeds in a bottom-up manner.

Extended E-R Features (Cont.)
• Aggregation: The process through which one can treat a relationship as a higher-level entity is known as aggregation.
• It is a feature of the E-R model that allows a relationship set to participate in another relationship set.

Aggregation Need
• One limitation of the E-R model is that it cannot express relationships among relationships.
• To illustrate the need for such a construct, consider the ternary relationship works-on between employee, branch and job.
• Suppose we also want to record a manager for the tasks performed by an employee at a branch. We cannot fold the manager into works-on as a single relationship, since some employee, branch, job combinations may not have a manager.
• The best way to model such a situation is to use aggregation.

Extended E-R Features (Cont.)
• Attribute Inheritance: The attributes of higher-level entity sets are said to be inherited by lower-level entity sets. This property is known as attribute inheritance.
• In a hierarchy, if an entity set may be involved as a lower-level entity set in only one ISA relationship, then the entity set has single inheritance.
• In a hierarchy, if an entity set may be involved as a lower-level entity set in more than one ISA relationship, then the entity set has multiple inheritance and the resulting structure is said to be a lattice.

[Figure: Specialization, Generalization, Attribute Inheritance]

Reduction of an E-R Schema to Tables
• Primary keys allow entity sets and relationship sets to be expressed uniformly as tables which represent the contents of the database.
• A database which conforms to an E-R diagram can be represented by a collection of tables.
• For each entity set and relationship set there is a unique table which is assigned the name of the corresponding entity set or relationship set.
• Each table has a number of columns (generally corresponding to attributes), which have unique names.
• Converting an E-R diagram to a table format is the basis for deriving a relational database design from an E-R diagram.

Representing Entity Sets as Tables
• A strong entity set reduces to a table.

University Questions on E-R Diagram
• Construct an entity relationship diagram of a banking environment where the customer borrows a loan from the bank and has an account in a specific branch of the bank. Reduce the E-R schema to tables.
• Bank Tables:
– customer (customer_id, customer_name, customer_street, customer_city)
– loan (loan_number, amount)
– borrower (customer_id, loan_number)

University Questions on E-R Diagram
• Construct an E-R diagram for keeping track of exploits of your favourite sports team. You should store the matches played, the scores in each match, the players in each match and individual player statistics for each match. Summary should be modeled as derived attribute.

University Questions on E-R Diagram
• Consider a database used to record the marks that students get in different exams of different course-offerings.
(a) Construct an E-R diagram that models exams as entities, and uses a ternary relationship for above database.
(b) Construct an alternative E-R diagram that uses only a binary relationship between students and course-offerings. Make sure that only one relationship exists between a particular student and course-offering pair, yet you can represent the marks that a student gets in different exams of a course-offering.

[Figure: E-R diagram for Ternary Relationship (a)]
[Figure: E-R diagram for Binary Relationship (b)]
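The bank tables listed above (customer, loan, borrower) can be turned into a small runnable sketch that also exercises the key concepts from the earlier slides. It assumes Python's `sqlite3` with an in-memory database. The column types, the composite primary key on `borrower` and the sample rows (identifiers such as `C-1` and `L-17`) are hypothetical, chosen only to show the many-to-many relationship set reduced to its own table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Each entity set and relationship set becomes one table (column types are assumptions).
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id     TEXT PRIMARY KEY,
        customer_name   TEXT,
        customer_street TEXT,
        customer_city   TEXT
    );
    CREATE TABLE loan (
        loan_number TEXT PRIMARY KEY,
        amount      INTEGER
    );
    -- The relationship set borrower holds the primary keys of both participants
    -- as foreign keys; together they form its own primary key.
    CREATE TABLE borrower (
        customer_id TEXT REFERENCES customer(customer_id),
        loan_number TEXT REFERENCES loan(loan_number),
        PRIMARY KEY (customer_id, loan_number)
    );
    """
)

# Hypothetical sample data.
conn.execute("INSERT INTO customer VALUES ('C-1', 'Hayes', 'Main', 'Harrison')")
conn.execute("INSERT INTO loan VALUES ('L-17', 1000)")
conn.execute("INSERT INTO borrower VALUES ('C-1', 'L-17')")

# Joining through borrower recovers which customer holds which loan.
rows = conn.execute(
    """
    SELECT c.customer_name, l.loan_number, l.amount
    FROM customer AS c
    JOIN borrower AS b ON b.customer_id = c.customer_id
    JOIN loan AS l ON l.loan_number = b.loan_number
    ORDER BY l.loan_number
    """
).fetchall()
print(rows)


def is_rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Foreign key: a borrower row may only mention an existing loan.
dangling_rejected = is_rejected("INSERT INTO borrower VALUES ('C-1', 'L-99')")
# Primary key: the same loan_number cannot identify two loans.
duplicate_rejected = is_rejected("INSERT INTO loan VALUES ('L-17', 2000)")
print(dangling_rejected, duplicate_rejected)
```

If borrower were one-to-many from customer to loan, a common alternative would be to drop the borrower table and store customer_id as a foreign key column in loan.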