Download - Mitra.ac.in

Unit I
DATABASE SYSTEMS
By
Prof. R.R.Karwa
Data and Information
• Data: Facts concerning things such as people, objects, events, etc.
e.g. students, the courses they are taking, and their grades.
• Information: Data that have been processed and presented in a form
suitable for human interpretation.
e.g. percentage enrolment in various courses, top rankers etc.
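A small illustration of the difference: the rows below are raw data, and the two dictionaries computed from them are information. The student names, courses and grades are hypothetical sample values, not part of the original notes.

```python
from collections import Counter

# Raw data: facts about students, the courses they take, and their grades
# (hypothetical sample records).
enrolments = [
    ("Asha", "DBMS", 88),
    ("Ravi", "DBMS", 72),
    ("Meena", "OS", 91),
    ("Asha", "OS", 79),
]

# Information: percentage enrolment in each course.
counts = Counter(course for _, course, _ in enrolments)
total = len(enrolments)
percent_enrolment = {course: 100 * n / total for course, n in counts.items()}

# Information: the top ranker in each course.
top_rankers = {}
for student, course, grade in enrolments:
    best = top_rankers.get(course)
    if best is None or grade > best[1]:
        top_rankers[course] = (student, grade)

print(percent_enrolment)  # {'DBMS': 50.0, 'OS': 50.0}
print(top_rankers)        # {'DBMS': ('Asha', 88), 'OS': ('Meena', 91)}
```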
Database Systems
• A Database Management System (DBMS) consists of:
1. A collection of interrelated data (usually referred to as the database).
2. A set of programs used to access, update and manage that data.
Together, the data and these programs form the database system (DBS).
• The goal of a DBS is to provide an environment that is both convenient and efficient to use for retrieving information from the database and storing information in the database.
Database System Applications
Databases are widely used in:
• Banking: all transactions
• Airlines: reservations, schedules
• Universities: course registrations, grades
• Finance: holdings, stocks
• Sales: customers, products, purchases
• Online retailers: order tracking, customized recommendations
• Manufacturing: production, inventory, orders, supply chain
• Human resources: employee records, salaries, tax deductions
Database System Vs File System
• Old way: one way to keep information on a computer is to store it in operating-system files (a file system).
• Drawbacks of using file systems to store data:
– Data redundancy and inconsistency
• Multiple file formats, duplication of information in different files
– Difficulty in accessing data
• Need to write a new program to carry out each new task
– Data isolation
• Data are scattered in multiple files and formats, so it is difficult to write new application programs
– Integrity problems
• Data values stored in the database must satisfy certain types of consistency constraints (e.g., account balance > $25)
• Such constraints end up buried in program code, so it is difficult to add new constraints or change existing ones
Database System Vs File System (Cont.)
• Drawbacks of using file systems (cont.)
– Atomicity of updates
• Failures may leave database in an inconsistent state with
partial updates carried out
• Example: Transfer of funds from one account to another
should either complete or not happen at all
– Concurrent access by multiple users
• Concurrent access needed for performance
• Uncontrolled concurrent accesses can lead to inconsistencies
– Example: Two people reading a balance and updating it
at the same time
– Security problems
• Hard to provide user access to some, but not all, data
• Database systems offer solutions to all the above problems
Why Use a DBS?
• Advantages of Database Systems:
– Data independence and efficient access
• Programs are independent of the details of data representation and storage.
• Provides efficient storage and retrieval mechanisms (e.g., index structures).
– Reduced application development time
• Provides common functions required by applications (e.g., concurrency control), so each application need not implement them.
– Data integrity and security
• Provides access control through views and authorization facilities.
– Uniform data administration
• Provides a common umbrella for a large collection of data shared by many users.
– Concurrent access, recovery from crashes
• After a system crash, the DBMS can restore the database to a transaction-consistent state.
View of Data / Data Abstraction / Levels of Data Abstraction
• A major purpose of a database system is to provide users with an abstract view of the data (i.e., the system hides certain details of how the data are stored and maintained).
• Since many database system users are not computer trained, developers hide complexity from users through several levels of abstraction, to simplify users' interactions with the system.
Levels of Abstraction
• Physical level: describes how a record (e.g., customer) is stored.
• Logical level: describes what data is stored in database, and the
relationships among the data.
type customer = record
customer_id : string;
customer_name : string;
customer_street : string;
customer_city : string;
end;
• View level: application programs hide details of data types. Views
can also hide information (such as an employee’s salary) for security
purposes.
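A sketch of the view level using Python's built-in sqlite3 module: the logical-level employee table holds salaries, while a view exposes only non-sensitive columns. The table, view and column names are hypothetical. Note that SQLite has no GRANT statement; in a multi-user DBMS the hiding becomes a security measure only when users are authorized on the view and not on the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: the full employee relation, including salary.
conn.execute("CREATE TABLE employee (emp_id TEXT PRIMARY KEY, emp_name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [("E1", "Hayes", 50000), ("E2", "Jones", 42000)])

# View level: a view that leaves the salary column out.
conn.execute("CREATE VIEW employee_public AS SELECT emp_id, emp_name FROM employee")

cur = conn.execute("SELECT * FROM employee_public ORDER BY emp_id")
columns = [d[0] for d in cur.description]   # column names seen through the view
rows = cur.fetchall()
print(columns)  # ['emp_id', 'emp_name']
print(rows)     # [('E1', 'Hayes'), ('E2', 'Jones')]
```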
Instances and Schemas
• Databases change over time as information is inserted and deleted.
• Instance – the actual content of the database at a particular point in time
– Analogous to the value of a variable in a program
• Schema – the logical structure of the database
– Example: The database consists of information about a set of
customers and accounts and the relationship between them
– Analogous to type definition of a variable in a program
– Physical schema: database design at the physical level
– Logical schema: database design at the logical level
– View schema: a database may have several schemas at the view level, sometimes called subschemas, that describe different views of the database.
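The schema/instance distinction can be observed directly in SQLite: the table definition stays fixed while the stored rows change. This is a minimal sketch; the account table and its values are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")

def schema():
    # Column names from the table definition: the logical structure.
    return [row[1] for row in conn.execute("PRAGMA table_info(account)")]

def instance():
    # The rows stored right now: the content at this point in time.
    return conn.execute("SELECT * FROM account ORDER BY account_number").fetchall()

schema_before, instance_before = schema(), instance()
conn.execute("INSERT INTO account VALUES ('A-101', 500)")   # the database changes...
schema_after, instance_after = schema(), instance()         # ...but only the instance differs
```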
Data Independence
• Definition: Data independence is the ability to modify the schema at one level without having to change the schema at the next higher level, so that application programs need not be rewritten when a schema changes.
• Logical Data Independence: changes to the logical schema (e.g., adding a new attribute) should not require changes to the view schemas or to the application programs that use them.
• Physical Data Independence: changes to the physical schema (e.g., a new file organization or index) should not require changes to the logical schema.
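Physical data independence in miniature, sketched with SQLite: adding an index is a physical-level change, yet the logical schema, the query text and its result stay the same; only the DBMS's internal evaluation plan changes. Table and index names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)",
                 [("A-101", "Downtown", 500), ("A-102", "Perryridge", 400),
                  ("A-201", "Perryridge", 900)])

query = "SELECT account_number FROM account WHERE branch_name = ? ORDER BY account_number"
before = conn.execute(query, ("Perryridge",)).fetchall()

# Physical-level change: add an index. No table definition or query is edited.
conn.execute("CREATE INDEX idx_branch ON account(branch_name)")
after = conn.execute(query, ("Perryridge",)).fetchall()

# The last column of each EXPLAIN QUERY PLAN row describes one step of the plan.
plan = " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("Perryridge",)))
```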
Data Models
• A collection of tools for describing
– Data, Data relationships, Data semantics and Data constraints
• Provides a way to describe the design of a database at the physical,
logical & view level.
• Data Models can be classified as:
– Entity-Relationship data model (collection of entities : E-R Diagram)
– Relational data model (collection of tables : widely used)
– Object-based data models (object-oriented and object-relational)
– Semistructured data model (XML)
– Other older models: Network model and Hierarchical model
Database Languages
• A database system provides a Data Definition Language (DDL) to
specify the database schema and Data Manipulation Language
(DML) to express database queries and updates.
• In practice, DDL and DML are not two separate languages; they simply form parts of a single database language, such as the widely used SQL language.
Data Definition Language
• Specification notation for defining the database schema
Example: create table account ( account_number char(10),
branch_name char(10), balance integer);
• The DDL compiler generates a set of table templates stored in a data dictionary
• The data dictionary contains metadata (i.e., data about data):
– Database schema
– Data storage and definition language
• Specifies the storage structure and access methods used
– Integrity constraints
• Domain constraints (data types, etc.)
• Referential integrity (e.g., branch_name must correspond to a valid branch in the branch table)
• Assertions (conditions the database must always satisfy)
– Authorization
• Read, insert, update, delete
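A sketch of integrity constraints declared once in the DDL, using SQLite: the account table from the example above gets a NOT NULL constraint and a CHECK condition (the balance > 25 rule mentioned earlier), and the DBMS rejects a row that violates it. The exact constraint choices are illustrative assumptions; SQLite has no separate CREATE ASSERTION, so a table-level CHECK stands in for one here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Constraints are part of the schema, so the DBMS (not each application) enforces them.
conn.execute("""
    CREATE TABLE account (
        account_number CHAR(10) PRIMARY KEY,
        branch_name    CHAR(10) NOT NULL,
        balance        INTEGER  CHECK (balance > 25)
    )
""")
conn.execute("INSERT INTO account VALUES ('A-101', 'Downtown', 500)")

try:
    conn.execute("INSERT INTO account VALUES ('A-102', 'Downtown', 10)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True   # the CHECK constraint blocked the row

count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
```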
Data Definition Language (Cont.)
• Example DDL commands:
– CREATE – to create objects in the database.
– ALTER – to alter the structure of the database.
– DROP – to delete objects from the database.
– TRUNCATE – to remove all records from a table, including all space allocated for the records.
– RENAME – to rename an object.
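The DDL commands above, sketched in SQLite. Dialects differ: SQLite spells RENAME as `ALTER TABLE ... RENAME TO` and has no TRUNCATE (an unqualified `DELETE FROM t` is its closest equivalent), so those two are adapted. The branch table is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def tables():
    # sqlite_master is SQLite's catalog of schema objects.
    return sorted(r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"))

conn.execute("CREATE TABLE branch (branch_name TEXT PRIMARY KEY)")   # CREATE
conn.execute("ALTER TABLE branch ADD COLUMN branch_city TEXT")       # ALTER
conn.execute("ALTER TABLE branch RENAME TO bank_branch")             # RENAME (SQLite syntax)
cols = [r[1] for r in conn.execute("PRAGMA table_info(bank_branch)")]
after_create = tables()
conn.execute("DROP TABLE bank_branch")                               # DROP
after_drop = tables()
```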
Data Manipulation Language
• Language for accessing and manipulating the data organized by the
appropriate data model
– The DML is also known as a query language
• Two classes of languages
– Procedural – user specifies what data is required and how to get those data
– Declarative (nonprocedural) – user specifies what data is required without specifying how to get those data
• SQL is the most widely used query language
• Example DML commands:
– SELECT – retrieves data from a database.
– INSERT – inserts data into a table.
– UPDATE – updates existing data within a table.
– DELETE – deletes records from a table (all records if no WHERE clause is given); the space allocated for the records remains.
– LOCK TABLE – controls concurrency.
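A sketch of the main DML commands in SQLite, followed by the same request written declaratively (one SELECT) and procedurally (an explicit loop over every row). The account rows are hypothetical. Both approaches return the same answer; the difference is who decides how to find the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER)")

# INSERT: add rows.
conn.executemany("INSERT INTO account VALUES (?, ?, ?)",
                 [("A-101", "Downtown", 500), ("A-102", "Perryridge", 400),
                  ("A-215", "Mianus", 700)])
# UPDATE: change existing rows.
conn.execute("UPDATE account SET balance = balance + 100 WHERE branch_name = 'Perryridge'")
# DELETE: remove only the rows matching the WHERE clause.
conn.execute("DELETE FROM account WHERE account_number = 'A-215'")

# Declarative: state what is wanted; the DBMS decides how to find it.
declarative = conn.execute(
    "SELECT account_number FROM account WHERE balance >= 500 ORDER BY account_number"
).fetchall()

# Procedural: spell out how, by scanning and filtering every row ourselves.
procedural = []
for number, _branch, balance in conn.execute("SELECT * FROM account"):
    if balance >= 500:
        procedural.append((number,))
procedural.sort()
```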
Definition
• Data Dictionary: It is considered to be a special type of table, which can be accessed and updated only by the database system itself (not by a regular user). It is also called the data directory. A database system consults the data dictionary before reading or modifying actual data.
• Metadata: A data dictionary contains metadata, i.e., data about data.
E.g., the schema of a table
Q) Define (a) Data Dictionary (b) Metadata (4M)
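In SQLite the data dictionary is the `sqlite_master` table. This sketch reads a table's stored definition (metadata, not its rows) and shows that ordinary DML cannot modify the catalog by default. The account table is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number CHAR(10), branch_name CHAR(10), balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A-101', 'Downtown', 500)")

# Metadata: the stored definition of the table, not the data inside it.
name, sql = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'").fetchone()

# The catalog is maintained by the DBMS itself; a plain DELETE on it is refused.
try:
    conn.execute("DELETE FROM sqlite_master")
    blocked = False
except sqlite3.Error:
    blocked = True
```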
Database Users and Administrators
• People who work with database can be categorized as database users
or database administrators.
• Database users and user interfaces: Database users fall into several categories, according to the way they expect to interact with the system:
– Application Programmers (write their own application programs, e.g., with RAD tools)
– Sophisticated Users (interact without writing programs, using a query language)
– Specialized Users (write specialized applications, e.g., CAD tools, expert systems)
– Naive Users (invoke previously written application programs, e.g., through forms)
• Database Administrator (DBA): A person who has central control of both the data and the programs that process those data. Functions of the DBA are:
– Schema definition
– Storage structure and access method definition
– Schema and physical organization modification
– Granting of authorization for data access
– Routine maintenance
Transaction Management
• A transaction is a collection of operations that performs a single logical function in a database application. Each transaction is a unit of both atomicity and consistency.
• The transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.
• Failure recovery detects system failures and restores the database to the state that existed prior to the occurrence of the failure.
• Concurrency-control manager controls the interaction among the
concurrent transactions, to ensure the consistency of the database.
Transaction Management
• Transactions should satisfy the ACID properties:
– Example: Fund transfer, in which one account (say A) is debited and another account (say B) is credited.
• Atomicity: In the example, it is essential that either both the debit and the credit occur, or neither occurs (i.e., the transfer must happen in its entirety or not at all). This all-or-none requirement is called atomicity.
• Consistency: In the example, it is essential that execution of the fund transfer preserves the consistency of the database (i.e., the value of the sum A + B must be preserved). This correctness requirement is called consistency.
• Isolation: In the example, if several transactions execute concurrently, their interleaved execution must not leave the database in an inconsistent state. For instance, if a second concurrent transaction reads A and B at an intermediate point (after the debit but before the credit) and computes A + B, it will see an inconsistent value. The requirement that concurrent transactions not interfere with each other in this way is called isolation.
• Durability: In example, after successful execution, the new values of A & B
must persist, despite system failure. This persistent requirement is called
durability.
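A sketch of the fund-transfer example as a single SQLite transaction. To make the all-or-none behaviour visible, the credit to B is applied first; if the debit from A then violates a CHECK (balance >= 0) constraint, ROLLBACK undoes the credit as well, so A + B is preserved. The account names, amounts and the non-negative-balance rule are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # we issue BEGIN/COMMIT ourselves
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 2000)])

def transfer(src, dst, amount):
    """Credit dst and debit src as one all-or-nothing transaction."""
    try:
        conn.execute("BEGIN")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # atomicity: the credit already applied is undone too
        return False

def total():
    return conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]

first = transfer("A", "B", 50)      # succeeds
second = transfer("A", "B", 5000)   # debit would make A negative, so the whole transfer is rolled back
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

Consistency shows up as the unchanged total of 3000 after both calls.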
Database System Structure
• A database system is partitioned into modules that deal with each of the
responsibilities of overall system.
• The functional components can be broadly divided into two parts:
– Storage Manager
– Query Processor
• Storage manager is a program module that provides the interface between
the low-level data stored in the database and the application programs and
queries submitted to the system.
• The storage manager is responsible for the following tasks:
– Interaction with the file manager
– Efficient storing, retrieving and updating of data
Database System Structure (Cont.)
• Storage manager components include:
– Authorization and integrity manager, which tests for the satisfaction of
integrity constraints and checks the authority of users to access data.
– Transaction Manager, which ensures that database remains in a consistent state
despite system failures and concurrent transaction executions proceed without
conflicting.
– File Manager, which manages the allocation of space on disk storage and the
data structures used to represent information stored on disk.
– Buffer Manager, which is responsible for fetching data from disk storage into
main memory and deciding what data to cache in main memory.
• The storage manager implements several data structures:
– Data files, which store the database itself
– Data Dictionary, which stores metadata
– Indices, which provide fast access to data items that hold particular values
Database System Structure (Cont.)
• Query Processor is important because it helps the database system simplify
and facilitate access to data.
• Query processor components include:
– DDL Interpreter, which interprets DDL statements and records the definitions in the data dictionary.
– DML Compiler, which translates DML statements in a query language into an evaluation plan consisting of low-level instructions that the query evaluation engine understands.
– Query Optimizer, which picks the lowest cost evaluation plan from among the
alternative evaluation plans that all give the same result.
– Query Evaluation Engine, which executes low level instructions generated by
DML compiler.
Overall System Structure (Cont.)
Database Application Architecture
• The architecture is influenced by the computer system on which database
system runs.
• DBS can be centralized or client-server, where one server machine executes
work on behalf of multiple client machines.
• Most users today are not present at the site of database system, but connect
to it through network.
• We can distinguish between client machines, on which remote database users work, and server machines, on which the database system runs.
• Database applications are partitioned into two or three parts as shown in
figure:
Database Application Architecture
[Figure: Old and Modern database application architectures]
Database Application Architecture (Cont.)
• Two-Tier Architecture
– The application is partitioned into a component that resides at the client machine, which invokes database system functionality at the server machine through query language statements.
– Application program interface standards like ODBC and JDBC are used for
interaction between client & server.
• Three-Tier Architecture
– The client machine acts as front end and does not contain direct database calls.
Instead, the client end communicates with an application server through a form
interface. The server in turn communicates with database system to access data.
– Three-tier applications are more appropriate for large applications and for applications that run on the World Wide Web.
Database Application Architecture (Cont.)
• Two-Tier Architecture (Advantages)
– Understanding and maintenance are easier.
• Two-Tier Architecture (Disadvantages)
– Performance degrades as the number of users grows.
• Three-Tier Architecture (Advantages)
– Easy to modify one tier without affecting the other tiers.
– Fast communication.
– Better performance with large numbers of users.
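A minimal sketch of the three-tier split within one Python process: the client-tier function fills in a form and calls the application server, and only the application-server function issues SQL. In a real deployment the tiers would run on different machines and talk over a network (e.g., HTTP); the function names here are hypothetical.

```python
import sqlite3

# Data tier: the database system (an in-memory SQLite database here).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")
db.execute("INSERT INTO account VALUES ('A-101', 500)")

# Application-server tier: business logic; the only code that talks to the database.
def get_balance(account_number: str) -> dict:
    """Hypothetical application-server operation used by the front end."""
    row = db.execute("SELECT balance FROM account WHERE account_number = ?",
                     (account_number,)).fetchone()
    if row is None:
        return {"error": "no such account"}
    return {"account_number": account_number, "balance": row[0]}

# Client tier: a form-style front end with no direct database calls.
def client_form_submit(form: dict) -> str:
    reply = get_balance(form["account_number"])
    return reply.get("error") or f"Balance: {reply['balance']}"
```

In the two-tier version, by contrast, the client itself would hold the SQL text and open the connection through an API such as ODBC or JDBC.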
E-R Model
Contents
• Basic Concepts
• Design Constraints
• Keys
• Design Issues
• E-R Diagram
• Weak Entity Sets
• Extended E-R Features
• Design of an E-R Database Schema
• Reduction of an E-R Schema to Tables
Basic Concepts
• A database can be modeled as:
– a collection of entities,
– relationships among entities.
• An entity is an object that exists and is distinguishable from other objects.
– Example: specific person, company, event, plant
• Entities have attributes
– Example: people have names and addresses
• An entity set is a set of entities of the same type that share the same
properties.
– Example: set of all persons, companies, trees, holidays
Entity Sets (Customer and Loan)
[Figure: customer entity set (attributes ID, Name, Street, City) and loan entity set (attributes Loan no., Amt.)]
Attributes
• An entity is represented by a set of attributes, that is, descriptive properties possessed by all members of an entity set.
Example:
customer = (customer-id, customer-name,
customer-street, customer-city)
loan = (loan-number, amount)
• Domain – the set of permitted values for each attribute
• Attribute types:
– Simple and composite attributes
– Single-valued and multivalued attributes
• E.g., single-valued attribute: loan_no; multivalued attribute: phone-numbers
– Derived attributes
• Can be computed from other attributes
• E.g. age, given date of birth
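One way to picture the attribute types in code, as an analogy rather than a DBMS feature: a composite attribute becomes a nested record, a multivalued attribute a list, and a derived attribute a method computed on demand instead of being stored. The class and field names are assumptions.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Name:                       # composite attribute: made of component attributes
    first: str
    last: str

@dataclass
class Customer:
    customer_id: str              # simple, single-valued
    name: Name                    # composite
    date_of_birth: date           # stored attribute used to derive age
    phone_numbers: list[str] = field(default_factory=list)  # multivalued

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        had_birthday = (today.month, today.day) >= (self.date_of_birth.month,
                                                    self.date_of_birth.day)
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)

c = Customer("C-1", Name("Ana", "Hayes"), date(2000, 6, 15), ["555-0101", "555-0102"])
```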
Composite Attributes
Single-valued, Multi-valued and Derived Attributes
E-R Diagram with Composite, Multivalued and Derived Attributes
Relationship Sets
• A relationship is an association among several entities
– Example: Hayes (customer entity) is related to A-102 (account entity) through the depositor relationship.
• A relationship set is a set of relationships of the same type. It is a mathematical relation among $n \geq 2$ entities, each taken from entity sets:
$\{(e_1, e_2, \ldots, e_n) \mid e_1 \in E_1, e_2 \in E_2, \ldots, e_n \in E_n\}$
where $(e_1, e_2, \ldots, e_n)$ is a relationship.
– Example:
$(\text{Hayes}, \text{A-102}) \in \textit{depositor}$
Relationship Sets with Attributes
Relationship Sets
• An attribute can also be a property of a relationship set.
• For instance, the depositor relationship set between entity sets customer
and account may have the attribute access-date.
Design Constraints
• The mapping cardinalities and participation constraints are two of
the most important types of constraints.
Q) Explain all design constraints while designing enterprise. (6M)
Q) Explain with diagrams all the mapping cardinalities. (8M)
1. Mapping Cardinalities
• Express the number of entities to which another entity can be associated via
a relationship set.
• Most useful in describing binary relationship sets.
• For a binary relationship set the mapping cardinality must be one of the
following types:
– One to one
– One to many
– Many to one
– Many to many
• Cardinality Constraints: We express cardinality constraints by drawing either a directed line (→), signifying “one,” or an undirected line (-), signifying “many,” between the relationship set and the entity set.
Mapping Cardinalities (Cont.)
One to one
One to many
Note: Some elements in A and B may not be mapped to any elements in the other set
Mapping Cardinalities (Cont.)
Many to one
Many to many
Note: Some elements in A and B may not be mapped to any elements in the other set
One-To-One Relationship
• A customer is associated with at most one loan via the relationship borrower
• A loan is associated with at most one customer via borrower
One-To-Many Relationship
• In a one-to-many relationship, a loan is associated with at most one customer via borrower, and a customer is associated with several (including 0) loans via borrower
Many-To-One Relationship
• In a many-to-one relationship, a loan is associated with several (including 0) customers via borrower, and a customer is associated with at most one loan via borrower
Many-To-Many Relationship
• A customer is associated with several (possibly 0) loans via borrower
• A loan is associated with several (possibly 0) customers via borrower
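A sketch that reads off the mapping cardinality that a given set of borrower pairs satisfies, following the definitions above (customer on the left, loan on the right). It inspects one instance only: a cardinality constraint in the design must hold for every instance, so code like this can check an instance against a constraint but cannot prove the constraint itself.

```python
def mapping_cardinality(pairs):
    """Classify a binary relationship instance given as (a, b) pairs."""
    b_per_a, a_per_b = {}, {}
    for a, b in pairs:
        b_per_a.setdefault(a, set()).add(b)
        a_per_b.setdefault(b, set()).add(a)
    a_side_one = all(len(s) <= 1 for s in a_per_b.values())  # each b has at most one a
    b_side_one = all(len(s) <= 1 for s in b_per_a.values())  # each a has at most one b
    left = "one" if a_side_one else "many"
    right = "one" if b_side_one else "many"
    return f"{left}-to-{right}"

# (customer, loan) pairs of the borrower relationship set (hypothetical values)
one_to_many = {("Hayes", "L-15"), ("Hayes", "L-16"), ("Jones", "L-17")}
print(mapping_cardinality(one_to_many))  # one-to-many
```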
2. Participation Constraints
• Total participation (indicated by double line): every entity in the entity set
participates in at least one relationship in the relationship set
– E.g. participation of loan in borrower is total
• every loan must have a customer associated with it via borrower
• Partial participation: some entities may not participate in any relationship
in the relationship set
– E.g. participation of customer in borrower is partial
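Participation can be checked the same way on an instance: participation is total when every entity of the set appears in at least one relationship. The helper name and sample values are hypothetical.

```python
def is_total(entity_set, relationship_set, position):
    """True if every entity appears at `position` in some relationship."""
    participating = {rel[position] for rel in relationship_set}
    return entity_set <= participating

customers = {"Hayes", "Jones", "Smith"}
loans = {"L-15", "L-16"}
borrower = {("Hayes", "L-15"), ("Jones", "L-16")}   # (customer, loan) pairs

loan_total = is_total(loans, borrower, 1)          # every loan has a customer
customer_total = is_total(customers, borrower, 0)  # Smith has no loan, so partial
```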
Keys
• A key allows us to identify a set of attributes that suffice to distinguish entities from each other.
• A key also helps uniquely identify relationships and thus distinguish
relationships from each other.
• Entity Sets : A key is a property of the entity set rather than of the
individual entities.
• For entity set we have three types of keys:
– Super Key
– Primary Key
– Candidate Key
Keys (Cont.)
• A super key is a set of one or more attributes whose values uniquely
determine each entity in an entity set .
– E.g.: {cust_id}, {cust_name, cust_id}
• Candidate keys are those super keys for which no proper subset is a super key. They are the minimal super keys.
– E.g., {cust_id} and {cust_name, cust_street} are candidate keys of customer
– E.g., account-number is a candidate key of account
• A primary key is a candidate key, chosen by the database designer, whose attributes are never or very rarely changed. Although several candidate keys may exist, one of them is selected to be the primary key.
– E.g., Election_Id
Example
• Relation: Book (Book_Id, Book_Name, Author)
• Super Key: (Book_Id)
(Book_Id, Book_Name)
(Book_Id, Book_Name, Author)
(Book_Id, Author)
(Book_Name, Author)
• Candidate Key: (Book_Id)
(Book_Name, Author)
• Primary Key: From above candidate keys any one can be the primary key
and the other one will be known as alternate key.
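A sketch that reproduces the Book example by brute force: it tests every attribute subset for uniqueness on a hypothetical sample instance, then keeps the minimal ones as candidate keys. Keys are really a property of the schema (they must hold for all legal instances); a sample instance can only rule a set out, so the rows here were chosen to agree with the key lists above.

```python
from itertools import combinations

ATTRS = ("Book_Id", "Book_Name", "Author")

# Hypothetical sample instance of Book (Book_Id, Book_Name, Author)
books = [
    {"Book_Id": 1, "Book_Name": "DBMS", "Author": "Korth"},
    {"Book_Id": 2, "Book_Name": "DBMS", "Author": "Navathe"},
    {"Book_Id": 3, "Book_Name": "OS", "Author": "Korth"},
]

def is_super_key(attrs, rows):
    """No two rows agree on all the given attributes."""
    projected = [tuple(r[a] for a in attrs) for r in rows]
    return len(set(projected)) == len(projected)

super_keys = [set(c) for n in range(1, len(ATTRS) + 1)
              for c in combinations(ATTRS, n) if is_super_key(c, books)]

# Candidate keys: super keys with no proper subset that is also a super key.
candidate_keys = [k for k in super_keys if not any(other < k for other in super_keys)]
```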
Keys (Cont.)
• A Foreign Key is an attribute (or set of attributes) in one table that refers to the primary key of another table.
– If branch_nm acts as the primary key of the branch table and the same branch_nm resides in the account table as an attribute, then branch_nm in account is a foreign key referencing branch.
• A Surrogate Key is a unique, system supplied identifier used as a primary
key of a relation. The values of surrogate key have no meaning to the users
and are usually hidden on forms and reports.
– The DBS will not allow the value of a surrogate key to be changed.
Foreign Key
• A relation schema may have an attribute that corresponds to the primary key of another relation. Such an attribute is called a foreign key.
– E.g. customer_name and account_number attributes of depositor are foreign
keys to customer and account respectively.
– Only values occurring in the primary key attribute of the referenced relation
may occur in the foreign key attribute of the referencing relation.
• Schema Diagram
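Referential integrity enforced by the DBMS, sketched with SQLite using the branch/account example: an account may only name a branch that exists. SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`; the branch city value is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE branch (branch_name TEXT PRIMARY KEY, branch_city TEXT)")
conn.execute("""
    CREATE TABLE account (
        account_number TEXT PRIMARY KEY,
        branch_name    TEXT REFERENCES branch(branch_name),
        balance        INTEGER
    )
""")
conn.execute("INSERT INTO branch VALUES ('Perryridge', 'Horseneck')")
conn.execute("INSERT INTO account VALUES ('A-102', 'Perryridge', 400)")  # valid reference

try:
    conn.execute("INSERT INTO account VALUES ('A-999', 'Nowhere', 100)")  # no such branch
    dangling_allowed = True
except sqlite3.IntegrityError:
    dangling_allowed = False
```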
Entity-Relationship Diagram
• Rectangles represent entity sets.
• Diamonds represent relationship sets.
• Lines link attributes to entity sets and entity sets to relationship sets.
• Ellipses represent attributes
– Double ellipses represent multivalued attributes.
– Dashed ellipses denote derived attributes.
• Underline indicates primary key attributes
Summary of Symbols used in E-R Notation
Summary of Symbols (Cont.)
Roles
• Roles are indicated in E-R diagrams by labeling the lines that connect
diamonds to rectangles.
• The labels “manager” and “worker” are called roles; they specify how
employee entities interact via the works-for relationship set.
• Role labels are optional, and are used to clarify semantics of the
relationship
Ternary Relationship
• Non-binary (ternary) relationship sets can also be specified in an E-R diagram.
• The following figure shows three entity sets, employee, job and branch, related through the relationship set works-on.
Weak Entity Sets
• An entity set that does not have a primary key is referred to as a weak
entity set.
• The relationship associating the weak entity set with the identifying entity
is called identifying relationship.
• The existence of a weak entity set depends on the existence of an identifying (or owner) entity set
– it must relate to the identifying entity set via a total, one-to-many
relationship set from the identifying to the weak entity set
– Identifying relationship depicted using a double diamond
• The discriminator (or partial key) of a weak entity set is the set of
attributes that distinguishes among all the entities of a weak entity set.
Weak Entity Sets (Cont.)
• Double outlined box – weak entity set.
• Double outlined diamond – identifying relationship.
• Double lines – total participation; here, every payment must be related to a loan via loan-payment.
• Dashed underline – discriminator (partial key).
• Weak entity set payment depends on strong entity set loan via the
relationship set loan-payment as shown in figure:
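A sketch of the payment weak entity set as SQLite tables: payment_number (the discriminator) repeats across loans, so the primary key combines it with loan_number from the owner entity set. Using `ON DELETE CASCADE` to model existence dependence is one reasonable design choice assumed here, not something the notes prescribe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE loan (loan_number TEXT PRIMARY KEY, amount INTEGER)")
# Weak entity set: key = owner's primary key + discriminator.
conn.execute("""
    CREATE TABLE payment (
        loan_number    TEXT NOT NULL REFERENCES loan(loan_number) ON DELETE CASCADE,
        payment_number INTEGER,
        payment_amount INTEGER,
        PRIMARY KEY (loan_number, payment_number)
    )
""")
conn.executemany("INSERT INTO loan VALUES (?, ?)", [("L-15", 1500), ("L-16", 1300)])
conn.executemany("INSERT INTO payment VALUES (?, ?, ?)",
                 [("L-15", 1, 300), ("L-15", 2, 300), ("L-16", 1, 500)])  # payment 1 repeats

conn.execute("DELETE FROM loan WHERE loan_number = 'L-15'")   # the owner disappears...
remaining = conn.execute(
    "SELECT loan_number, payment_number FROM payment").fetchall()  # ...and so do its payments
```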
Extended E-R Features
• Extended E-R features are Specialization, Generalization, Aggregation, Higher- and Lower-Level Entity Sets, and Attribute Inheritance.
• Specialization: The process of designating subgroupings within an entity set is called specialization. It proceeds in a top-down manner.
Extended E-R Features (Cont.)
• Generalization: It is the result of taking the union of two or more disjoint (lower-level) entity sets to produce a higher-level entity set. It proceeds in a bottom-up manner.
Extended E-R Features (Cont.)
• Aggregation: The process through which one can treat relationships as higher-level entities is known as aggregation.
• It is a feature of the E-R model that allows a relationship set to participate
in another relationship set.
Aggregation Need
• One limitation of the basic E-R model is that it cannot express relationships among relationships.
• To illustrate the need for such a construct, consider the ternary relationship works-on between employee, branch and job, and suppose we also want to record a manager for (employee, branch, job) combinations.
• We cannot simply combine manager and works-on into a single relationship, since some employee, branch, job combinations may not have a manager.
• The best way to model such a situation is to use aggregation.
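A sketch of aggregation after reduction to tables, in SQLite: works_on holds the ternary relationship, and manages refers to a whole works_on relationship through its composite key, so combinations without a manager simply have no manages row. Column names are hypothetical, the employee, branch and job entity tables are omitted for brevity, and manager is kept as a plain column rather than a full entity set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# Ternary relationship set works-on.
conn.execute("""
    CREATE TABLE works_on (
        employee_id TEXT, branch_name TEXT, job_title TEXT,
        PRIMARY KEY (employee_id, branch_name, job_title)
    )
""")
# Aggregation: manages relates a manager to a works_on relationship as a whole.
conn.execute("""
    CREATE TABLE manages (
        employee_id TEXT, branch_name TEXT, job_title TEXT, manager_id TEXT,
        PRIMARY KEY (employee_id, branch_name, job_title),
        FOREIGN KEY (employee_id, branch_name, job_title)
            REFERENCES works_on (employee_id, branch_name, job_title)
    )
""")
conn.executemany("INSERT INTO works_on VALUES (?, ?, ?)",
                 [("E1", "Downtown", "teller"), ("E2", "Downtown", "auditor")])
conn.execute("INSERT INTO manages VALUES ('E1', 'Downtown', 'teller', 'M7')")

# Combinations with no manager need no placeholder values.
unmanaged = conn.execute("""
    SELECT w.employee_id FROM works_on w
    LEFT JOIN manages m USING (employee_id, branch_name, job_title)
    WHERE m.manager_id IS NULL
""").fetchall()

# A manager can only be attached to an existing works_on relationship.
try:
    conn.execute("INSERT INTO manages VALUES ('E9', 'Uptown', 'clerk', 'M1')")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False
```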
Extended E-R Features (Cont.)
• Attribute Inheritance: The attributes of higher-level entity sets are said to be inherited by lower-level entity sets. This property is known as attribute inheritance.
• In a hierarchy, if an entity set is involved as a lower-level entity set in only one ISA relationship, then the entity set has single inheritance.
• In a hierarchy, if an entity set is involved as a lower-level entity set in more than one ISA relationship, then the entity set has multiple inheritance and the resulting structure is said to be a lattice.
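Class inheritance in Python is a convenient analogy for ISA hierarchies (an analogy only; the E-R model is not object-oriented code). Customer and Employee each inherit the attributes of Person (single inheritance); a hypothetical EmployeeCustomer that is both forms a lattice and still inherits Person's attributes only once.

```python
from dataclasses import dataclass

@dataclass
class Person:                  # higher-level entity set
    name: str
    street: str
    city: str

@dataclass
class Customer(Person):        # lower-level entity set: inherits name, street, city
    credit_rating: str

@dataclass
class Employee(Person):        # another specialization of Person
    salary: int

@dataclass
class EmployeeCustomer(Employee, Customer):   # multiple inheritance (lattice)
    pass

c = Customer("Hayes", "Main", "Harrison", "A")
ec = EmployeeCustomer("Jones", "Elm", "Rye", "B", 50000)
```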
Specialization, Generalization, Attribute Inheritance
Reduction of an E-R Schema to Tables
• Primary keys allow entity sets and relationship sets to be expressed
uniformly as tables which represent the contents of the database.
• A database which conforms to an E-R diagram can be represented by
a collection of tables.
• For each entity set and relationship set there is a unique table, which is assigned the name of the corresponding entity set or relationship set.
• Each table has a number of columns (generally corresponding to
attributes), which have unique names.
• Converting an E-R diagram to a table format is the basis for deriving
a relational database design from an E-R diagram.
Representing Entity Sets as Tables
• A strong entity set reduces to a table.
University Questions on E-R Diagram
• Construct an entity relationship diagram of banking environment
where the customer borrows a loan from the bank and an account in
specific branch in a Bank. Reduce the E-R schema to a table.
• Bank Tables:
– customer (customer_id, customer_name, customer_street, customer_city)
– Loan (loan_number, amount)
– Borrower (customer_id, loan_number)
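The three bank tables written as SQLite DDL. Each strong entity set becomes a table with one column per attribute, and borrower becomes a table of the participating primary keys. Making (customer_id, loan_number) together the primary key assumes borrower is many-to-many; column types and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Strong entity set customer: one table, one column per attribute
    CREATE TABLE customer (
        customer_id     TEXT PRIMARY KEY,
        customer_name   TEXT,
        customer_street TEXT,
        customer_city   TEXT
    );
    -- Strong entity set loan
    CREATE TABLE loan (
        loan_number TEXT PRIMARY KEY,
        amount      INTEGER
    );
    -- Relationship set borrower: primary keys of the participating entity sets
    CREATE TABLE borrower (
        customer_id TEXT REFERENCES customer(customer_id),
        loan_number TEXT REFERENCES loan(loan_number),
        PRIMARY KEY (customer_id, loan_number)
    );
""")
conn.execute("INSERT INTO customer VALUES ('C-1', 'Hayes', 'Main', 'Harrison')")
conn.execute("INSERT INTO loan VALUES ('L-15', 1500)")
conn.execute("INSERT INTO borrower VALUES ('C-1', 'L-15')")

rows = conn.execute("""
    SELECT c.customer_name, l.amount
    FROM borrower b JOIN customer c USING (customer_id) JOIN loan l USING (loan_number)
""").fetchall()
```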
University Questions on E-R Diagram
• Construct an E-R diagram for keeping track of exploits of your
favourite sports team. You should store the matches played, the scores
in each match, the players in each match and individual player
statistics for each match. Summary should be modeled as derived
attribute.
University Questions on E-R Diagram
• Consider a database used to record the marks that students get in
different exams of different course-offerings.
(a) Construct an E-R diagram that models exams as entities, and uses
a ternary relationship for above database.
(b) Construct an alternative E-R diagram that uses only a binary
relationship between students and course-offerings. Make sure that
only one relationship exists between a particular student and course-
offering pair, yet you can represent the marks that a student gets in
different exams of a course-offering.
E-R diagram for Ternary Relationship (a)
E-R diagram for Binary Relationship (b)