Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Open Database Connectivity wikipedia , lookup
Oracle Database wikipedia , lookup
Concurrency control wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Relational algebra wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Versant Object Database wikipedia , lookup
Clusterpoint wikipedia , lookup
RELATIONAL DATABASE MANAGEMENT SYSTEM - I Subject code : BCA-12 Internal Assessment : 40 External Assessment : 60 1 Overview of DBMS Section A 2 Database • Related information when placed in an organized form makes a database. • Examples – Dictionary – Telephone Directory – Student record register – Address book • Two approaches for storing data in computers– File based approach – Database approach 3 Why do we need a database? • Keep records of our: – Clients – Staff – Volunteers • To keep a record of activities and interventions; • Keep sales records; • Develop reports; • Perform research File based Approach • A group of files storing data of an organization. • Each file is independent from one another. • Each file contains and processes information for one specific function, such as accounting or inventory. • Files are designed using programs written in programming languages such as COBOl,C,C++. • File systems may use a storage device such as hard disk or CD-ROM etc and involve maintaining physical location of the files. 5 Database Approach 6 • A shared collection of logically related data, designed to meet the information needs of an organization. • Database is a single, large repository of data, which can be used simultaneously by many departments and users. Characteristics of Database • Shared • Persistence • Validity/Integrity/Correctness • Security • Consistency • Non-redundancy • Data Independence Database Management System (DBMS) A software system that • Allows users to define, create and maintain a database • Provides controlled access to database • DBMS provides an environment that is both convenient and efficient to use. Examples – Computerized Library systems – Automated teller machines – Flight reservations systems 7 • Database Applications: – Banking: all transactions – Airlines: reservations, schedules – Universities: registration, grades – Sales: customers, products, purchases – Manufacturing: production, inventory, orders, supply chain – Human resources: employee records, salaries, tax deductions • Commercially available DBMS are 8 – – – – Dbase FoxPro IMS Oracle Purpose of Database System • In the early days, database applications were built on top of file systems • Drawbacks of using file systems to store data: – Data redundancy and inconsistency • Multiple file formats, duplication of information in different files – Difficulty in accessing data • Need to write a new program to carry out each new task – Data isolation — multiple files and formats – Integrity problems • Integrity constraints (e.g. account balance > 0) become part of program code • Hard to add new constraints or change existing ones 9 Purpose of Database Systems (Cont.) 10 • Drawbacks of using file systems (cont.) – Atomicity of updates • Failures may leave database in an inconsistent state with partial updates carried out • E.g. transfer of funds from one account to another should either complete or not happen at all – Concurrent access by multiple users • Concurrent accessed needed for performance • Uncontrolled concurrent accesses can lead to inconsistencies.E.g. two people reading a balance and updating it at the same time – Security problems • Database systems offer solutions to all the above problems Major Components of Database System • • • • Data Hardware (Computer System) Software (DBMS) Users (Naïve ,Online, Sophisticated users, Application Programmers, DBA) • Procedures (Instructions) DBMS DATABASE DATA AND PROGRAMS APPLICATION PROGRAMMERS 11 END USERS Advantages of DBMS • • • • • • • 12 Controlling Redundancy Integrity can be enforced Inconsistency can be avoided Data can be shared Standards can be enforced Restricting unauthorized access Providing Backup and recovery Disadvantages of DBMS • • • • • • 13 Complexity Size Performance Higher impact of failure Cost of DBMS Additional Hardware costs Architecture of DBMS • The American National Standards Institute (ANSI) Standards Planning and Requirements Committee (SPARC) proposed a three layer model for DBMS. • 3 Levels of DBMS are:– External Level – Conceptual Level – Internal Level • This model was later known as ANSI SPARC architecture of DBMS. 14 Levels of DBMS • External level or User View – Highest level of DBMS – Closest to the user – Concerned with the way data is viewed by individual users – Only those portions of the database that are of concern to the user or application program are included. – Same database can have different views for different users. – Each External view is described by means of a scheme called an external Schema. – The external schema consists of the definition of logical records and the relationships in external view. • 15 Note:-A schema/scheme is an outline or plan that describes the records and relationships existing in the view. Conceptual level or Logical View • Level of indirection between external and internal level. • Describes what data are actually stored in the database and the relationship that exist among the data. • Defined by Conceptual schema which describes all records and relationships included in the database. • There is only one Conceptual Schema per database. • Defined by DBA. 16 Internal level or Physical View • Lowest level of Architecture • Closest to physical Storage i.e. how data is stored in the database. • Describes the data structures and Access methods to be used by the database. • Expressed by Internal Schema which contains the definition of the stored record, the method of representing the data fields and the access aids used. • Defined by DBA. 17 View of Data An architecture for a database system 18 3 Level ANSI Sparc Architecture of DBMS END USER EXTERNAL LEVEL END USER EXTERNAL VIEW A EXTERNAL VIEW B EXTERNAL/CONCEPTUAL MAPPING A EXTERNAL/CONCEPTUAL MAPPING B DBMS CONCEPTUAL LEVEL CONCEPTUAL VIEW CONCEPTUAL/INTERNAL MAPPING INTERNAL LEVEL 19 STORED DATABASE (INTERNAL VIEW) Mapping b/w Views • External/Conceptual Mapping – Each external schema is related to conceptual schema by external /conceptual mapping – It gives the correspondence among records and relationships of external/conceptual views. • Conceptual/Internal Mapping – Conceptual schema is related to Internal Schema by 20 Conceptual / internal mapping – This enables the DBMS to find the actual record or combination of records in physical storage that constitute a logical record in conceptual schema. – Mapping b/w two levels specifies the method of deriving the conceptual record from physical database. Data Abstraction • Major purpose of DBMS is to provide an abstract view of data • i.e. system hides certain details of how data is stored and maintained. • Since many users are not computer trained complexity is hidden from them through several levels of abstraction. • 3 levels of abstraction – Internal Level or Physical Level – Conceptual Level or Logical Level – External Level or View Level 21 Levels of Abstraction • Physical level describes how a record (e.g., customer) is stored. • Logical level: describes data stored in database, and the relationships among the data. type customer = record name : string; street : string; city : integer; end; • View level: application programs hide details of data types. 22 Instances and Schemas • Schema – the logical structure of the database or overall description of the database. – e.g., the database consists of information about a set of customers and accounts and the relationship between them) – Analogous to type information of a variable in a program – Physical schema: database design at the physical level – Logical schema: database design at the logical level – Subschema: External view is defined by a subschema • Instance – the actual content of the database at a particular point in time – Analogous to the value of a variable 23 DATA INDEPENDENCE • Ability to modify schema definition in one level without effecting a schema definition in next higher level is Data Independence. • Physical Data Independence – the ability to modify the physical schema without changing the logical schema – Applications depend on the logical schema – In general, the interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others. • Logical Data Independence – indicates that conceptual schema can be changed without affecting existing external schema 24 Database Administrator(DBA) • A person or group of persons responsible for overall control of database systems. • Main responsibilities of DBA are: – Deciding the information content of the database – Deciding the storage structures and access strategy – Deciding authorization checks and validation procedures – Granting of authorization for data access – Defining a strategy for backup and recovery – Monitoring performance and responding to changes in requirements. 25 Data Dictionary • A metadata (Data Dictionary) is the data about the data. • Also known as system catalog. • DD is used to control the database operation, data integrity and accuracy. • Contains information regarding database • Provides the name of the data element, its description and data structure in which it may be found. • Provides great assistance in producing a report of where a data element is used in all programs that mention it. 26 Database Languages • Data Definition Languages (DDL) – It is a language that allows the users to define data and their relationship to other types of data. – It is mainly used to create files, databases , data dictionary and tables within databases. • Data Manipulation Languages (DML) – It is a language that provides a set of operations to support the – • • • • basic data manipulation operations on the data held in the databases. By manipulation we mean: to retrieve information stored in the database To insert new information in the database To delete unwanted information from the database To modify existing data in the database • Data Control Language (DCL) – DCL helps to control access to data and the database using 27 statements grant and revoke Types of Database Systems • On the basis of no. of users – Single user DBMS – Multi user DBMS • On the basis of Site Location – Centralized DBMS – Parallel DBMS – Distributed DBMS – Client/Server DBMS 28 Data Models • A model is a representation of reality,' real world’ objects and events, and their associations. • An integrated collection of concepts for describing and manipulating – data – data relationships – data semantics – data constraints • A data model comprises of 3 components – A structural part – A manipulative part – A set of integrity rules 29 Types of Data Models • Basically data models fall into 3 broad categories – Object based Data Models • Entity-Relationship model • Object Oriented Model – Physical Data Model – Record based Data Model • Hierarchical Model • Network Model • Relational Model 30 Entity-Relationship Model Example of schema in the entity-relationship model 31 Entity Relationship Model (Cont.) • E-R model of real world – Entities (objects) • E.g. customers, accounts, bank branch – Relationships between entities • E.g. Account A-101 is held by customer Johnson • Relationship set depositor associates customers with accounts • Widely used for database design – Database design in E-R model usually converted to design in the relational model (coming up next) which is used for storage and processing 32 Relational Model Attributes • Example of tabular data in the relational model Customer -id customername 192-83-7465 Johnson South Palo Alto A-101 019-28-3746 Smith North Rye A-215 192-83-7465 Johnson South Palo Alto A-201 321-12-3123 Jones West Harrison A-217 019-28-3746 Smith North Rye A-201 33 customerstreet customercity accountnumber A Sample Relational Database 34 Data Definition Language (DDL) • Specification notation for defining the database schema – E.g. create table account ( account-number char(10), balance integer) • DDL compiler generates a set of tables stored in a data dictionary • Data dictionary contains metadata (i.e., data about data) – database schema – Data storage and definition language • language in which the storage structure and access methods used by the database system are specified • Usually an extension of the data definition language 35 Data Manipulation Language (DML) • Language for accessing and manipulating the data organized by the appropriate data model – DML also known as query language • Two classes of languages – Procedural – user specifies what data is required and how to get those data – Nonprocedural – user specifies what data is required without specifying how to get those data • SQL is the most widely used query language 36 DATA MODELS SECTION B ENTITY RELATIONSHIP MODEL Entity-Relationship Model • • • • • • • • • Entity Sets Relationship Sets Design Issues Mapping Constraints Keys E-R Diagram Extended E-R Features Design of an E-R Database Schema Reduction of an E-R Schema to Tables Entity Sets • A database can be modeled as: – a collection of entities, – relationship among entities. • An entity is an object that exists and is distinguishable from other objects. – Example: specific person, company, event, plant • Entities have attributes – Example: people have names and addresses • An entity set is a set of entities of the same type that share the same properties. – Example: set of all persons, companies, trees, holidays Entity Sets customer and loan Custid CustCustname street Custcity loan- amt number Attributes • An entity is represented by a set of attributes, that is descriptive properties possessed by all members of an Example: entity set. customer = (customer-id, customer-name, customer-street, customer-city) loan = (loan-number, amount) • Domain – the set of permitted values for each attribute • Attribute types: – Simple and composite attributes. – Single-valued and multi-valued attributes • E.g. multivalued attribute: phone-numbers – Derived attributes • Can be computed from other attributes • E.g. age, given date of birth Composite Attributes Relationship Sets • A relationship is an association among several entities Example: Hayes depositor A-102 customer entity relationship set account entity • A relationship set is a mathematical relation among n 2 entities, each taken from entity sets {(e1, e2, … en) | e1 E1, e2 E2, …, en En} where (e1, e2, …, en) is a relationship – Example: (Hayes, A-102) depositor Relationship Set borrower Relationship Sets (Cont.) • An attribute can also be property of a relationship set. • For instance, the depositor relationship set between entity sets customer and account may have the attribute access-date Degree of a Relationship Set • Refers to number of entity sets that participate in a relationship set. • Relationship sets that involve two entity sets are binary (or degree two). Generally, most relationship sets in a database system are binary. • Relationship sets may involve more than two entity sets. E.g. Suppose employees of a bank may have jobs (responsibilities) at multiple branches, with different jobs at different branches. Then there is a ternary relationship set between entity sets employee, job and branch • Relationships between more than two entity sets are rare. Most relationships are binary. (More on this later.) Mapping Cardinalities • Express the number of entities to which another entity can be associated via a relationship set. • Most useful in describing binary relationship sets. • For a binary relationship set the mapping cardinality must be one of the following types: – One to one – One to many – Many to one – Many to many Mapping Cardinalities One to one One to many Note: Some elements in A and B may not be mapped to any elements in the other set Mapping Cardinalities Many to one Many to many Note: Some elements in A and B may not be mapped to any elements in the other set Mapping Cardinalities affect ER Design Can make access-date an attribute of account, instead of a relationship attribute, if each account can have only one customer i.e., the relationship from account to customer is many to one, or equivalently, customer to account is one to many E-R Diagrams Rectangles represent entity sets. Diamonds represent relationship sets. Lines link attributes to entity sets and entity sets to relationship sets. Ellipses represent attributes Double ellipses represent multivalve attributes. Dashed ellipses denote derived attributes. Underline indicates primary key attributes E-R Diagram With Composite, Multivalued, and Derived Attributes Relationship Sets with Attributes Roles • Entity sets of a relationship need not be distinct • The labels “manager” and “worker” are called roles; they specify how employee entities interact via the works-for relationship set. • Roles are indicated in E-R diagrams by labeling the lines that connect diamonds to rectangles. • Role labels are optional, and are used to clarify semantics of the relationship Cardinality Constraints • We express cardinality constraints by drawing either a directed line (), signifying “one,” or an undirected line (—), signifying “many,” between the relationship set and the entity set. • E.g.: One-to-one relationship: – A customer is associated with at most one loan via the relationship borrower – A loan is associated with at most one customer via borrower One-To-Many Relationship • In the one-to-many relationship a loan is associated with at most one customer via borrower, a customer is associated with several (including 0) loans via borrower Many-To-One Relationships • In a many-to-one relationship a loan is associated with several (including 0) customers via borrower, a customer is associated with at most one loan via borrower Many-To-Many Relationship • A customer is associated with several (possibly 0) loans via borrower • A loan is associated with several (possibly 0) customers via borrower Participation of an Entity Set in a Relationship Set Total participation (indicated by double line): every entity in the entity set participates in at least one relationship in the relationship set E.g. participation of loan in borrower is total every loan must have a customer associated to it via borrower Partial participation: some entities may not participate in any relationship in the relationship set E.g. participation of customer in borrower is partial Contd.. Alternative Notation for Cardinality Limits Cardinality limits can also express participation constraints Keys • A super key of an entity set is a set of one or more attributes whose values uniquely determine each entity. • A candidate key of an entity set is a minimal super key – Customer-id is candidate key of customer – account-number is candidate key of account • Although several candidate keys may exist, one of the candidate keys is selected to be the primary key. Keys for Relationship Sets • The combination of primary keys of the participating entity sets forms a super key of a relationship set. – (customer-id, account-number) is the super key of depositor – NOTE: this means a pair of entity sets can have at most one relationship in a particular relationship set. • E.g. if we wish to track all access-dates to each account by each customer, we cannot assume a relationship for each access. We can use a multivalued attribute though • Must consider the mapping cardinality of the relationship set when deciding the what are the candidate keys • Need to consider semantics of relationship set in selecting the primary key in case of more than one candidate key E-R Diagram with a Ternary Relationship Cardinality Constraints on Ternary Relationship • We allow at most one arrow out of a ternary (or greater degree) relationship to indicate a cardinality constraint • E.g. an arrow from works-on to job indicates each employee works on at most one job at any branch. • If there is more than one arrow, there are two ways of defining the meaning. – E.g a ternary relationship R between A, B and C with arrows to B and C could mean – 1. each A entity is associated with a unique entity from B and C or – 2. each pair of entities from (A, B) is associated with a unique C entity, and each pair (A, C) is associated with a unique B – Each alternative has been used in different formalisms – To avoid confusion we outlaw more than one arrow Design Issues • Use of entity sets vs. attributes Choice mainly depends on the structure of the enterprise being modeled, and on the semantics associated with the attribute in question. • Use of entity sets vs. relationship sets Possible guideline is to designate a relationship set to describe an action that occurs between entities • Binary versus n-ary relationship sets Although it is possible to replace any nonbinary (n-ary, for n > 2) relationship set by a number of distinct binary relationship sets, a n-ary relationship set shows more clearly that several entities participate in a single relationship. • Placement of relationship attributes How about doing an ER design interactively on the board? Suggest an application to be modeled. Weak Entity Sets • An entity set that does not have a primary key is referred to as a weak entity set. • The existence of a weak entity set depends on the existence of a identifying entity set – it must relate to the identifying entity set via a total, one-to-many relationship set from the identifying to the weak entity set – Identifying relationship depicted using a double diamond • The discriminator (or partial key) of a weak entity set is the set of attributes that distinguishes among all the entities of a weak entity set. • The primary key of a weak entity set is formed by the primary key of the strong entity set on which the weak entity set is existence dependent, plus the weak entity set’s discriminator. Weak Entity Sets (Cont.) • We depict a weak entity set by double rectangles. • We underline the discriminator of a weak entity set with a dashed line. • payment-number – discriminator of the payment entity set • Primary key for payment – (loan-number, payment-number) Weak Entity Sets (Cont.) • Note: the primary key of the strong entity set is not explicitly stored with the weak entity set, since it is implicit in the identifying relationship. • If loan-number were explicitly stored, payment could be made a strong entity, but then the relationship between payment and loan would be duplicated by an implicit relationship defined by the attribute loannumber common to payment and loan More Weak Entity Set Examples • In a university, a course is a strong entity and a courseoffering can be modeled as a weak entity • The discriminator of course-offering would be semester (including year) and section-number (if there is more than one section) • If we model course-offering as a strong entity we would model course-number as an attribute. Then the relationship with course would be implicit in the course-number attribute Specialization • Top-down design process; we designate subgroupings within an entity set that are distinctive from other entities in the set. • These subgroupings become lower-level entity sets that have attributes or participate in relationships that do not apply to the higher-level entity set. • Depicted by a triangle component labeled ISA (E.g. customer “is a” person). • Attribute inheritance – a lower-level entity set inherits all the attributes and relationship participation of the higher-level entity set to which it is linked. Specialization Example Generalization • A bottom-up design process – combine a number of entity sets that share the same features into a higherlevel entity set. • Specialization and generalization are simple inversions of each other; they are represented in an E-R diagram in the same way. • The terms specialization and generalization are used interchangeably. Specialization and Generalization (Contd.) • Can have multiple specializations of an entity set based on different features. • E.g. permanent-employee vs. temporary-employee, in addition to officer vs. secretary vs. teller • Each particular employee would be – a member of one of permanent-employee or temporary-employee, – and also a member of one of officer, secretary, or teller • The ISA relationship also referred to as superclass subclass relationship Design Constraints on a Specialization/Generalization • Constraint on which entities can be members of a given lower-level entity set. – condition-defined • E.g. all customers over 65 years are members of senior-citizen entity set; senior-citizen ISA person. – user-defined • Constraint on whether or not entities may belong to more than one lower-level entity set within a single generalization. – Disjoint • an entity can belong to only one lower-level entity set • Noted in E-R diagram by writing disjoint next to the ISA triangle – Overlapping • an entity can belong to more than one lower-level entity set Design Constraints on a Specialization/Generalization (Contd.) • Completeness constraint -- specifies whether or not an entity in the higher-level entity set must belong to at least one of the lower-level entity sets within a generalization. – total : an entity must belong to one of the lower-level entity sets – partial: an entity need not belong to one of the lowerlevel entity sets Aggregation Consider the ternary relationship works-on, which we saw earlier Suppose we want to record managers for tasks performed by an employee at a branch Aggregation (Cont.) • Relationship sets works-on and manages represent overlapping information – Every manages relationship corresponds to a works-on relationship – However, some works-on relationships may not correspond to any manages relationships • So we can’t discard the works-on relationship • Eliminate this redundancy via aggregation – Treat relationship as an abstract entity – Allows relationships between relationships – Abstraction of relationship into new entity • Without introducing redundancy, the following diagram represents: – An employee works on a particular job at a particular branch – An employee, branch, job combination may have an associated manager E-R Diagram With Aggregation E-R Design Decisions • The use of an attribute or entity set to represent an object. • Whether a real-world concept is best expressed by an entity set or a relationship set. • The use of a ternary relationship versus a pair of binary relationships. • The use of a strong or weak entity set. • The use of specialization/generalization – contributes to modularity in the design. • The use of aggregation – can treat the aggregate entity set as a single unit without concern for the details of its internal structure. E-R Diagram for a Banking Enterprise How about doing another ER design interactively on the board? Summary of Symbols Used in E-R Notation Summary of Symbols (Cont.) Alternative E-R Notations Reduction of an E-R Schema to Tables • Primary keys allow entity sets and relationship sets to be expressed uniformly as tables which represent the contents of the database. • A database which conforms to an E-R diagram can be represented by a collection of tables. • For each entity set and relationship set there is a unique table which is assigned the name of the corresponding entity set or relationship set. • Each table has a number of columns (generally corresponding to attributes), which have unique names. • Converting an E-R diagram to a table format is the basis for deriving a relational database design from an E-R diagram. Representing Entity Sets as Tables • A strong entity set reduces to a table with the same attributes. Composite and Multivalve Attributes • Composite attributes are flattened out by creating a separate attribute for each component attribute – E.g. given entity set customer with composite attribute name with component attributes first-name and last-name the table corresponding to the entity set has two attributes name.first-name and name.last-name • A multivalued attribute M of an entity E is represented by a separate table EM – Table EM has attributes corresponding to the primary key of E and an attribute corresponding to multivalued attribute M – E.g. Multivalued attribute dependent-names of employee is represented by a table employee-dependent-names( employee-id, dname) Composite and Multivalued Attributes(Cont.) – Each value of the multivalued attribute maps to a separate row of the table EM • E.g., an employee entity with primary key John and dependents Johnson and Johndotir maps to two rows: (John, Johnson) and (John, Johndotir) Representing Weak Entity Sets A weak entity set becomes a table that includes a column for the primary key of the identifying strong entity set Representing Relationship Sets as Tables • A many-to-many relationship set is represented as a table with columns for the primary keys of the two participating entity sets, and any descriptive attributes of the relationship set. • E.g.: table for relationship set borrower Redundancy of Tables Many-to-one and one-to-many relationship sets that are total on the many-side can be represented by adding an extra attribute to the many side, containing the primary key of the one side E.g.: Instead of creating a table for relationship account-branch, add an attribute branch to the entity set account Redundancy of Tables (Cont.) • For one-to-one relationship sets, either side can be chosen to act as the “many” side – That is, extra attribute can be added to either of the tables corresponding to the two entity sets • If participation is partial on the many side, replacing a table by an extra attribute in the relation corresponding to the “many” side could result in null values • The table corresponding to a relationship set linking a weak entity set to its identifying strong entity set is redundant. – E.g. The payment table already contains the information that would appear in the loan-payment table (i.e., the columns loan-number and paymentnumber). Hierarchical Data Model GRAND PARENTS PARENTS CHILDREN PARENTS CHILDREN PARENTS CHILDREN CHILDREN Hierarchical Data Model • Developed in late 1950s by North American Rockwell Company and IBM. • Tree like structure with records forming nodes and fields forming branches of the tree. • Organizes data elements as tabular rows, one for each instance of entity. • Example :Consider a company’s organizational structure GENERAL MANAGER SALES DEPT EMP1 EMP2 DEPUTY GM DEPUTY GM ADVT DEPT PROD DEPT EMP1 DEPUTY GM INVENTORY DEPT EMP1 EMP1 EMP2 EMP2 Characteristics of Hierarchical Model • Relationship is represented as a parent-child . • Each Hierarchical tree can have one root record type and this root record type does not have a parent record type. • Root can have any no. of child record types and each of which can itself be a root of hierarchical subtree. • Each child record type can have only one parent record type. Thus a many to many relationship cannot be directly expressed b/w two record types. • Data in a parent record type applies to all its children records. • A child record occurrence must have a parent record occurrence. Deleting a parent record occurrence requires deleting all its children record occurrences. Hierarchical View for Database • Consider a Supplier-Parts Database • The records for suppliers, parts and shipments are given in following tables. Sno Name Status City S1 Suneet 20 Mohali S2 Ankit 10 Amritsar S3 Amit 10 Amritsar Supplier table Pno Name Color City P1 Nut Red Mohali P2 Bolt Green Amritsar P3 Screw Blue Jalandhar P4 Screw Red Mohali Parts Table Hierarchical Database Shipment Table Sno Pno Qty S1 P1 250 S1 P2 300 S1 P3 500 S2 P1 250 S2 P2 500 S3 P2 300 P3 Screw Blue S1 Suneet 20 P2 Screw Red Mohali Green 500 Amritsar S1 Suneet 20 Mohali S2 Ankit 10 Amritsar 500 S3 Amit 10 Amritsar 300 P1 P4 Bolt Jalandhar Nut Red 300 Mohali Mohali S1 Suneet 20 Mohali 250 S2 Ankit 10 Amritsar 250 Operations on Hierarchical Model • Four basic operations that can be performed on Hierarchical model are – Insert Operation – Update Operation – Delete Operation – Retrieve Operation Advantages • Advantages – Simplicity – Data Security – Data Integrity – Efficiency Disadvantages – Implementation Complexity – Database Management problems – Lack of Structural Independence – Program Complexity – Operational Anomalies – Implementation Limitation Network Database Model • Network Model replaces the hierarchical tree with a graph. • Thus allowing a more general connections among the nodes. • Relationship is represented as owner-member type. • Main difference of Network model from Hierarchical model is its ability to handle Many to Many relationships (M:N). • In other words ,it allows a record to have more than one owner. Terminology of Network Model • A relationship is a set • Each set is made up of two types of records: an owner record and a member record. • An owner can have any no. of member records . • Each member can have more than one owner record types. • Thus 1:1,1:M,M:N relationship among entities is allowed. • Member and owner records are connected with each other with a third record type known as connector. • A connector occurrence specifies the association b/w member and owner records. Network view of Database S1 Suneet 250 P1 Nut Red Mohali 20 Mohali 300 S2 Ankit 500 P2 Bolt Green Amritsar 10 250 Amritsar S3 500 P3 Screw Blue Jalandhar Amit 10 Amritsar 300 P4 Screw Red Mohali Operations on Network Model • • • • Insert operation Update operation Delete operation Retrieve operation Advantages & Disadvantages • Advantages – Conceptual Simplicity – – – – – • Capability to handle more relationship types Ease of data access Data Integrity Data Independence Database standards Disadvantages – System Complexity – Operational anomalies – Absence of structural Independence Relational Model • • • • • Structure of Relational Databases Relational Algebra Tuple Relational Calculus Domain Relational Calculus Extended Relational-AlgebraOperations • Modification of the Database • Views Relational Model • Concept was proposed by Dr. E.F. Codd, in 1960s. • Data is stored in the form of tables. • Relational model consists of 3 major components:– Set of relations and set of domains (Data Structure) – Integrity Rules (Data Integrity) – Operations (Data Manipulation) • Characteristics of Relational Database – Whole data is conceptually represented as an orderly arrangement of data into rows and columns called a relation or table. – All values are scalar. – All operations are performed on an entire relation and result is an entire relation. Basic terminology used in Relational Model • Tuple- Each row of data is a tuple. Actually each row is an n-tuple (n=1,2,3……). • Cardinality of a relation -No. of tuples in a relation is cardinality. • Attributes-Each column in a relation is an attribute. • Degree of a relation-No. of attributes in a relation determines its degree. • Domain- Set of all possible values that an attribute may contain. • Body of a relation-The body of a relation consists of unordered set of zero or more tuples. Keys of a Relation • Primary key- Is an attribute which possess the property of uniqueness and irreducibility. • Foreign key-is an attribute or set of attributes which helps to establish relationship b/w two tables. Operations in Relational Model – Insert operation – Update operation – Delete operation – Retrieve operation Advantages & Disadvantages • Advantages – Structural Independence – Conceptual Simplicity – Design, Implementation, maintenance and usage ease – Adhoc query capability • Disadvantages – Hardware overheads – Ease of design lead to bad design Example of a Relation RDBMS • Concept was given by Dr. E.F.Codd. • In RDBMS, data is stored in relations i.e. in the form of tables. • Relational Database Management System Components– Data Structure – Data Integrity – Data Manipulation • Relational data structure – – – – – – Relation Attribute Domain Tuple Cardinality Degree • Extension of a relation – The set of tuples appearing in that relation at any given instant of time. – Extension varies with time. – It changes as tuples are created, destroyed and updated. – It is same as view of a table. • Intension of a relation – It is permanent part of a relation and independent of time. – Intension is combination of two things • Naming structure – consists of relation name ,name of attributes. • Integrity constraints- key constraints, referential constraints etc. Data Integrity • Relational Keys – What is a key??? • Key is a set of one or more columns whose combined values are unique among all occurrences in a given table. • Types of Keys – Primary key – Foreign key – Super key – Candidate key – Alternate key Integrity Rules • There are two important rules which are constraints or restrictions that apply to all instances of the database. • Two integrity rules are – Entity Integrity Rule – Referential Integrity Rule • Entity Integrity Rule – It states that in a base relation, value of attribute of a primary key cannot be null. • Referential Integrity Rule – It states that if a foreign key exists in a relation, either the foreign key value must match values in primary key or the foreign key value must be null. Relational Data Manipulation • The manipulative part of the relational model consists of a set of operators known as Relational Algebra and Relational Calculus. • Relational Algebra is a procedural language that can be used to tell DBMS how to build a new relation from one or more relations in the database. • Relational Calculus is a non procedural language that can be used to formulate the definition of a relation in terms of one or more database relations. • Note :- Relational Algebra and Calculus will be discussed in detail later. CODD’s Rules • • • • Dr. E.F.Codd, founder of relational database systems, framed 12 rules to assist a database to qualify as relational. An RDBMS has to satisfy at least six of the 12 rules to be accepted as full fledged RDBMS. There is no RDBMS package commercially available that satisfies all the 12 rules. These rules are1. Information Rule:- All information in a relational database is represented in the form of tables. It speeds up design and learning process. User productivity is improved since knowledge of only one language is necessary to access all data. 2. Guaranteed Access Rule:- Every piece of data in a relational database, can be accessed using a primary key value that identifies the row & column. 3. Comprehensive Data Sub-language Rule:- The RDBMS may support several languages. But at least one of them should allow the user to do all the following: define tables and views, query and update data, set integrity constraints, set authorizations etc. 4. View updating Rule:-Any view that can be updated theoretically can be updated using RDBMS. Data consistency is ensured since the changes made in the view are transmitted in the base table and vice-versa. 5. High level insert, Update and delete:-RDBMS supports insertion, deletion and updation at table level. 6. Physical Data Independence 7. Logical Data Independence 8. Integrity Independence:-Like table definitions, integrity constraints are stored in the on line catalog or data dictionary and can therefore be changed without necessitating changes in the application programs. 9. Non subversion rule:- If RDBMS has a language that accesses information of a record at a time, then that language should not be used to bypass the integrity constraints. 10. Systematic treatment of Null values:-In RDBMS, null values should be supported for representation of missing or inapplicable information. 11. Database description rule:- The description of a database is stored and maintained in the form of tables. This implies that Data dictionary must be present within the RDBMS. 12. Distributed Independence:- The RDBMS package must have distribution independence. It helps to distribute database across multiple computers even though they are having heterogeneous platforms both for hardware and operating system. This is the most attractive aspects of RDBMS. Basic Structure • Formally, given sets D1, D2, …. Dn a relation r is a subset of D1 x D2 x … x Dn Thus a relation is a set of n-tuples (a1, a2, …, an) where each ai Di • Example: if customer-name = {Jones, Smith, Curry, Lindsay} customer-street = {Main, North, Park} customer-city = {Harrison, Rye, Pittsfield} Then r = { (Jones, Main, Harrison), (Smith, North, Rye), (Curry, North, Rye), (Lindsay, Park, Pittsfield)} is a relation over customer-name x customer-street x customer-city Relation Schema • A1, A2, …, An are attributes • R = (A1, A2, …, An ) is a relation schema E.g. Customer-schema = (customer-name, customer-street, customer-city) • r(R) is a relation on the relation schema R E.g. customer (Customer-schema) Relation Instance • The current values (relation instance) of a relation are specified by a table • An element t of r is a tuple, represented by a row in a table attributes (or columns) customer-name customer-street Jones Smith Curry Lindsay Main North North Park customer customer-city Harrison Rye Rye Pittsfield tuples (or rows) Relations are Unordered Order of tuples is irrelevant (tuples may be stored in an arbitrary order) E.g. account relation with unordered tuples Query Languages • Language in which user requests information from the database. • Categories of languages – procedural – non-procedural • “Pure” languages: – Relational Algebra – Tuple Relational Calculus – Domain Relational Calculus • Pure languages form underlying basis of query languages that people use. Relational Algebra • Procedural language • User has to specify what is required and what are the steps to obtain the required output. • The operators take one or more relations as inputs and give a new relation as a result. • Two types of relational operators– Traditional Set operators – Special operators • Formal and non-user friendly language. Traditional Set Operators • Traditional set operators are:– – – – Union Intersection Difference Cartesian Product • Union:-A union of two relations is a relation that contain all data elements belonging to both relations while writing the duplicate values only once. – In order to perform the union operation , both operand relations must be union compatible i.e. they must have same no. of columns drawn from same domain (i.e. must be of same data type). – It is represented by symbol ‘U’. Union Cust_name Cust_status Cust_name Cust_status Sham Good Karan Bad Rahul Excellent Sham Good S R Cust_name Cust_status Sham Good Rahul Excellent Karan Bad RUS Intersection • Intersection of two relations consists of a third relation with all rows that are common to both relations. • It is represented by symbol ‘∩’. • Taking above example, R ∩ S is Cust_name Cust_status Sham Good R∩S Difference • Difference of two relations consist of a third relation which consists of elements which are present in one relation and not in another. • It is represented using symbol ‘-’. • Minus is not associative. Cust_name Cust_status Cust_name Cust_status Rahul Excellent Karan Bad R-S S-R Cartesian Product • Cartesian product of two relations include each row of one relation is associated with each row of another relation. • It is denoted by cross (X). A B C D A B C D α 1 α 10 α 1 α 10 β 2 β 20 α 1 β 20 α 11 α 1 α 11 α 24 α 1 α 24 β 2 α 10 β 2 β 20 β 2 α 11 β 2 α 24 R S RXS Special Relational Operations • Four special relational operators are:– Selection – Projection – Join – Division • Selection:- Yields a horizontal subset of a given relation. • Those tuples or rows of a table are selected which satisfy a particular condition (predicate). • It is represented by Greek letter sigma ‘σ’. • Notation: p (r) • p is called the selection predicate. Selection • E.g. Suppose from supplier table, we have to select values where city is ‘Mohali’. Sno Sname Status City S1 Suneet 20 Mohali S2 Ankit 10 Amritsar S3 Amit 30 Amritsar S4 Raj 20 Chandigarh The query is like • σcity= ‘Mohali’ (Supplier) Another example • branch-name=“Perryridge” (account) Projection • Projection operation on a table simply forms another table by copying specified columns from original table while eliminating any duplicated rows. • Projection is denoted by ‘π’ • Consider the following employee table. Empno Ename Age Sal E01 Smith 25 7500 E02 John 36 10000 E03 Allen 42 10000 • Projections of Age Age 25 36 42 πAge (employee) Project Operation – Example • Relation r: A A,C B C 10 1 20 1 30 1 40 2 (r) A C A 1 1 1 1 2 C = 1 2 Project Operation • Notation: A1, A2, …, Ak (r) where A1, A2 are attribute names and r is a relation name. • The result is defined as the relation of k columns obtained by erasing the columns that are not listed • Duplicate rows removed from result, since relations are sets Natural-Join Operation Notation: r s • Let r and s be relations on schemas R and S respectively. Then, r s is a relation on schema R S obtained as follows: – Consider each pair of tuples tr from r and ts from s. – If tr and ts have the same value on each of the attributes in R S, add a tuple t to the result, where • t has the same value as tr on r • t has the same value as ts on s • Example: R = (A, B, C, D) S = (E, B, D) – Result schema = (A, B, C, D, E) – r s is defined as: r.A, r.B, r.C, r.D, s.E (r.B = s.B r.D = s.D (r x s)) Natural Join Operation – Example • Relations r, s: A B C 1 2 4 1 2 D B D a a b a b 1 3 1 2 3 a a a b b r r s s A B 1 1 1 1 2 C D a a a a b E E Division Operation • Is used to divide a relation A of degree m+n (no. of columns in a relation) by relation B of degree n, produces a relation of degree m. • It is denoted by ‘ ‘. • E.g. A Sno Pno S1 P1 S1 P2 S2 P1 S3 P2 S3 p3 A Pn o Pno P1 P2 P3 II I B Sno S1 S2 B Sno S3 II I Division Operation – Example Relations r, s: A r s: A r B B 1 2 3 1 1 1 3 4 6 1 2 1 2 s Another Division Example Relations r, s: A B C a a a a a a a a D E D E a a b a b a b b 1 1 1 1 3 1 1 1 a b 1 1 r r s: A B a a C s Assignment Operation • The result to the right of the is assigned to the relation variable on the left of the . • AB • Value of B is assigned to A. Aggregate Functions and Operations • Aggregation function takes a collection of values and returns a single value as a result. avg: average value min: minimum value max: maximum value sum: sum of values count: number of values • Aggregate operation in relational algebra G1, G2, …, Gn g F1( A1), F2( A2),…, Fn( An) (E) – E is any relational-algebra expression – G1, G2 …, Gn is a list of attributes on which to group (can be empty) – Each Fi is an aggregate function – Each Ai is an attribute name Aggregate Operation – Example • Relation r: A B C 7 7 3 10 g sum(c) (r) sum-C 27 Aggregate Operation – Example • Relation account grouped by branch-name: branch-name account-number Perryridge Perryridge Brighton Brighton Redwood branch-name g balance A-102 A-201 A-217 A-215 A-222 sum(balance) 400 900 750 750 700 (account) branch-name Perryridge Brighton Redwood balance 1300 1500 700 Aggregate Functions (Cont.) • Result of aggregation does not have a name – Can use rename operation to give it a name – For convenience, we permit renaming as part of aggregate operation branch-name g sum(balance) as sum-balance (account) Outer Join • An extension of the join operation that avoids loss of information. • Computes the join and then adds tuples form one relation that do not match tuples in the other relation to the result of the join. • Uses null values: – null signifies that the value is unknown or does not exist – All comparisons involving null are (roughly speaking) false by definition. • Will study precise meaning of comparisons with nulls later Outer Join – Example • Relation loan loan-number branch-name L-170 L-230 L-260 Downtown Redwood Perryridge amount 3000 4000 1700 Relation borrower customer-name loan-number Jones Smith Hayes L-170 L-230 L-155 Outer Join – Example • Inner Join loan Borrower loan-number branch-name L-170 L-230 Downtown Redwood amount customer-name 3000 4000 Jones Smith amount customer-name Left Outer Join loan loan-number L-170 L-230 L-260 Borrower branch-name Downtown Redwood Perryridge 3000 4000 1700 Jones Smith null Outer Join – Example • Right Outer Join loan borrower loan-number L-170 L-230 L-155 branch-name Downtown Redwood null amount 3000 4000 null customer-name Jones Smith Hayes Full Outer Join loan borrower loan-number L-170 L-230 L-260 L-155 branch-name Downtown Redwood Perryridge null amount 3000 4000 1700 null customer-name Jones Smith null Hayes Null Values • It is possible for tuples to have a null value, denoted by null, for some of their attributes • null signifies an unknown value or that a value does not exist. • The result of any arithmetic expression involving null is null. • Aggregate functions simply ignore null values – Is an arbitrary decision. Could have returned null as result instead. – We follow the semantics of SQL in its handling of null values • For duplicate elimination and grouping, null is treated like any other value, and two nulls are assumed to be the same – Alternative: assume each null is different from each other – Both are arbitrary decisions, so we simply follow SQL Modification of the Database • The content of the database may be modified using the following operations: – Deletion – Insertion – Updating • All these operations are expressed using the assignment operator. Deletion • A delete request is expressed similarly to a query, except instead of displaying tuples to the user, the selected tuples are removed from the database. • Can delete only whole tuples; cannot delete values on only particular attributes • A deletion is expressed in relational algebra by: rr–E where r is a relation and E is a relational algebra query. Deletion Examples • Delete all account records in the Perryridge branch. account account – branch-name = “Perryridge” (account) Delete all loan records with amount in the range of 0 to 50 loan loan – amount 0 and amount 50 (loan) Insertion • To insert data into a relation, we either: – specify a tuple to be inserted – write a query whose result is a set of tuples to be inserted • in relational algebra, an insertion is expressed by: r r E where r is a relation and E is a relational algebra expression. • The insertion of a single tuple is expressed by letting E be a constant relation containing one tuple. Insertion Example • Insert information in the database specifying that Smith has $1200 in account A-973 at the Perryridge branch. account account {(“Perryridge”, A-973, 1200)} depositor depositor {(“Smith”, A-973)} Updating • A mechanism to change a value in a tuple without changing all values in the tuple • Use the generalized projection operator to do this task r F1, F2, …, FI, (r) • Each Fi is either – the ith attribute of r, if the ith attribute is not updated, or, – if the attribute is to be updated Fi is an expression, involving only constants and the attributes of r, which gives the new value for the attribute Update Examples • Make interest payments by increasing all balances by 5 percent. account AN, BN, BAL * 1.05 (account) where AN, BN and BAL stand for account-number, branch-name and balance, respectively. Pay all accounts with balances over $10,000 6 percent interest and pay all others 5 percent account AN, BN, BAL * 1.06 ( AN, BN, BAL * 1.05 ( BAL BAL 10000 (account)) 10000 (account)) Relational Calculus Tuple oriented relational calculus Domain Oriented relational calculus Relational Calculus • Non procedural language • User is not concerned with the procedure to obtain the results, he/she just tell the results and output is available without knowing the method of retrieval. • A query is expressed as a formula consisting of a no. of variables and an expression involving these variables. • It is up to DBMS to transform these non procedural queries into procedural queries. • This concept was first proposed by Codd. • It is based on predicate calculus. • Propositions specifying a property consist of an expression that names an individual object and another expression called the predicate stand for the property that object possesses. • E.g. MCA is a class Object –MCA, India India is a country. Predicate- Class, Country Contd.. • A convenient method of writing the statements is to place the predicate first and follow it with the object enclosed in parenthesis. • E.g. MCA is a class can be written as is a class (MCA) or Class (MCA) • Finally, if we use symbols for both the predicate and the object, we can rewrite the statement as P(X) where P is the predicate and X is the object. • These are known as one place predicate. • Two place predicate-require two objects. • E.g. Naresh is taller than Rajesh ABC is intelligent than XYZ These can be written as TALLER_THAN (Naresh, Rajesh) INTELLIGENT_THAN (ABC,XYZ) Atomic Formula • A predicate followed by its arguments is called an Atomic Formula. • E.g. CLASS(MCA) TALLER_THAN (Naresh ,Rajesh) • Atomic formulas can also be combined using logical connectors such as NOT or Negation ┐ OR V AND ^ IMPLICATION → • E.g. oracle is a DBMS and ABC is a company Can be written in atomic formula as DBMS (oracle) ^ COMPANY(ABC) Tuple Oriented Relational Calculus • Based on specifying a number of tuple variables. • A nonprocedural query language, where each query is of the form {t | P (t) } • It is the set of all tuples t such that predicate P is true for t • t is a tuple variable, t[A] denotes the value of tuple t on attribute A • t r denotes that tuple t is in relation r • P is a formula similar to that of the predicate calculus. • E.g. {t | Book (t) and t. Price>100} Will get you all books whose price is greater than 100.It retrieves attributes for each selected value. • To retrieve only some of the attributes say TITLE,AUTHOR,PRICE, Query is like { t.TITLE, t. AUTHOR, t. PRICE |BOOK (t) and t. PRICE>200} CONTD.. • Thus in a tuple calculus, we need – A condition to select the required tuples from the relation. – A set of attributes to be retrieved. • WFF (Well Formed Formula) Every condition is a WFF.A WFF is constructed from conditions, Boolean operations (AND,OR,NOT) and quantifiers like for all () and there exists(). t r (Q(t)) ”there exists” a tuple in t in relation r such that predicate Q(t) is true t r (Q(t)) Q is true “for all” tuples t in relation r Free and Bound variables • Bound Variables:- Bound variables are those range variables when meaning of the formula remain unchanged if all the occurrence of range variables say ‘x’ were replaced by some other variable say ‘y’. Then x is called bound variable. • E.g. x (x>3) means exists x (x>3) • If x is replaced by ‘y’ exists y (y>3) Meaning of the expression has not changed. So, x is a bound variable. • Free Variables:- Free variables are those range variables if all the occurrences of range variable say ‘x’ were replaced by some other variable say ‘y’, meaning of the formula is changed . • E.g. exists x (x>3) and y<0 • If x is replaced by ‘y’ Exists y (y>3) and y<0 Meaning of the expression is totally changed. Open and Closed WFF • Closed WFF:- A WFF in which all variables are bound is Closed WFF. • E.g. x (x>3) is closed WFF • Open WFF:- A WFF in which at least one variable is free variable is Open WFF. • E.g. x (x>3) and y<0 is open WFF Domain Oriented Relational Calculus • In Domain Calculus the variables range over single values from domains of attributes rather than ranging over tuples. • An expression of domain calculus is – { x1, x2, …, xn | P(x1, x2, …, xn)} – x1, x2, …, xn represent domain variables – P represents a formula similar to that of the predicate calculus • Free and bound Variables-(rules same as tuple calculus) • E.g. Get emp no in relation emp EX where emp (empno: EX) • Get emp no for job ’Clerk’ EX where emp (empno: Ex, job=‘Clerk’) Comparison of Relational Algebra and Relational Calculus RELATIONAL ALGEBRA RELATIONAL CALCULUS 1. It is a procedural method of 1. It is a non procedural method solving the queries. of solving the queries. 2. The solution is obtained by 2. The solution is obtained by stating what is required and what are the steps to obtain that information. stating what is required and letting the system find the answer. 3. It is used as a vehicle for 3. Relational Calculus queries implementation Relational Calculus. 4. Two of types-Traditional set operators, Special operators. are converted into equivalent relational algebra format using codd’s Reduction algorithm and than it is implemented using relational operators. 4. Two types- Tuple oriented , Domain oriented. Relational Database Design Normalization Normalization • A design technique used to design Relational Model. • Can be defined as a process during which redundant relation schemas are decomposed by breaking up their attributes into smaller relation schemas that possess desirable properties. • Basic objective of normalization is to reduce redundancy. • Storing information several times leads to – – Wastage of storage space – High cost – Inconsistencies Properties of a Normalized Relation • No data values should be duplicated in different rows unnecessarily. • A value must be specified for every attribute in a row. • Each relation should be self- contained. In other words, if a row is deleted, important information should not be accidentally lost. • When a row is added to a relation, other relations in a database should not be affected. • A value of an attribute in a tuple may be changed independent of other tuples in the relation and other relations. Types of Normal Forms • • • • • • First Normal Form(1NF) Second Normal Form(2NF) Third Normal Form(3NF) Boyce- Codd Normal Form (BCNF) Fourth Normal Form(4NF) Fifth Normal Form(5NF) 5NF 4NF BCNF 3NF 2NF 1NF Dependencies • Functional Dependency (FD):- An association between two attributes of the same relational database. • One attribute is called determinant and the other is called determined. • For each value of the determinant there is associated one and only one value of the determined. • If A is the determinant and B is determined ,then we can say that A functionally determines B and is represented as A → B. For each value of A ,there is only one value of B A B 1 1 2 4 3 9 4 16 Fully Functional Dependency (FFD) • Is defined as attribute Y is FFD on attribute X if it is FD on X and not FD on any proper subset of X. • E.g. (A,B) → C X Y • C depends on both A & B and not on A or B. Multivalued Dependency(MVD) • MVD is when one attribute value is potentially a multi valued fact about another. • If a relation R having A,B,C as attributes, B and C are multivalued facts about A which is represented as A→→B and A →→C, then multivalued dependency exist only if B and C are independent of each other. Transitive Dependency • Assume that A,B,C are the attributes of a relation R. If in a relation A→B B→C Then A → C • This is transitive dependency First Normal Form • A relation is in 1NF if and only if all underlying domains contain atomic values or single value only. Course code Course name Teacher name Roll no Name System _used Hourly _rate Total_ hrs C1 Visual Basic ABC 100 A1 P-1 20 7 101 A2 P-II 30 3 Oracle DEF 100 A1 P-I 20 7 104 A7 P-III 37 3 106 A7 P-IV 40 3 C2 C3 C++ KJP Table -1 Unnormalized Relation 1NF • Two approaches used to convert a relation to 1NF– Flattening the table – Decomposition of the table • Flattening the table:- It removes repeating groups by filling in the “missing” entries of each incomplete row of the table with copies of their non – repeating attributes. Course code Course name Teacher name Roll no Name System_us ed Hourly_ rate Total_h rs C1 Visual Basic ABC 100 A1 P-1 20 7 C1 Visual Basic ABC 101 A2 P-II 30 3 C2 Oracle DEF 100 A1 P-I 20 7 C2 Oracle DEF 104 A7 P-III 37 3 C3 C++ KJP 106 A7 P-IV 40 3 Table 2 Normalized relation 1NF (Contd…) 2. Decomposition of the table:- In this original table is decomposed into two new tables. • Involves separating the attributes of the relation to create the schemas of two new relations. • Before decomposing the original table, it is necessary to identify an attribute or set of attributes that can be used as table identifier. • Rule of decomposition – One of the two tables contains the table identifier of the original table and all the non- repeating attributes, – The other table contains a copy of the table identifier and all the repeating attributes. 1NF (Contd…) Cours e code Course name C1 Visual Basic ABC C2 Oracle DEF C3 C++ KJP PK Table 4 Course_student Teacher name Course code Table 3 Course Roll no Name Syste m_us ed Hourl y_rat e Total _hrs C1 100 A1 P-1 20 7 C1 101 A2 P-II 30 3 C2 100 A1 P-I 20 6 C2 104 A7 P-III 37 3 C3 106 A7 P-IV 40 3 PK Second Normal Form • A relation is said to be in 2NF if and only if it is in 1NF and every non - key attribute is fully dependent on the primary key. • Table 4 Course_student is in 1NF but not in 2NF because non key attributes name, system_ used and hourly_ rate are not fully dependent on PK (Course code, Rollno). • To convert the above table in 2NF ,Rule is The A* B* C D A* B* + A* D C first relation contains the PK and the attributes that are fully dependent on the PK. Second relation contains the attributes that are partially dependent on the key and the key attribute on which it is dependent. 2NF(Contd…) • Transformation of Course_student into 2NF Course Code Roll no Total _hrs C1 100 7 C1 101 3 C2 100 6 C2 104 3 c3 106 3 Table 6 System Charge Contains attributes that are partially dependent Table 5 Hours_assigned Contains attribute Total_hrs which is fully dependent on PK (Course code, Rollno) Roll no Name System_used Hourly_rate 100 A1 P-I 20 101 A2 P-II 30 104 A7 P-III 37 106 A7 P-IV 40 Third Normal Form • A relation is said to be in 3NF if it satisfies the following conditions simultaneously – – R is already in 2NF – No non prime attribute is transitively dependent on the key i.e no non prime attribute functionally determines any other non –prime attribute. • Table 6 System charge is not in 3NF because there is transitive dependency of a non prime attribute on the PK of the relation. • In table 6 ,non-prime attribute Hourly_rate is transitively dependent on the pk rollno through FD. Rollno → System_used System_used → Hourly_rate => Rollno → Hourly_rate This is transitive dependency 3NF(Contd…) • Rule to convert to 3NF A* B C A* B* + B C Table 6 can be represented in 3NF by dividing it into two relations. First relation contains the pk and the attributes that are dependent on it. Second relation contains the non-prime attribute and another non- prime attribute on which it is FD and making it as pk. 3NF(Contd…) Rollno Name System _used 100 A1 P-I 101 A2 P-II 104 A7 P-III 106 A7 P-IV PK Table 8 Charges Table 7 System used System_used Hourly_rate P-I 20 P-II 30 P-III 37 P-IV 40 PK Boyce Codd Normal Form • BCNF states that – A relation R is said to be in BCNF if and only if every determinant is a candidate key. • Consider table 6, System Charge (Course code, Rollno, Name, System_used, Hourly_rate, Total_hrs) (Course code, Rollno) → Total_hrs Rollno → (Name, System_used, Hourly_rate) Rollno is a determinant but not a candidate key. System_used is again a determinant but not unique. So, Table 4 is not in BCNF. BCNF (Contd…) • To make it in BCNF ,divide it into three tables• First table Includes Course_time (Course code, rollno,Total_hrs) • Second table includes Student_system (Rollno, system_used, Hourly_rate) • Third includes Charge (System_used, Hourly_rate) Difference between 3NF and BCNF • A table in 3NF has to be in 2NF but in BCNF, a relation need not be in 2NF or 1NF. • BCNF is stronger definition of 3NF. Fourth Normal Form • A relation is said to be in 4NF if and only if– If R is already in 3NF or BCNF – If it contains no multi valued dependencies. • Two things must be noted– Firstly, in order for a table to contain MVD, it must have 3 or more attributes. – It is possible to have table containing two or more attributes which are independent multi valued facts about another attribute. 4NF(Contd…) • Consider a relation Course Student_name Textbook Physics Ankit Mechanics Physics Ankit Optics Physics Rahat Mechanics Physics Rahat Optics Chemistry Ankit Organic Chemistry Ankit Inorganic English Raj Literature English Raj Grammar Table 9 Course_student_book MVDs in the above table are- Course →→ Student_name Course →→ Textbook 4NF(contd…) • Problems with MVD – If a new student join the physics course, then we have to make two insertions for that student in the database which is equal to the no. of physics books. And if there are hundred textbooks for a subject? – Sly, if a new book is introduced for a course , then again we’ll have to make multiple insertions which is equal to the no. of students for that course. So, there is high degree of redundancy which will lead to update problems. Rule to transform a relation to 4NFA relation R having A,B,C as attributes can be no loss decomposed into two tables R1(A,B),R2(A,C) if and only if MVD A →→B/C hold in R. 4NF(Contd…) • To convert table 9 into 4NF,two separate tables are formed– Course_student → (Course, Student_name) – Course_book → (Course, Textbook) Course Student_ name Physics Ankit Physics Rahat Chemistry Ankit English Raj Table 10 Course_student Course Textbook Physics Mechanics Physics Optics Chemistry Organic Chemistry Inorganic English Literature English Grammar Table 11 Course_book Fifth Normal Form • A relation is said to be in 5NF if and only if– R is already in 4NF – It cannot be further non- loss decomposed. • Non loss decomposition is possible because of the availability of the join operator. • In 5NF,consideration must be given to tables where non-loss decomposition can be achieved by decomposition into 3 or more separate tables. • In join, a new table is formed which contains all the columns from both the joined tables whose tuples are defined by the restriction applied. 5NF(Contd…) • E.g. consider a relation Agent Compan y Product name Suneet ABC Nut Raj ABC Bolt Raj ABC Nut Suneet CDE Bolt Suneet ABC Bolt Table 12 It states that ABC makes both Nuts and Bolts but CDE makes only Bolts. This table can be decomposed into 3 tables – P1 (Agent, Company) P2 (Agent, Product name) P3 (Company, Product name) 5NF(Contd…) • P1 Agent P2 Company Suneet ABC Suneet CDE Raj ABC Agent P3 Company Product name ABC Nut ABC Bolt CDE Bolt Produc t name Suneet Nut Suneet Bolt Raj Bolt Raj Nut 5NF(Contd…) • Join P1 and P2 Agent Join P4 with P3 Agent Company Product_ name Suneet ABC Nut Suneet ABC Bolt Suneet CDE Bolt Raj ABC Bolt Raj ABC Nut Company Product_ name Suneet ABC Nut Suneet ABC Bolt Suneet CDE Nut* Suneet CDE Bolt Raj ABC Bolt Raj ABC Nut Table P4 Correct recomposition of the original table. So, the original table is in 5NF. Further Normal Forms • Join dependencies generalize multivalued dependencies – lead to project-join normal form (PJNF) (also called fifth normal form) • A class of even more general constraints, leads to a normal form called domain-key normal form. • Problem with these generalized constraints: are hard to reason with, and no set of sound and complete set of inference rules exists. • Hence rarely used Distributed Database Section-A Distributed Databases • • • • • • • • Definition of Distributed Databases DDBMS ,its characteristics Topology of DDBMS Advantages and Disadvantages of DDBMS Heterogeneous and Homogeneous Databases Distributed Data Storage Architecture of DDBMS DDBMS Design Distributed Databases “A logically interrelated collection of shared data physically distributed over a network.” DDBMS • DDBMS stands for Distributed Database Management System. Characteristics of DDBMS • A DDBMS has the following characteristics – A collection of logically shared data – The data is split into a number of fragments – Fragments may be replicated – Fragments/Replicas are allocated to sites – The sites are linked by a communication network – The data at each site is under the control of a DBMS – The DBMS at each site can handle local applications autonomously • NOTE:-It is not necessary for every site in the system to have its own local database. Topology of DDBMS Site 1 DB Site 4 Computer Network DB Site 2 DB Site 3 Advantages of DDBMS • • • • • • • Reflects organizational structure Improved share ability and local autonomy Improved availability Improved reliability Improved Performance Economics Modular growth Disadvantages of DDBMS • • • • • • Complexity Cost Security Integrity Control more difficult Lack of standards Lack of experience Homogeneous Distributed Databases • In a homogeneous distributed database – All sites have identical software – Are aware of each other and agree to cooperate in processing user requests. – Each site surrenders part of its autonomy in terms of right to change schemas or software – Appears to user as a single system Heterogeneous Distributed Databases • In a heterogeneous distributed database – Different sites may use different schemas and software • Difference in schema is a major problem for query processing • Difference in software is a major problem for transaction processing – Sites may not be aware of each other and may provide only limited facilities for cooperation in transaction processing Distributed Data Storage • Replication – System maintains multiple copies of data, stored in different sites, for faster retrieval and fault tolerance. • Fragmentation – Relation is partitioned into several fragments stored in distinct sites • Replication and fragmentation can be combined – Relation is partitioned into several fragments: system maintains several identical replicas of each such fragment. Data Replication • A relation or fragment of a relation is replicated if it is stored redundantly in two or more sites. • Full replication of a relation is the case where the relation is stored at all sites. • Fully redundant databases are those in which every site contains a copy of the entire database. Data Replication (Cont.) • Advantages of Replication – Availability: failure of site containing relation r does not result in unavailability of r is replicas exist. – Parallelism: queries on r may be processed by several nodes in parallel. – Reduced data transfer: relation r is available locally at each site containing a replica of r. • Disadvantages of Replication – Increased cost of updates: each replica of relation r must be updated. – Increased complexity of concurrency control: concurrent updates to distinct replicas may lead to inconsistent data unless special concurrency control mechanisms are implemented. • One solution: choose one copy as primary copy and apply concurrency control operations on primary copy Data Fragmentation • Division of relation r into fragments r1, r2, …, rn which contain sufficient information to reconstruct relation r. • Horizontal fragmentation: each tuple of r is assigned to one or more fragments • Vertical fragmentation: the schema for relation r is split into several smaller schemas – All schemas must contain a common candidate key (or superkey) to ensure lossless join property. – A special attribute, the tuple-id attribute may be added to each schema to serve as a candidate key. • Example : relation account with following schema • Account-schema = (branch-name, account-number, balance) Horizontal Fragmentation of account Relation branch-name Hillside Hillside Hillside acc-number A-305 A-226 A-155 balance 500 336 62 account1=branch-name=“Hillside” (account) branch-name Valleyview Valleyview Valleyview Valleyview account-number A-177 A-402 A-408 A-639 balance 205 10000 1123 750 account2=branch-name=“Valley view” (account) Vertical Fragmentation of employee-info Relation branch-name tuple-id customer-name Lowman Camp Camp Kahn Kahn Kahn Green Hillside Hillside Valleyview Valleyview Hillside Valleyview Valleyview 1 2 3 4 5 6 7 deposit1=branch-name, customer-name, tuple-id (employee-info) Acc-number A-305 A-226 A-177 A-402 A-155 A-408 A-639 balance 500 336 205 10000 62 1123 750 tuple-id 1 2 3 4 5 6 7 deposit2=account-number, balance, tuple-id (employee-info) Advantages of Fragmentation • Horizontal: – allows parallel processing on fragments of a relation – allows a relation to be split so that tuples are located where they are most frequently accessed • Vertical: – allows tuples to be split so that each part of the tuple is stored where it is most frequently accessed – tuple-id attribute allows efficient joining of vertical fragments – allows parallel processing on a relation • Vertical and horizontal fragmentation can be mixed. – Fragments may be successively fragmented to an arbitrary depth. Data Transparency • Data transparency: Degree to which system user may remain unaware of the details of how and where the data items are stored in a distributed system • Consider transparency issues in relation to: – Fragmentation transparency – Replication transparency – Location transparency Architecture of DDBMS • The reference architecture of DDBMS includes – A set of global external schema – A global conceptual schema – A fragmentation schema and allocation schema – A set of schemas for each local DBMS • Global Conceptual Schema – Is a logical description of the whole database (as if it were not distributed) – Contains entities, relationships, constraints, security and integrity information. • Fragmentation and Allocation schemas – fragmentation is how data is logically partitioned. – Allocation schema is where data is to be located • Local Schemas – each local schema has its own set of schemas. Reference Architecture of DDBMS Global external Schema Global Conceptual Schema Fragmentation Schema Allocation Schema Local mapping schema Local Conceptual schema Local Internal Schema Local mapping schema Local mapping schema Local Conceptual Schema Local Conceptual Schema Local Internal Schema Database Local Internal Schema Distributed Database Design • The factors to be considered for the design are: – Fragmentation • Horizontal Fragmentation • Vertical Fragmentation – Allocation – Replication • Design should meet the following objectives – Locality of reference – Improved reliability and availability – Acceptable performance – Balanced storage capacities and costs – Minimal Communication costs Database Security and Integrity Database Security • Database security is important because data in the database is very valuable and sensitive. • So data should be protected from – Abuse – Unauthorized access – Update • Importance of security in database environment – – – – – – – – Data Security Risks Data Tampering Data Theft Falsifying User Identities Password related threats Unauthorized access to tables and Columns Lack of accountability Complex user management requirements. Security Levels • To protect the database, we must take security measures at several levels– – – – – • Physical Level Human Level Operating System Network Level Database System Data Security Requirements – Confidentiality – Integrity – Availability Physical Level Security • Protection of equipment from floods, power failure, etc. • Protection of disks from theft, erasure, physical damage, etc. • Protection of network and terminal cables from wiretaps non-invasive electronic eavesdropping, physical damage, etc. Solutions: • Replicated hardware: – mirrored disks, dual busses, etc. – multiple access paths between every pair of devises • Physical security: locks,police, etc. • Software techniques to detect physical security breaches. Human Level Security • Protection from stolen passwords, sabotage, etc. • Primarily a management problem: – Frequent change of passwords – Use of “non-guessable” passwords – Log all invalid access attempts – Data audits – Careful hiring practices Operating System Level Security • Protection from invalid logins • File-level access protection (often not very helpful for database security) • Protection from improper use of “superuser” authority. • Protection from improper use of privileged machine intructions. Network-Level Security • Each site must ensure that it communicate with trusted sites (not intruders). • Links must be protected from theft or modification of messages • Mechanisms: – Identification protocol (password-based), – Cryptography. Database-Level Security • Assume security at network, operating system, human, and physical levels. • Database specific issues: – each user may have authority to read only part of the data and to write only part of the data. – User authority may correspond to entire files or relations, but it may also correspond only to parts of files or relations. • Local autonomy suggests site-level authorization control in a distributed database. • Global control suggests centralized control. Confidentiality • Privacy of Communications – DBMS should be able to control the spread of personal information – It should keep corporate data such as trade secrets, proprietary info, competitive analysis secure and away from unauthorized access. • Secure storage of sensitive data – Integrity and privacy of confidential data must be protected. • Authentication – System must maintain security of database by verifying user’s identity. • Authorization – Authorization is second layer of security. – It is the process through which system obtains information about authenticated user including database operations user can perform and object that he can access. Integrity • It is concerned with the maintenance of the correctness and consistency of the data. • Integrity rules are – Domain Integrity rule • Is concerned with maintaining the correctness of attribute values within relation. – Entity Integrity rule • Primary key attribute cannot have null values. – Referential Integrity rule • foreign key attribute can only have primary key values or it can be null. Availability • A secure system makes data available to authorized users ,without delay. View –Another way to provide Security Views provide security by restricting access to a set of rows and columns of a table. A View is a logical table based on one or more tables, providing window to them. Views themselves do not contain data. They just act as a window through which contents of a table can be viewed. One can also manipulate the contents of the original table through these views. Syntax to create views in SQL • Create or replace view <View name> as <select operation>; Uses of View • Views restrict access to the database. • Critical data in the database is safeguarded as access to such data can be controlled using views. • Views allow users to make simple queries like joining of tables without even knowing how tom perform join. • Views allow the same data to be seen in different ways by different users. • Views can be used to select insert or update data .All the changes will be actually made in the base table. Protecting the data within the Database • Two methods b which the access control is done – – Privilege – Role • Privilege- A permission to access a named object. A privilege can be – Database Privilege – System Privilege – Object Privilege • Roles – A role is a mechanism that can be used to provide authorization. • Granting and Revoking Privileges – Grant Command – Revoke Command Role of DBA to provide security to a system • DBA plays an important role in enforcing security. • DBA deals with – – Creating new accounts- Each new user or group of users must be assigned an authorized id and a password. – Mandatory Control Issues- DBA must assign security class to each database object and security clearances to each authorization id in accordance with chosen security policy. Encryption • Data may be encrypted when database authorization provisions do not offer sufficient protection. • Properties of good encryption technique: – Relatively simple for authorized users to encrypt and decrypt data. – Encryption scheme depends not on the secrecy of the algorithm but on the secrecy of a parameter of the algorithm called the encryption key. – Extremely difficult for an intruder to determine the encryption key. Data Encryption • Encryption helps to protect information in certain situation where the normal mechanisms of DBMS are not adequate. Intruder Plain text Encryption Method Encryption Key Plain text Decryption Method Cipher text Decryption Key This process is called Cryptography Encryption (Cont.) • Data Encryption Standard (DES) substitutes characters and rearranges their order on the basis of an encryption key which is provided to authorized users via a secure mechanism. Scheme is no more secure than the key transmission mechanism since the key has to be shared. • Advanced Encryption Standard (AES) is a new standard replacing DES, and is based on the Rindael algorithm, but is also dependent on shared secret keys • Public-key encryption is based on each user having two keys: – public key – publicly published key used to encrypt data, but cannot be used to decrypt data – private key -- key known only to individual user, and used to decrypt data. Need not be transmitted to the site doing encryption. Encryption scheme is such that it is impossible or extremely hard to decrypt data given only the public key. Concurrency What is a transaction? • A set of changes that must all be made together. • A program unit in which either all the operations should be performed or none should be done • It is necessary for consistency of the database. Processes of Transaction • Read operation- When value of a data object from disk is copied to a program variable onto main memory. • Write operation- After modification when copy of data object is written to disk. Transaction Properties • ACID Properties • Atomicity • Consistency • Isolation • Durability • Scheduling of Transactions • • • • Complete schedule Serial Schedule Non –serial Schedule Serializiable Schedule Object Relational DBMS (ORDBMS) ORDBMS • An object-relational database (ORD) or object-relational database management system (ORDBMS) is a database management system (DBMS) similar to a relational database, but with an object-oriented database model: objects, classes and inheritance are directly supported in database schemas and in the query language. In addition, it supports extension of the data model with custom data-types and methods. Definition • An object relational database is also called an object relational database management system (ORDBMS). This system simply puts an object oriented front end on a relational database (RDBMS). • When applications interface to this type of database, it will normally interface as though the data is stored as objects. • However the system will convert the object information into data tables with rows and colums and handle the data the same as a relational database. • Likewise, when the data is retrieved, it must be reassembled from simple data into complex objects. • ORDBMS was created to handle new types of data such as audio, video, and image files that relational databases were not equipped to handle. In addition, its development was the result of increased usage of object-oriented programming languages, and a large mismatch between these and the DBMS software. • ORDBMS are systems that “attempt to extend relational database systems with the functionality necessary to support a broader class of applications and, in many ways, provide a bridge between the relational and object-oriented paradigms.” Why Extend Relational Data Model To eradicate the following weaknesses • Poor representation of ‘real world’ conceptual model – Usually the relational schema does not correspond to real world entities – Difficult to change schema without affecting the applications; • Semantic overloading – The same relation is used to represent entities as well as relationships • Poor support for integrity and business rules • Fixed number of attributes & all attribute values must be atomic • Limited operations • Difficult to handle recursive queries • Impedance mismatch (when SQL is embedded in PLs) • Short-lived transactions (strict locking and recovery mechanisms) ORDBMS Products • IBM’s DB2 Universal Database system. • Oracle Corporation and IBM have developed product as Universal Database Servers. • Informix’s Universal server that handles applications such as text, video, sound and large binary objects. Advantages • Easy to define complex objects and new data types • Reduce application development time • Support object views • Support powerful Query language. e.g. SQL 3 • Simplicity of development Disadvantages • Complexity • Immature • Increased Costs Object Oriented Databases OOD Concepts • • • • • • • • • Object Object identity Attribute Object state Methods and messages Class Abstraction and Encapsulation Inheritance Method overloading Object • An abstract representation of the real world entity that has unique identity, embedded properties and ability to interpret with other objects. • Each object consist of two components- a state and a behavior. • State- is a condition of an object or set of circumstances describing the object. • Behavior- the operations that can be defined for an object describe the behavior of the object with respect to other objects. Object identity • An object must have unique identity. It is represented by an object ID(OID). The OID is assigned by the system at the moment of object creation and cannot be changed under any circumstances. • OID is maintained by the system and not by the user. • The OID can be deleted only if the object is deleted and that OID can never be reused. Attribute • An attribute is a data value held by the object in a class. • Attributes are also known as instance variables . • Each attribute has a unique name and a data type associated with it. • Attributes also have a domain. Object State • The object state is a set of values that the objects attribute can have at a given time. • E.g. state of electric light would be ON/OFF. • State of an object can vary, but its OID remains same. • To change object’s state, change the values of object’s attribute. Methods and Messages • A method is a code that performs a specific operation on object’s data. • Methods protect data from direct and unauthorized access by other objects. • Methods are equivalent to procedures in programming languages. • Every method is identified by a name and has a body. • A message is a request to an object to invoke one of its methods. • A message therefore contains specification of the receiver object, name of the method and arguments of the methods. Class • A class is a collection of similar objects with shared attributes, methods and common relationships to other objects. • Each object of a class is called an instance of a class. • Each instance has its own values for each attribute but shares the same attribute name and methods with other instances of a class. Abstraction and Encapsulation • Abstraction is focusing on the essential aspects of an entity and ignoring its unimportant details. • Encapsulation consists of separating the external aspects of an object ,which are accessible to other objects ,from the internal implementation details of the objects which are hidden from other objects. • Encapsulation forms a sort of data independence. Inheritance • The ability of an object within a hierarchy to inherit the attributes and methods of classes above it. • Inheritance is sharing of attributes and methods among classes based on hierarchical relationship. • If there are two classes A and B .The objects of class B have access to attributes and methods of class A without the need to redefine them, then class A is known as Super class and class B is known as Sub class. Method overloading and Polymorphism • Method overloading offers the possibilities to use same name for different methods. • Polymorphism allows different objects to response to the same message in different ways. • It is important feature of OO systems as its existence allows objects to behave according to their specific characteristics. • To sum up, polymorphism means that same operation may behave differently on different classes. OODBMS In OODBMS, • object oriented programming concepts such as inheritance, encapsulation etc. are enforced in addition to database management concepts such as ACID properties • which lead to – System integrity – Support for adhoc query language – Secondary storage management systems. OODBMS Object – Oriented Features OO Concepts OO data Model OOPL OODBMS Persistence Conventional DBMS Features Backup and Recovery Transaction Concurrency Administration Security and Integrity Data Accessibility OODBMS Products • • • • ‘Object Store’ by Design Inc. Objectivity/DB’ by Objectivity Inc. Versant OODBMS’ by versant software ‘Inter Systems’ cache OODBMS by Ajou University Medical Center in South Korea. • Oracle Berkeley DB by Oracle Corporation. Applications of OODBMS • Financial applications in risk management • Telecommunication applications such as network configuration management applications. • Medical applications that handle digitized data such as X-rays, Ultrasounds together with textual data used for medical research and patient medical history analysis. • CAD and CAM • Computer Assisted Software Engineering (CASE), which are designed to handle very large amount of inter related data. Advantages • Enhanced Modeling Capabilities – With the introduction of objects, the data and its behavior can be grouped as a unit. – Also, the objects can interrelate with other objects and form complex objects which cannot be handled by conventional data models. • Applicability to advanced database Applications – Conventional databases simply lack the ability to provide efficient applications in CAD, CAM, Medical Imaging etc. • Extensibility – The database system comes with a set of predefined types, an OODBMS allows new data types to be defined from the existing data types. • Improved performance – OODBMS provides performance improvements compared to RDBMS when managing complex objects. • Faster application development time – It is obtained reusability. through inheritance and Disadvantages • Lack of theoretical foundation – The OODBMS is based on object model which lacks the solid theoretical foundation of the relational model on which the RDBMS is built. • Competition – OODBMS’s face strong and effective opposition from the firmly established RDBMS. • Lack of standard Tools • Lack of experience • Lack of standard adhoc query language • • • • Lack of support for views Lack of foreign key concepts Lack of support for security Complexity Object Query Language (OQL) • Declarative query language – Not computationally complete • Syntax based on SQL (select, from, where) • Additional flexibility (queries with user defined operators and types) Example of OQL query The following is a sample query “what are the names of the black product?” Select distinct p.name From products p Where p.color = “black” Valid in both SQL and OQL, but results are different. Result of the query (SQL) Original table Product no Name Color P1 P2 P3 Ford Mustang Black Toyota Celica Green Mercedes SLK Black RESULT Name Ford Mustang Mercedes SLK The statement queries a relational database. => Returns a table with rows. Result of the query (OQL) Original table Product no Name Color P1 P2 P3 Black Green Black Ford Mustang Toyota Celica Mercedes SLK RESULT String Ford Mustang String Mercedes SLK The statement queries a object-oriented database => Returns a collection of objects. Comparison Queries look very similar in SQL and OQL, sometimes they are the same In fact, the results they give are very different Query returns: OQL SQL Object Collection of objects Tuple Table Recovery of Data Causes of Failure • • • • • • • • • System Crash User Error Carelessness Sabotage (Intentional Corruption) Statement Failure (Inability to execute SQL Statement) Application S/W failure (Logical Error) Network Failure Media Failure (Hard Disk crash) Natural Physical disasters (fire, floods, earthquake) Recovery from failures • First we need to identify failure modes. • Next must consider how these failures affect the contents of the database. • Then propose the recovery algorithm to recover the database. • Recovery algos perform two types of actions– Actions taken during normal transaction to ensure enough information exists to allow recovery from failure. – Actions taken following a failure to recover the database to a state that is known to be correct. Terminology used in recovery process • Disk is partitioned into fixed length storage units called Blocks. • Blocks are the units of data transfer to and from disk. • When a transaction starts the data required for the transaction is transferred from the disk to main memory in blocks. • When all the operations are performed on the block present in main memory, then the block is transferred from main memory to disk for final storage of results. • The blocks residing in main memory are referred to as buffer block. • The blocks residing on the disk are called Physical block. Contd… Buffer Block A A Main Memory Physical Block Disk Block movements between disk and main memory are categorized into two operations•Input operation- Transfer of physical block to main memory •Output operation- Transfer of buffer block in main memory to disk Recovery and Atomicity of a transaction • Consider a banking transaction that transfer Rs.50 from account A to account B with initial values of A and B being Rs.1000 and Rs.2000 respectively. • T1 Read (A, a) A=a-50 Write (A, a) Output (A) Read (B, b) B=b+50 Write (B, b) Output (B) • Suppose that a system crash occurs during the execution of transactions T1 after output (A) has taken place but before output (B) has executed. Since output (A) operation was performed successfully ,it means that data item A has value of 950 but since output (B) is not performed so its value remain 2000. System enters into inconsistent state because there is a loss of Rs.50 after execution of transaction T1. To recover from above problem two methods are• Re-execute T1- If transaction T1 is re-executed successfully then the value of A is Rs.900 and B is Rs.2050.It again results in incorrect information as A’s correct value as 950 and not 900. • Do not Re-execute T1- The current state of system is A=950 and B=2000 which again is an inconsistent state. • So, when simple recovery schemes do not work, we use – Log based Recovery – Shadow Paging Log Based Recovery • A log file is maintained for recovery purpose. • Log File is a sequence of log records • Log record maintain a record of all operations of the database. • There are several types of Log records– <Start>log record – <Update>log record – <Commit> log record – <Abort> log record Types of Log records • <Start> log record – It contains about the start of each transaction. – It includes transaction identifier. Transaction identifier is the unique identification of the transaction. – <Ti, start> • <Update > log record – It describes a single database and has the following fields. – <Ti,Xj,V1,V2> Ti-Transaction identifier, Xj- Data item,V1-Old value,V2- New value • <Commit> log record – When a transaction successfully commits or completes, <Ti, commit > log record is stored. • <Abort> log record – When a transaction Ti is aborted due to some reason, a <Ti, abort> log record is stored in log file. Techniques for Log based Recovery • Two techniques– Deferred Database Modification – Immediate Database Modification • Deferred Database Modification – It ensures transaction atomicity by recording all database modifications in log but deferring the execution of all write operations until the transaction partially commits. – A transaction is said to be partially committed once the final action of the transaction has been executed. – It means that during write operation the modified values of local variable are not copied into database items but the corresponding new and old values are stored in the log file. • When the transaction successfully performs all its operations, then the information stored in log record is used to set the values of data items. • But if the transaction fails to complete, then the data item retain their old values and information in log record is simply ignored. Recovery Scheme • Redo (Ti)- It sets the value of all data items updated by the transaction Ti to the new value from the log of records. • After a failure has occurred , the recovery subsystem consults the log to determine which transaction needs to be redone. • Transaction Ti needs to be redone if and only if the log contain both records <Ti, Start> and <Ti, Commit>. • In other cases, ignore the log record and re-execute the transaction. Immediate Database Modification • It allows database modification to be output in the database while the transaction is still in the active state. • If a system crashes or transaction aborts then the old value field of the log record is used to restore the modified data items. • This restoration is accomplished by undo operation. Recovery Scheme • Undo (Ti)- It restores the values of all data items updated by transaction Ti to old values. • After a failure had occurred, the recovery scheme determine which transaction needs to be undone. • A transaction needs to be undone if the log record contains record <Ti, Start> and not <Ti, Commit>. • Redo (Ti)- As discussed before Shadow Paging • Two types of recovery concepts– In-place updating- In this, system writes the data item in the same location, thus overwriting the old value of the data item on disk. – Thus there is a single copy of each data item maintained on disk. This technique is used in Log based recovery. – Shadowing- In this, system writes new data item at different disk location. Thus there are multiple copies of a data item maintained on disk. This is known as Shadow Paging. • Database is partitioned into some no. of fixed length blocks, which are referred to as Pages. • Suppose there are n pages numbered 1 through n which do not need to be stored in particular order on disk. • A page table is maintained for each page n disk. This helps in finding ith page of the database as each entry contains a pointer to a page on disk. • The first entry contains a pointer to first page and so on. • The key idea behind shadow paging is to maintain two page tables during the life of a transaction- the current page table and the shadow page table. • When the transaction starts both tables are identical. • The shadow page table is never changed over the duration of the transaction . • The current page table may be changed when a transaction performs a write operation. Write operation • • Suppose a transaction performs a write (x) operation and that x resides on the ith page. Write operation is executed as follows1. If the ith page is not main memory ,then first bring that page from disk to main memory (Input operation). 2. If it is the first write operation on the ith page by a transaction, then1. First find an unused page on disk. 2. Delete the page found from list of free pages. 3. Modify the current page table such that ith page points to new page. 4. Assign new modified value to that page. Fig. showing Shadow paging 1 2 3 4 5 6 7 Shadow Page table 1 2 3 4 5 6 7 8 Pages on disk 1 2 3 4 5 6 7 Current Page table Example:A • T1 Read (A, a) B a= a-100 C Write (A, a) Read (B, b) b=b+ 100 Write (B, b) • T2 Shadow Page Read (C, c) Table C=C-200 Write (C, c) [A=1000,B=2000,C=3000] *** 1000 3000 *** A B C 2000 Pages on disk Current Page Table Structured Query Language Oracle • The Oracle corporation is world’s leading supplier for information management and world’s second largest dependent S/W company. • Oracle is the product of Oracle corporation and is the top selling multi-user DBMS. • First released in 1970s by Larry Elision & his team and organization software development laboratories (SDL). • In 1984,oracle was ready to run over PCs released by companies such as IBM and Apple. • In 1985 and 1986,oracle released its versions 5.0 and 5.1 of its RDBMS. • In 1992,after intense research & development and testing oracle released version 7. • In 1997,released oracle 8. • In 1999 oracle 8i was released with added functionality. • Various new versions like Oracle 9i,10g,11i are being widely used. Products of Oracle 8i • Four main products of 8i are– Oracle 8i- Oracle serves for small no. of users and small database size. – Oracle 8i Enterprise Edition- For large no. of users or large database size. – Oracle 8i Personal Edition- Single version of Oracle. – Oracle 8i Lite – Light weight database engine for mobile computing on notebooks. Features of Oracle 8 • • • • • • • • Client/Server Architecture Large database & space management Concurrent Processing High transaction processing performance Controlled availability Manageable Security Portability Compatibility SQL • To communicate with Oracle engine, User needs some language. • The most acceptable language is ANSI SQL (Structured query language) • Oracle includes interactive SQL tool called SQL * plus which allows the users to enter ANSI SQL sentences and pass them to oracle engine for execution. • These sentences allow the user to create ,access and maintain data structure like tables, indexes etc. Steps to invoke SQL * Plus/SQL • Start your Operating System normally. • Click on the Start button and then on programs. It displays the list of programs. • Log Click on Oracle 95 and the select SQL* On Plus. UserName : is prompted for oracle login • The user Scott Password : ID and password so that user may get Tiger Host String : connected to oracle. OK Cancel Contd… • Once Login id and password are entered ,an SQL prompt appears. • At this point requesting user is connected with the oracle engine. • User can interact by giving its queries to the oracle engine for the execution. SQL Fundamentals • SQL is a simple and powerful language used to create ,access and manipulate data and structure in the database. • SQL is like plain English, easy to understand and write. • SQL is not case sensitive. SQL Statements • Data Definition Language (DDL) – Create, alter, drop, rename • Data Manipulation Language (DML) – Delete, Insert, Select, Update • Data Control Statements (DCL)– Grant ,Revoke • Transaction Control Statements (TCL)- Commit, Rollback, Savepoint Data Types • The most commonly used data types are:– – – – – – Char Varchar or Varchar2 Number Date Long Long raw/raw • Char(n) – Used to store character strings of fixed size. Size of the character string is determined by numeric value of n. This data type can hold maximum of 255 characters. • Varchar(n)/Varchar2(n) – Used to store variable length alphanumeric data. It can store maximum of 2000 characters. Contd… • Number (p, s) – Used to store numbers fixed or floating point. The precision p determines the length of the data while Scale (S) determines the number of places after the decimal. • Date – This data type stores date values in a special format internal to oracle. The default format in which date is stored is DD-MON-YY . • Long – Used to store up to 2 gigabytes of alphanumeric text data. • Long Raw / Raw – used to store graphics & sound files when used in conjunction with long to form long raw. Creating & Managing Tables • A table is composed of rows & columns. • A table can represent a single entity. • A table must have unique name through which it can be referred. • A table has no. of characteristics. The characteristics of a table are called its attributes. Table creation rules • Table names and column names must begin with a letter and can be 1-30 characters long. • Names must contain only the characters A-Z,a-z,09,_(underscore),$, #. • Name must not be an oracle server reserved word or SQL reserved word. • Names must not duplicate the names of other objects. • Table name is not case sensitive. Constraints • Constraints are the restrictions that are enforced so that the data stored in the database is valid and prohibits wrong data to be entered into the database. • SQL will reject any value that violates the criteria that was defined. • Categories of constraints– Column level Constraint- applied on individual columns. – Table constraints – applied to more than one column of a table. • Types of Constraints– – – – – Not Null Constraint Unique Constraint Primary Key Constraint Check Constraint Default constraint Types of Constraints • Not Null Constraint – It prevents the column from accepting Null values. – This can be applied only at Column level. – Syntax- Columnname data type (size) Not Null • Unique Constraint – This allows unique values to be entered into the column i.e. every value is a distinct value in that column. – When we apply unique constraint on a group of columns ,then each value of the group should be unique. – A table can have more than one unique key. – Syntax- Column level Constraint • Columnname data type (size) Unique; – Table level Constraint • Columnname1 data type (size),columnname2 data type (Size), Unique (Columnname1, Columnname 2)); Contd… • Primary Key Constraint – Is used to identify each row of a table uniquely. – A pk may be a single column or group of columns. – It is a combination of both the unique key and not null constraint. – When pk is applied on a single column it is called as simple key. – When applied on multiple columns it is called as composite primary key. – Syntax- column level Columnname data type (size) primary key – Table level Columnname1 data type (size),columnname2 data type (Size), primary key (Columnname1, Columnname 2)); Contd… • Check constraint – It allows Oracle to verify the validity of data being entered on a table against a set of constant values. – It consists of keyword CHECK followed by parenthesized conditions. – Check Constraint must be specified as a logical expression that evaluates either to be true or false. – If an SQL statement causes the condition to be false an error message will be displayed. – Syntax:Columnname data type (size) check (logical expression) • Default Constraint – It allows the user to insert values in the columns where the user do not want to insert the value. – Syntax Columnname data type (size) default value; Contd… • Foreign Key Constraint – – – – – – – – – It helps to establish a relationship among tables. A fk may be single column or the combination of columns which derive their values based on primary key or unique key values from another table. A fk is also known as referential integrity constraint. A table in which there is a foreign key is called detail table or the child table. And the table to which it refers the values of the pk is called master table or parent table. A fk must have corresponding primary key in master table Whenever you insert a value in the foreign key column, it checks the value in the primary key of the parent table. If value is not present ,it gives an error. SyntaxColumnname data type (size) references table name (column name) [on delete cascade]