Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Microsoft Jet Database Engine wikipedia , lookup
Oracle Database wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Clusterpoint wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Versant Object Database wikipedia , lookup
Relational algebra wikipedia , lookup
UNIT II Introduction to the Relational Model Integrity Constraint Over relations Enforcing Integrity constraints Querying relational data Logical data base Design Introduction to Views Destroying /altering Tables and Views. Relational Algebra Selection and projection set operations renaming – Joins – Division Examples of Algebra overviews Relational calculus Tuple relational Calculus Domain relational calculus Expressive Power of Algebra and calculus. Relational Model •Relations: The main construct for representing the data in the relational model is a relation. •A relation consists of a relation schema and relation instance. Relation Schema : specifies name of relation, plus name of each field( or column. or attribute, or property) and the domain of each field. A domain is referred to in a relation schema by the domain name, and has a set of associated values. E.g. Students(sid: string, name: string, login: string, age: integer, gpa: real) #fields = degree / arity An Instance of a relation is a set tuples, also called records or rows, in which each tuple has the same number of fields as the relation schema. Cardinality=#rows. 2 sid name 53666 Jones 53688 53650 login age gpa jones @c s 18 3.4 Smith smith@e ecs 18 3.2 Smith smith @math 19 3.8 • Cardinality = 3, arity = 5 , all rows distinct A relational database is a collection of relations with distinct relation names. The relational database schema is the collection of schemas for the relations in the database An instance of a relational database is a collection of relation instances, one per relation schema in the database schema Creating and modifying relations using SQL 1. 2. 3. 4. Create Insert Update Delete 3 Integrity Constraints • An Integrity Constraint (IC) is a condition that is specified on a database schema, and restricts the data that can be stored in an instance of the database. • If a database instance satisfies all the integrity constraints specified on the database, it is a legal instance. • A DBMS enforces integrity constraints, in that it permits only legal instances to be stored in a database. • Integrity constraints are specified enforced at different times. 1) When a DBA or end user defines a database, he or she species the ICs that must be hold on any instance of this database. 2) When a database application is run, the DBMS checks for violations. • 1)Domain Constraints: The values that appears in a column must be drawn from the domain associated with that column. 4 Domain of field is essentially the type of that field. 2)Key Constraints: A key constraint is a statement that a certain minimal subset of the fields of a relation is a unique identifier for a tuple. A set of fields that uniquely identifies a tuple according to a key constraint is called a candidate key for the relation. Key do not permit duplicate values. No subset of key is a unique identifier for a tuple. Ex: The set of fields {sid,name} is not a key for stuednts. But the set {sid,name} is a super key for students relation. A relation may have several keys. Out of the available candidate keys a database designer can identify a primary key. Name StudentNo Sex Degree 5 3)Foreign Key Constraints: • Key and Entity Integrity constraints are specified on individual relations • Referential Integrity constraints are specified between two relations EMP [ Eno, Name, Salary, SuperEno, DeptNo ] DEPT [ Dnumber, Dname, Dlocation ] • A set of attributes FK in relation schema R1 is – the attributes of FK have the same . domain as the the primary key attributes PK of another schema R2 – t1[FK] = t2[PK] or t1[FK] is null • FK is said to “reference” PK a foreign key if Example Foreign Key • Every employee is assigned to one department – DeptNo is a foreign key of EMP referencing DEPT • Every Department has a Manager. The manager is also an employee – Manager is a foreign key of DEPT referencing EMP • Every Employee has a supervisor who is also an employee – SuperEno is a foreign key of EMP referencing EMP DEPT[Dnumber, Dname, Dlocation, Manager] EMP[Eno, Name, Salary, SuperEno, DeptNo] 4)General Constraints • General class of constraints that cannot be enforced by the preceding constraints • Semantic Constraints – “The salary of an employee should not exceed the employee’s supervisor’s salary” – “The maximum number of hours that an employee can work on a project is 56” • Transition Constraints – “The salary of an employee can only increase” • Often implemented in a constraint specification language (SQL3) using triggers and assertions Enforcing Integrity Constraints • Each DBMS should have means to resolve invlaid entries such as: – What happens if an entry that duplicates a key entry is entered? – What should be done if an entry of a foreign key is deleted? • A foreign key is a key where at least one field depends on a field from a different table. – What happens when an invalid entry is entered? • Associated with each attribute is a set of values, called a domain, that can be assigned to the entry of a tuple corresponding to the attribute. A relation schema is a set of attributes. Example EMP = { Name, SSN, DeptName, Salary, Birthdate } • Convention EMP(Name,SSN,DeptName,Salary,Birthdate) • • Querying Relational Data • Query is a question about the data • The answer consists of a new relation containing the result • SQL – the most popular query language SELECT * FROM Students S WHERE S.GPA > 3.2 SELECT S.Login, S.GPA FROM Students S WHERE S.GPA > 3.2 10 Logical Database Design (ER TO RELATIONAL) • Entity sets to tables: ssn name Employees lot CREATE TABLE Employees (ssn CHAR(11), name CHAR(20), lot INTEGER, PRIMARY KEY (ssn)) – Each entity is defined as a table. Use the entity name as the table name. Make the entity identifier as the primary key for the table. Examine domain constraints for each attribute. – Include the primary key of one of the tables in the second one. This becomes the foreign key and enables linkage between the two tables. Relationship Sets to Tables • In translating a relationship set to a relation, attributes of the relation must include: – Keys for each participating entity set (as foreign keys). • This set of attributes forms a superkey for the relation. – All descriptive attributes. CREATE TABLE Works_In( ssn CHAR(1), did INTEGER, since DATE, PRIMARY KEY (ssn, did), FOREIGN KEY (ssn) REFERENCES Employees, FOREIGN KEY (did) REFERENCES Departments) • Translation of relationship that relates entities from the same entity set name ssn lot Employees CREATE TABLE Reports_To ( supervisor_ssn CHAR(11) supervisor subordinate_ssn CHAR(11) subordinate Reports_To PRIMARY KEY (supervisor_ssn, subordinate_ssn), FOREIGN KEY (supervisor_ssn, REFERENCES Employees(ssn), FOREIGN KEY (subordinate_ssn, REFERENCES Employee(ssn)) • We need to explicityly name the referenced field of Employees because the field name differs from the name(s) of the referring field(s) Translating ER Diagrams with Key Constraints • Map relationship to a table: – Note that did is the key now! – Separate tables for Employees and Departments. • Since each department has a unique manager, we could instead combine Manages and Departments. CREATE TABLE Manages( ssn CHAR(11), did INTEGER, since DATE, PRIMARY KEY (did), FOREIGN KEY (ssn) REFERENCES Employees, FOREIGN KEY (did) REFERENCES Departments) CREATE TABLE Dept_Mgr( did INTEGER, dname CHAR(20), budget REAL, ssn CHAR(11), since DATE, PRIMARY KEY (did), FOREIGN KEY (ssn) REFERENCES Employees) Translating Weak Entity Sets • Weak entity set and identifying relationship set are translated into a single table. – When the owner entity is deleted, all owned weak entities must also be deleted. CREATE TABLE Dep_Policy ( pname CHAR(20), age INTEGER, cost REAL, ssn CHAR(11) NOT NULL, PRIMARY KEY (pname, ssn), FOREIGN KEY (ssn) REFERENCES Employees, ON DELETE CASCADE) Views • A view is a pseudo or virtual table that is used to retrieve data that exists in the underlying database tables • The view query must be executed each time the view is used • A view can be used to simplify queries or to restrict access to sensitive data • A view is created with the CREATE VIEW command • A view cannot be modified; to change a view, it must be dropped and then re-created, or the CREATE OR REPLACE VIEW command must be used • Any DML operation can be performed on a simple query if it does not violate a constraint • A view that contains expressions or functions, or that joins multiple tables, is considered a complex view • A complex view can be used to update only one table; the table must be a keypreserved table • Data cannot be added to a view column that contains an expression • DML operations are not permitted on non key-preserved tables Types of Views 17 Creating a View • You use the CREATE VIEW keywords to create a view • Use OR REPLACE if the view already exists • Use FORCE if the underlying table does not exist at the time of creation • Provide new column names if necessary Oracle 10g: SQL 18 • WITH CHECK OPTION constraint – if used, prevents data changes that will make the data subsequently inaccessible to the view • WITH READ ONLY – prevents DML operations Creating a Simple View • Only references one table – no group functions, GROUP BY clause, or expressions Oracle 10g: SQL 20 DML Operations on a Simple View • Any DML operations are allowed through simple views unless created with WITH READ ONLY option • DML operations that violate constraints on the underlying table are not allowed Oracle 10g: SQL 21 Creating a Complex View • A complex view may contain data from multiple tables or data created with the GROUP BY clause, functions, or expressions • Type of DML operations allowed depends on various factors Oracle 10g: SQL 22 DML Operations on a Complex View with an Arithmetic Expression • Values cannot be inserted into columns that are based on arithmetic expressions Oracle 10g: SQL 23 DML Operations on a Complex View Containing Data from Multiple Tables • DML operations can not be performed on non key-preserved tables, but they are permitted on key-preserved tables Oracle 10g: SQL 24 DML Operations on a Complex View Containing Data from Multiple Tables (continued) Oracle 10g: SQL 25 DML Operations on a Complex View Containing Functions or Grouped Data • DML operations are not permitted if the view includes a group function or a GROUP BY clause Oracle 10g: SQL 26 DML Operations on a Complex View Containing DISTINCT or ROWNUM • DML operations on a view that contains the DISTINCT keyword or ROWNUM are not permitted Oracle 10g: SQL 27 Dropping a View • Use DROP VIEW command Oracle 10g: SQL 28 Creating an Inline View • An inline view is a temporary table created by using a subquery in the FROM clause • It can only be referenced while the command is being executed • Most common usage – “TOP-N” analysis Oracle 10g: SQL 29 Formal Relational Query Languages Two mathematical Query Languages form the basis for “real” languages (e.g. SQL), and for implementation: Relational Algebra: More operational, very useful for representing execution plans. Relational Calculus: Lets users describe what they want, rather than how to compute it. (Non-procedural, declarative.) Understanding Algebra & Calculus is key to understanding SQL query processing! Relational Algebra & Relational Calculus • Relational algebra and relational calculus are two query languages associated with the relational model. • Queries in Relational Algebra are composed using a collection of operators, and each query describes the step-by-step procedure for computing the desired answer. • In relational calculus the query describes the desired answer without specifying how the answer is to be computed. So this 30 non-procedural style of querying is called declarative. Preliminaries • A query is applied to relation instances, and the result of a query is also a relation instance. – Schemas of input relations for a query are fixed (but query will run over any legal instance) – The schema for the result of a given query is also fixed. It is determined by the definitions of the query language constructs. • Positional vs. named-field notation: – Positional notation easier for formal definitions, named-field notation more readable. – Both used in SQL • Though positional notation is not encouraged Relational algebra • RA is one of the two formal query languages associated with relational model. • Queries in RA are composed of operators. • A fundamental property is that every operator in algebra accepts (one or two) relation instances and return a relation instance as result. • A relational algebra expression is recursively defined to be a relation. • Each relational query describes a step-by-step procedure for computing the desired answer, based on the order in which the operators are applied in 31 the query. Relational Algebra: 5 Basic Operations • • • • • Selection ( s ) Selects a subset of rows from relation (horizontal). Projection (p) Retains only wanted columns from relation(vertical). Cross-product ( ) Allows us to combine two relations. Set-difference ( — ) Tuples in r1, but not in r2. Union ( ) Tuples in r1 or in r2. Projection () - Retains only attributes that are in the “projection list”. - Schema of result: = exactly the fields in the projection list, with the same names that they had in the input relation. - Projection operator has to eliminate duplicates age(S2) sname,rating(S2) Selection () - Selects rows that satisfy selection condition. - Result is a relation. - Schema of result is same as that of the input relation. rating 8(S2) 32 Union and Set-Difference • Both of these operations take two input relations, which must be union-compatible: – Same number of fields. – `Corresponding’ fields have the same type. sid 22 31 58 sname dustin lubber rusty S1 rating 7 8 10 sid 28 31 44 58 age 45.0 55.5 35.0 sname yuppy lubber guppy rusty sid sname rating age 22 31 58 44 28 dustin lubber rusty guppy yuppy 7 8 10 5 9 45.0 55.5 35.0 35.0 35.0 rating 9 8 5 10 age 35.0 55.5 35.0 35.0 S2 S1 S2 33 S1 sid 22 31 58 Set Difference sname rating age dustin 7 45.0 lubber 8 55.5 rusty 10 35.0 sid sname rating age 22 dustin 7 45.0 S1 S2 Cross-Product S2 sid sname rating age 28 yuppy 9 35.0 31 lubber 8 55.5 44 guppy 5 35.0 58 rusty 10 35.0 sid sname rating age 28 yuppy 9 35.0 44 guppy 5 35.0 S2 – S1 • S1 R1: Each row of S1 paired with each row of R1. • Q: How many rows in the result? • Result schema has one field per field of S1 and R1, with field names `inherited’ if possible. – May have a naming conflict: Both S1 and R1 have a field with the same name. 34 – In this case, can use the renaming operator: Cross Product Example R1 S1 sid bid day 22 58 101 103 10/10/96 11/12/96 (sid) sname rating age (sid) bid day 22 dustin 7 45.0 22 101 10/10/96 22 dustin 7 45.0 58 103 11/12/96 31 lubber 8 55.5 22 101 10/10/96 31 lubber 8 55.5 58 103 11/12/96 58 rusty 10 35.0 22 101 10/10/96 58 rusty 10 35.0 58 103 11/12/96 R1 X S1 sid 22 31 58 sname rating dustin 7 lubber 8 rusty 10 age 45.0 55.5 35.0 35 Compound Operator: Intersection • In addition to the 5 basic operators, there are several additional “Compound Operators” – These add no computational power to the language, but are useful shorthands. – Can be expressed solely with the basic ops. • Intersection takes two input relations, which must be union-compatible. • Q: How to express it using basic operators? R S = R (R S) S1 sid sname rating age 22 31 58 dustin lubber rusty 7 8 10 sid 28 31 44 58 S1S2 sid 31 58 sname lubber rusty 45.0 55.5 35.0 rating 8 10 sname yuppy lubber guppy rusty age 55.5 35.0 S2 rating 9 8 5 10 age 35.0 55.5 35.0 35.0 36 Compound Operator: Join • Joins are compound operators involving cross product, selection, and (sometimes) projection. • Most common type of join is a “natural join” (often just called “join”). R S conceptually is: – Compute R S – Select rows where attributes that appear in both relations have equal values – Project all unique attributes and one copy of each of the sid common bid ones. day 22 58 101 103 10/10/96 11/12/96 S1 sid 22 31 58 R1 sname dustin lubber rusty rating 7 8 10 age 45.0 55.5 35.0 S1 R1 sid sname rating age bid day 22 58 dustin rusty 7 10 45.0 35.0 101 103 10/10/96 11/12/9637 Other Types of Joins • Condition Join (or “theta-join”): (sid) 22 31 R c S c ( R S) sname rating age dustin 7 45.0 lubber 8 55.5 (sid) 58 58 bid 103 103 day 11/12/96 11/12/96 • Result schema same as that of cross-product. • May have fewer tuples than cross-product. Find names of sailors who’ve reserved boat #103 sname (( sname ( bid 103 Re serves) Sailors) bid 103 (Re serves Sailors)) Find names of sailors who’ve reserved a red boat sname (( Boats) Re serves Sailors 38) color ' red ' Compound Operator: Division • Useful for expressing “for all” queries like: Find sids of sailors who have reserved all boats. • For A/B, attributes of B must be subset of attrs of A. – May need to “project” to make this happen. • E.g., let A have 2 fields, x and y; B have only field y: A B x y B( x, y A) Expressing A/B Using Basic Operators A/B contains all tuples (x) such that for every y tuple in B, there is an xy tuple in A. For A(x,y)/B(y), compute all x values that are not `disqualified’ by some y value in B. x value is disqualified if by attaching y value from B, we obtain an xy tuple that is not in A. Disqualified x values: x (( x ( A) B) A) A/B: x ( A) Disqualified x values 39 Examples of Division A/B A B x sno s1 s1 s1 s1 s2 s2 s3 s4 s4 A pno p1 p2 p3 p4 p1 p2 p2 p2 p4 y B( x, y pno p2 pno p2 p4 A) B2 pno p1 p2 p4 sno s1 s2 s3 s4 sno s1 s4 A/B1 A/B2 sno s1 B1 B3 A/B3 40 Find sailors who’ve reserved a red or a green boat • Can identify all red or green boats, then find sailors who’ve reserved one of these boats: (Tempboats, ( Boats)) color ' red ' color ' green ' sname(Tempboats Re serves Sailors) Find sailors who’ve reserved a red and a green boat (Tempred, sid (Tempgreen, (( sid color ' red ' (( Boats) Re serves)) color ' green' Boats) Re serves)) sname((Tempred Tempgreen) Sailors) Find the names of sailors who’ve reserved all boats (Tempsids, ( sid, bid Re serves) / ( bid Boats)) sname (Tempsids Sailors) find the names of sailors who’ve reserved all ‘Interlake’ boats: 41 Relational Calculus • Relational Calculus is an alternative to relational algebra. • In contrast to Relational Algebra, which is procedural, the calculus is non-procedural (declarative). • It allows us to describe the set of answers without specifying how they should be computed. • It has had more influence on SQL. • Calculus has – variables, constants, comparison ops,logical connectives – quantifiers • Comes in two flavors: Tuple relational calculus (TRC) and Domain relational calculus (DRC). – TRC: Variables range over (i.e., get bound to) tuples. • Like SQL. – DRC: Variables range over domain elements (= field values). • Like Query-By-Example (QBE) 42 • Expressions in the calculus are called formulas. Tuple Relational Calculus • Query has the form: {T | p(T)} – p(T) denotes a formula in which tuple variable T appears. • Result is the set of all tuples t for which the formula p(T) evaluates to true with T=t • Formula P(T) specified in First Order Logic • A tuple variable is a variable that takes on the tuples of a particular relation schema as values. • An Atomic formula is one of the following: R Rel R.a op S.b R.a op constant op is one of <,>,=,<=,>=,!= 43 TRC Formulas • • An Atomic formula is one of the following: R Rel R.a op S.b R.a op constant , , , , , op is one of A formula can be: – an atomic formula – p, pq, pq, p q where p and q are formulas – R( p(R)) where variable R is a tuple variable – R( p(R)) where variable R is a tuple variable Free and Bound Variables • The use of quantifiers X and X the formula. – A variable that is not bound is free. • in a formula is said to bind X in There is an important restriction — the variable T that appears to the left of `|’ must be the only free variable in the formula p(T). — in other words, all other tuple variables must be bound using a 44 quantifier. Examples • Find all sailors with rating above 7 {S |S Sailors S.rating > 7} • Find names and ages of sailors with rating above 7. {S | S1 Sailors(S1.rating > 7 S.sname = S1.sname S.age = S1.age)} – Note: S is a tuple variable of 2 fields (i.e. {S} is a projection of Sailors) • only 2 fields are ever mentioned and S is never used to range over any relations in the query. Joins Find sailors rated > 7 who’ve reserved boat #103 {S | SSailors S.rating > 7 R(RReserves R.sid = S.sid R.bid = 103)} • Note the use of to find a tuple in Reserves that `joins with’ the 45 Sailors tuple under consideration. Find sailors rated > 7 who’ve reserved a red boat {S | SSailors S.rating > 7 R(RReserves R.sid = S.sid B(BBoats B.bid = R.bid B.color = ‘red’))} Division Find sailors who’ve reserved all boats (hint, use ) {S | SSailors BBoats (RReserves(S.sid=R.sidB.bid=R.bid))} Find sailors who’ve reserved all Red boats {S | SSailors B Boats ( B.color = ‘red’ R(RReserves S.sid = R.sid B.bid = R.bid))} 46 Domain Relational Calculus • A domain variable is a variable that ranges over the values in a domain of some attribute. • Query has the form: x1, x 2,..., xn | p x1, x 2,..., xn Answer includes all tuples that make the formula be true. DRC Formulas • Atomic formula: – <x1,x2,…,xn> Rel , where Rel is a relation with n variables – X op Y – X op constant – op is one of comparison ops (<,>,=,,.), • A formula is recursively defined to be one of the following: • p, pq, pq, pq , where p, q are formulas • X(p(X)), where X is a domain variable 47 • X(p(X)), where X is a domain varialbe Find all sailors with a rating at least 7 I , N ,T , A | I , N ,T , A SailorsT 7 Find sailors rating > 7 who’ve reserved boat #103 I, N, T , A | Ir, Br, D I, N, T , A Sailors T 7 Ir, Br, D Re serves Ir I Br 103 Find sailors rated > 7 who’ve reserved a red boat I, N, T , A | I, N, T , A Sailors T 7 Ir, Br, D B, BN , C Ir, Br, D Re serves Ir I B, BN , C Boats B Br C ' red ' Find sailors who’ve reserved all boats I, N, T , A | B, BN, C I, N, T , A Sailors Boats Ir, Br, D Re serves I Ir Br B -Find the names of sailors who have reserved at least two boats 48 Assignment - 2 1. (a) Which of the following is procedural i. Relational algebra ii. Relational calculus Justify your answer with an example. (b) Compare the two relational calculi. 2. (a) What is a view? Explain the a views in SQL? (b) Discuss the problems encountered in modifying database through views. 3. (a) Explain the operations select, project Cartesian product and join with suitable examples. (b) Explain the following terms i. Atomic formula ii. Well formed formula 4. Consider the following schema given. The primary keys are underlined. Sailors(sailor-id, sailor-rating, sailor-age) Boats(boat-id, boat-name, boat-color) Reserves(sailor-id, boat-id, day) Write the queries in tuple relational calculus for the following i. Find all sailors with a rating above 10 ii. Find all boats with color blue iii. Find the names of sailors who have reserved all boats iv. Find the names of sailors who have reserved a green boat 5. Consider the following schema for a COMPANY database Employee (Name, SSN, Address, Sex, Salary, Dnumber) Department (Dname, Dnumber, MGRSSN, MGRSTART date) Dept-locations (Dnumber, Dlocations) Project (Pname, Pnumber, Plocation, Dnumber) Works-on (ESSN, Pnumber, Hours) Dependent (ESSN, Dependent-name, Sex, Bdate, Relationship) Write the queries in Relational Algebra to i. Retrieve all employees who either work in department 4 and make over 25,000 per year or work in department 5 and make over 30,000 ii. Retrieve the Social Security numbers of all employees who either work in department 5 or directly supervise a employee who works in department 5. iii. Retrieve the name and address of all employees who work for the “Research” department iv. List all the projects on which employee “Smith” is working. 6. (a) What is the difference between a candidate and the primary key for a given relation? What is a super key? (b) What is a foreign key constraint? Why are such constraints important? What is referential integrity? 7 Consider the following relations: AUTHOR (NAME, ADDRESS, AGE) PUBLISHER (NAME, ADDRESS) BOOK (TITLE,AUTHOR,PUBLISHER) Use relational algebra, Calculus to represent the following queries: (a) What are the authors and titles of all books? (b) What are the titles published by XYZ? (c) What are the names of all authors publishing book with XYZ? (d) What are the names of authors publishing at least one book with XYZ? 8 Consider the following schema given. The primary keys are underlined. PROJECT(Projectnum,Project Name, Chief Architect) EMPLOYEE( Empnum, Empname) ASSIGNED TO(Projectnum, Empnum) Write the following queries in Tuple relational calculus i. Find Empnum of employees working on project COMP454 ii. Find the details of employees(both number and name) working on the project COMP365 iii. Find the details of employees working on the ”Database” project iv. Find the Empnums of employees who do not work on project COMP464 v. Find the Empnums of employees who work on all projects