Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Κεφάλαιο 3 ΣΧΕΣΙΑΚΟ ΜΟΝΤΕΛΟ YV - Relational Model and Algebra 110 DATABASE SYSTEMS: The Relational Model and Relational Database Systems OUTLINE – Informal and Formal Definition of the Model Structures, Constraints, Operations – Relational Algebra – Relational Calculus – The languages SQL and QBE – Views - Integrity Constraints using SQL – Normalization and Relational Database Design – Relational Database Systems YV - Relational Model and Algebra 111 Relational Model: Informal Definition Proposed in 1970 by E.F. Codd (“A relational model for large shared data banks”, CACM), as a theory of a database model Spurred tremendous research in the database field and became the most popular logical data model - many relational DBMSs are today available on nearly all platforms A relational database is a set of relations RELATION: A table of values. Each column in the table has a header, called an attribute (field). Each row in the table is called a tuple (record) and stands for an entity or a relationship. YV - Relational Model and Algebra 112 Formal Definition STRUCTURES – Only one kind: relations (which have a name) A domain D is a set of values , D= {d1, d2 , ..., dn} e.g., DOMAIN OF NAMES = the set of all names DOMAIN of WEIGHT = the set of all weights CHAR STRINGS from 1 to 10 in length, etc. An attribute A names a property of interest in a relation and takes its values from some associated domain D(A). e.g., EMPLOYEE_NAME, WEIGHT, etc. Attributes are the column names (headers) in a relation (Notation: R.A, or R[A] where R is the relation name) YV - Relational Model and Algebra 113 Structure Definitions (2) A relation schema R is the name and attributes of a relation, with the underlying domains for the attributes. When obvious, the domains are ignored Notation: R(A1 , A2 , ... An) e.g., STUDENT(Name, SSN, BirthDate, Address) The degreee n of a relation R is the number of attributes in R A database schema S is a set of relation schemas. Notation: S = {R1 , R2 , ... Rm } e.g., COMPANY = { EMPLOYEE, PROJECT, ... } YV - Relational Model and Algebra 114 Structure Definitions (3) -- A tuple t of a relation R(A1 , A2 , ... An) is an (ordered) set of values t = <v1 , v2 , ... vn >, where each vi is an element of the domain D(Ai). -- A relation instance r(R), simply, relation, is a set of tuples r(R) = { t1 , t2 , ... tk } alternatively, it is a subset of the Cartesian product r(R) D(A1) x D(A2) x ... x D(An) -- The cardinality of R is the number of tuples in r(R), it is denoted by CARDR -- A relational database is a set of relations (instances) YV - Relational Model and Algebra 115 Characteristics of Relations ORDERING of attributes in a relational schema is essential ORDERING of tuples in a relation is not important Every tuple is stored only ONCE in a relation (it is a set) A value may appear MULTIPLE TIMES in a column and is considered ATOMIC (indivisible) - at times this is referred as First Normal Form (1-NF) relation A special value, called NULL, is used to represent values that are inapplicable or unknown to the database – e.g, the PhoneNumber value of someone without a phone, Address value for someone who did not supply his address Notation: component value of a tuple t, t[Ai] = vi YV - Relational Model and Algebra 116 Constraints in the Model CONSTRAINTS – Three kinds of inherent to the model constraints: KEY, ENTITY INTEGRITY, and REFERENTIAL INTEGRITY. – Three basic explicit constraints: DOMAIN, COLUMN and USER-DEFINED (some other explicit constraints, like the Functional Dependencies, will be discussed later.) KEY CONSTRAINTS: The various keys, as defined for Entities and Relationships, hold in the Relational Model. – Note that a key is a property of the relational schema (not a property of the relation) YV - Relational Model and Algebra 117 Inherent Constraints (1) – A set of attributes SK of a relation schema R for which each tuple in any relation instance r(R) must have unique value(s) is a superkey. That is, for distinct t1 and t2, t1[SK] t2[SK] For instance, SSN of EMPLOYEE, NAME and ADDRESS of EMPLOYEE, SSN and NAME of EMPLOYEE, etc. – A candidate key K is a minimal superkey (that is, no subset of the attributes in K is a superkey). K is also called key. For instance, SSN is a candidate key for EMPLOYEE, but the combination {SSN, NAME} is not. – A primary key PK is one of the candidate keys that is agreed to serve as an identifier for the relation (primary keys are usually distinguished by underlining) For instance, SSN is the primary key of relation EMPLOYEE. YV - Relational Model and Algebra 118 Inherent Constraints (2) ENTITY INTEGRITY: The primary key attributes PK in a relation schema R cannot have NULL values in any tuple of a relation instance r(R). t[PK] NULL, for all t in r(R) – The reason behind the above constraint is that a primary key is used to identify a tuple in a relation. – Note that more attributes in R may be constrained to have no NULLS by explicit constraints. YV - Relational Model and Algebra 119 Inherent Constraints (3) REFERENTIAL INTEGRITY: These constraints involve TWO relations and are used to specify a relationship among tuples of the two relations. They are also called foreign keys. – A foreign key FK is a set of one or more attributes of a relation R1 that forms a primary key for another relation R2. A tuple t1 in r(R1) is said to reference tuple t2 in r(R2), IF t1[FK] = t2[FK] For instance, for the relation WORKING-ON the attribute SSN is a foreign key (it is the primary key of EMPLOYEE). YV - Relational Model and Algebra 120 Explicit Constraints (1) DOMAIN CONSTRAINTS: They are the rules defined in the domain definition and inherited by columns (attributes) based on that domain. – A domain can be defined, together with all its integrity rules (e.g., the domain of integers having all rules that apply to integers). They are usually the basic data types. – The ideal support is through strong data typing (very rare) COLUMN CONSTRAINTS: They are additional to the domain constraints and maintain values in a column. – Column rules go beyond the rules inherited by the domain (e.g., the column of small integers or integers between 1 and 10, etc. that further restrict the domain of integers.) – In many systems, support is given with a CHECK option YV - Relational Model and Algebra 121 Explicit Constraints (2) USER-DEFINED CONSTRAINTS: Any integrity rule, not among the ones discussed before, is classified as userdefined. To enforce certain business rules, integrity constraints of arbitrary complexity are required. Such constraints are expressed either procedurally or declaratively (preferred way) Several mechanisms can be used to implement the enforcement of such rules: stored procedures, triggers, methods (for object-oriented systems) Generally, relational DBMS are weak in enforcing rules YV - Relational Model and Algebra 122 The COMPANY Database in the Relational Model - Schema EMPLOYEE ( SSN, Name, BirthDate, Address, Sex, Salary, SupSSN, DNumber) DEPARTMENT ( DNumber, DName, MgrSSN, MgrStartDate) PROJECT ( PNumber, PName, Location, DNumber) DEPT_LOCATION ( DNumber, DLocation ) WORKS_ON ( SSN, PNumber, HoursPW) DEPENDENT ( SSN, DependName, Sex, BirthDate, Relationship) YV - Relational Model and Algebra 123 The COMPANY Database in the Relational Model - Instance . EMPLOYEE SSN 1234 3344 9998 9876 6668 4534 9879 8886 Name john frank alice jenny rama joyce jack james BDate 9.1.55 8.9.45 7.6.50 2.6.41 5.8.56 3.7.62 2.3.59 1.1.40 Address Sex kifisia m athina m ekali f patra f korinth m kiato f maroussi m psihico m Salary 30000 55000 25000 43000 38000 25000 25000 60000 SupSSN 3344 8886 9876 8886 3334 3334 9876 N U LL DNumber 5 5 4 4 5 5 4 1 DEPT_LOCATION DEPARTMENT DNumber 5 4 1 DName MgrSSN research 3334 admin 9876 headqrtr 8886 YV - Relational Model and Algebra MgrSD 22.3.78 14.1.85 28.6.73 DNumber 1 4 5 5 5 DLocation kolonaki zografou gkizi patisia kolonaki 124 The COMPANY Database in the Relational Model - Instance PROJECT WORKS_ON . SSN 1234 1234 3344 3344 3344 3344 9998 9998 9876 9876 6668 4534 4534 9879 9879 8886 PNumber 1 2 2 3 10 20 30 10 30 20 3 1 2 10 30 20 HoursPW 32 7 10 10 10 10 30 10 20 15 40 20 20 35 5 N U LL YV - Relational Model and Algebra PNumber 1 2 3 10 20 30 PName prodx prody prodz computer reorganize benefits Location gkizi patisia kolonaki zografou kolonaki zografou DNumber 5 5 5 4 1 4 DEPENDENT ESSN 3334 3334 3334 9876 1234 1234 1234 Name aliki theo joy abner mike aliki elisa Sex f m f m m f f BDate 1976 1973 1948 1932 1978 1979 1957 Relation daught son spouse spouse son daught spouse 125 Definition of the Model: Operations OPERATIONS – We can distinguish them in (a) UPDATE, (b) RETRIEVAL UPDATE operations on Relations – INSERT a tuple – DELETE a tuple – MODIFY a tuple Integrity constraints should not be violated by the execution of any update operation. For this, updates may propagate to cause other updates automatically. – e.g., when a tuple of EMPLOYEE is deleted, all tuples in WORKING_ON which have the same value for SSN are also deleted (non-existent employees cannot work on projects!) YV - Relational Model and Algebra 126 Operations: Relational Languages RETRIEVAL operations on Relations – There are two flavors: – (a) RELATIONAL ALGEBRA -- somewhat procedural, tells how to compute the result – (b) RELATIONAL CALCULUS -- somewhat declarative, tells what properties the result should have No database system supports the two flavors of retrieval languages in their pure forms. This is because the issues of “ease of use”, “convenience”, etc., play an essential role in user interaction. Yet, the languages supported in DBMSs have their roots in either relational algebra or calculus. YV - Relational Model and Algebra 127 Relational Languages Query languages: Allow manipulation and retrieval of data from a database. Relational model supports simple, powerful QLs: – Strong formal foundation based on logic. – Allows for much optimization. Query Languages != programming languages! – QLs not expected to be “Turing complete”. – QLs not intended to be used for complex calculations. – QLs support easy, efficient access to large data sets. YV - Relational Model and Algebra 128 Relational Algebra Operations RELATIONAL ALGEBRA A set of operators each of which maps one or more relations into a new relation (the algebra is CLOSED). The operators, just as in arithmetic algebra, can be nested, since the result of each operation is itself a relation. There are two types of operators: – traditional (regular) set operators » union, intersection, difference, ... – database specific set operators » projection, selection, join, ... YV - Relational Model and Algebra 129 Relational Algebra: Set Ops Traditional Set Operators – Union, Intersection, Difference, Cartesian Product – For the first three operators to apply, we must have UNION COMPATIBILITY between the two operand relations. That is: R1 ( A1 , A2 , ... , An ) and R2 ( B1 , B2 , ... , Bn ) must have the same number of attributes and the domains of the corresponding attributes must be compatible, i.e., D(Ai) = D(Bi) , for i = 1, 2, ..., n – By convention, the resulting relation for these operators has the same attribute names as the first operand relation R1 YV - Relational Model and Algebra 130 Relational Algebra : Example Database Database Schema STUDENT (SName, SAge) , INSTRUCTOR (IName, IAge) relation schema R relation schema S Database Instance R SName mary jack paul barb john (STUDENT) SAge 20 22 25 22 24 YV - Relational Model and Algebra S (INSTRUCTOR) IName helen paul barb kris IAge 22 25 22 24 CARDR = 5 CARDS = 4 131 Relational Algebra: Set Ops (2) UNION - Put all the tuples of two relations in one relation – Notation: R S CARDR S <= CARDR + CARDS – Formally: R S = { t | t is in R or t is in S } – Example: STUDENT INSTRUCTOR SName mary jack paul barb john SAge 20 22 25 22 24 YV - Relational Model and Algebra IName helen paul barb kris IAge 22 25 22 24 = SName mary jack paul barb john helen kris SAge 20 22 25 22 24 22 24 132 Relational Algebra: Set Ops (3) INTERSECTION - Put the common tuples of two relations in one relation – Notation: R S CARDR S <= max(CARDR , CARDS ) – Formally: R S = { t | t is in R and t is in S } – Example: STUDENT INSTRUCTOR SName mary jack paul barb john SAge 20 22 25 22 24 YV - Relational Model and Algebra IName helen paul barb kris IAge 22 25 22 24 = SName paul barb SAge 25 22 133 Relational Algebra: Set Ops (4) SET DIFFERENCE - Select the tuples of the first relation which are not members of the second relation CARDR -S <= CARDR – Notation: R - S – Formally: R - S = { t | t is in R and t is not in S } – Example: STUDENT - INSTRUCTOR SName mary jack paul barb john SAge 20 22 25 22 24 YV - Relational Model and Algebra - IName helen paul barb kris IAge 22 25 22 24 = SName mary jack john SAge 20 22 24 134 Relational Algebra: Set Ops (5) CARTESIAN PRODUCT - Combine each tuple of one relation with each tuple of the other CARDR 5 S = CARDR x CARDS – Notation: R 5 S – Formally: R 5 S = { t | t is the concatenation of a tuple in R with a tuple in S } – Example: STUDENT 5 INSTRUCTOR SName mary mary mary mary jack jack jack jack paul SAge 20 20 20 20 22 22 22 22 25 Iname helen paul barb kris helen paul barb kris helen YV - Relational Model and Algebra IAge 22 25 22 24 22 25 22 24 22 paul paul paul barb barb barb barb john john john john 25 25 25 22 22 22 22 24 24 24 24 paul barb kris helen paul barb kris helen paul barb kris 25 22 24 22 25 22 24 22 25 22 24 135 Relational Algebra Operators: SELECTION SELECTION - Selects the subset of tuples of a relation that satisfy a certain condition (qualification) c , which is an arbitrary Boolean expression on the attributes of R (“horizontal” subset of R) – Notation: sc (R) or R[c] – Formally: sc (R) = { t | t is in r(R) and condition c holds for t } – Examples: sDNumber = 4 (EMPLOYEE), sSalary>30000 (EMPLOYEE) s(Salary>30000 AND DNumber = 4 ) OR DNumber = 5 (EMPLOYEE), EMPLOYEE [ Dnumber = 4 ], EMPLOYEE [ Salary > 30000 ] YV - Relational Model and Algebra 136 Relational Algebra Operators: SELECTION (2) – Selection is both commutative and associative (a) sc1 ( sc2 (R) ) = sc2 ( sc1 (R) ) (b) sc1 ( sc2 (R) ) = sc1 AND c2 (R) = sc1 , c2 (R) (c) sc1 ( sc2 ( sc3 (R) ) ) ) = sc2 ( sc3 ( sc1 (R) ) ) ) – Example Result: sDNumber = 4 (EMPLOYEE) All Employees in department number 4 SSN 9998 9876 9879 Name alice jenny jack YV - Relational Model and Algebra BDate 7.6.50 2.6.41 2.3.59 Address Sex ekali f patra f maroussi m Salary 25000 43000 25000 SupSSN 9876 8886 9876 DNumber 4 4 4 137 Relational Algebra Operators: PROJECTION PROJECTION - Keeps only certain attributes (specified by a list L) and eliminates the other attributes of a relation R and also all duplicate tuples (“vertical” subset of R) – Notation: pL (R) or R[L] – Formally: pL (R) = { t[L] | t is in r(R) and L R } – Example: pLocation (PROJECT), or PROJECT[Location] Location gkizi patisia kolonaki zografou YV - Relational Model and Algebra All Locations where projects are 138 Relational Algebra Operators: JOINS There are several types of JOINs - all combining two relations to form a new one: – (theta) join, equality join, natural join, semi-join, outer join THETA (CONDITION) JOIN: Connect tuples from two relations that match (satisfy a Boolean condition c) on certain attributes – A theta-join is equivalent to a Cartesian product followed by a selection on the condition c. – Notation: R or R[c]S c S – The resulting relation has ALL the attributes of R and of S R c S s c (R S) YV - Relational Model and Algebra 139 Relational Algebra Operators: THETA JOIN – Example: DEPARTMENT MgrSSN > MgrSSN DEPARTMENT All department-department combinations where the first department’s number is greater than the second’s DNumber 4 4 1 DName MgrSSN admin 9876 admin 9876 headqrtr 8886 YV - Relational Model and Algebra MgrSD 14.1.85 14.1.85 28.6.73 DNumber 1 5 5 DName MgrSSN headqrtr 8886 research 3334 research 3334 MgrSD 28.6.73 22.3.78 22.3.78 140 Relational Algebra Operators: EQUALITY JOIN EQUALITY JOIN: Connect tuples from two relations that match (have equal values) on certain attributes. This is exactly like THETA JOIN, except that the condition c is only allowed to have equalities. – Notation: R c S or – Example: WORKS_ON SSN 9879 9879 9879 PNumber 30 30 30 HoursPW 5 5 5 R[c]S HoursPW = DNumber PROJECT PNumber 1 2 3 PName prodx prody prodz Location gkizi patisia kolonaki DNumber 5 5 5 A totally MEANINGLESS Relation YV - Relational Model and Algebra 141 Relational Algebra Operators: NATURAL JOIN NATURAL JOIN: Connect tuples from two relations that match (have equal values) on all common attributes. In the result, the common attributes are kept only once – Notation: R S or – Example: DEPARTMENT DNumber 5 5 5 4 1 YV - Relational Model and Algebra DName research research research admin headqrtr R[X=X]S MgrSSN 3334 3334 3334 9876 8886 DEPT_LOCATION MgrSD 22.3.78 22.3.78 22.3.78 14.1.85 28.6.73 DLocation gkizi patisia kolonaki zografou kolonaki 142 Relational Algebra Operators: SEMI--JOIN SEMI-JOIN: Select the subset of one relation that joins with another. A semi-join is equivalent to a join followed by a projection. – Notation: R c S or R<c]S – Example: EMPLOYEE SSN=MgrSSN DEPARTMENT SSN 3344 9876 8886 Name frank jenny james BDate 8.9.45 2.6.41 1.1.40 Address athina patra psihico Sex m f m Salary 55000 43000 60000 SupSSN 8886 8886 N U LL DNumber 5 4 1 Semi-joins are USEFUL in distributed database operations YV - Relational Model and Algebra 143 Relational Algebra Operators: OUTER--JOIN Motivation: In a regular join operation, tuples in relations R or S that do not have matching tuples in the other relation do not appear in the result. In some queries, all tuples in R (or S) must appear in the result - when no matching tuples are found, NULLs are placed for the missing attribute values. – Notation: RS OUTER-JOINs are distinguished in: – Left outer join – Right outer join – Full outer join YV - Relational Model and Algebra (all tuples in R appear in the result) (all tuples in S appear) (all tuples in R and S appear) 144 Relational Algebra Operators: DIVISION DIVISION: Given relations R(X,Y) and S(Y), where X, Y are sets of attributes, a tuple t is a member of the division (denoted: (R / S)[X] ) IF for all tS in S there exist tR in R, such that: tR [Y] = tS [Y] and tR [X] = t [X] – Analogy with number arithmetic: The quotient q of a/b is the largest number s.t. qb <= a The quotient Q of R S is the maximal relation s.t. Q X S R YV - Relational Model and Algebra 145 Relational Algebra Queries (1) A series of queries in relational algebra are presented in the sequel, using an example relational database that involves SAILORS who RESERVE some BOATS. SAILORS (Sid, SName, Rating) BOATS (Bid, BName, Color) RESERVE (Sid, Bid, Date) YV - Relational Model and Algebra 146 Relational Algebra Queries (2) QUERY1: Find the names of sailors who have reserved boat number 2 ( RESERVE [Bid=2] [Sid=Sid] SAILORS ) [SName] pSName ( sBid=2 RESERVE Sid=Sid SAILORS ) QUERY2: Find the names of sailors who have reserved a red boat ( BOAT [Color=red] [Bid=Bid] RESERVE [Sid=Sid] SAILORS ) [SName] pSName ( sColor=red BOAT YV - Relational Model and Algebra Bid=Bid RESERVE Sid=Sid SAILORS ) 147 Relational Algebra Queries (3) QUERY3: Find the colors of the boats reserved by eleni (SAILORS [SName=eleni] [Sid=Sid] RESERVE [Bid=Bid] BOATS) [Color] pColor ( sSName=eleni SAILORS Sid=Sid RESERVE Bid=Bid BOATS ) QUERY4: Find the names of the sailors who have reserved at least one boat ( RESERVE [Sid=Sid] SAILORS ) [SName] pSName ( RESERVE YV - Relational Model and Algebra Sid=Sid SAILORS ) 148 Relational Algebra Queries (4) QUERY5: Find the names of sailors who have reserved a red or a green boat pSName ( (sColor=red BOATS sColor=green BOATS ) pBid=Bid RESERVE Sid=Sid SAILORS ) QUERY6: Find the names of sailors who have reserved both a red and a green boat RESERVE ) RESERVE ) ) BOATS pSName ( ( pSid ( sColor=red BOATS pSid ( sColor=green SAILORS ) YV - Relational Model and Algebra Bid=Bid Bid=Bid Sid=Sid 149 Relational Algebra Queries (5) QUERY7: Find the names of sailors who have reserved all boats pSName ( ( pSid, Bid RESERVE / pBid BOATS ) Sid=Sid SAILORS ) QUERY8: Find the names and ratings of sailors who have reserved all red boats pSName, Rating ( (pSid, Bid RESERVE / pBid (sColor=red BOATS ) ) Sid=Sid SAILORS ) YV - Relational Model and Algebra 150 Relational Algebra: Comments There are several properties that hold in a relational algebra expression (commutatitivity, associativity, etc.) – Examples: sc1 ( pL (R) ) = pL ( sc1 (R) ) c2 S sc1 ( R c2 S ) = sc1 ( R ) sc1 (R S) = sc1(R) sc1(S) .... Such properties are very useful in query optimization YV - Relational Model and Algebra 151 Relational Algebra: Comments COMPLETE SET OF OPERATIONS – The set of operators {sp-5} is called a complete set of relational algebra operations. The implication is that ALL other operators can be described as a sequence of the above operators. – For example, the division operator can be described as: R / S = pX(R) - ( (pX (R) 5 S) - R ) where X are the non-common attributes in R and S – Equivalently, it is expressed as: (R / S) [X] = R[X] - ( ( R[X] x S ) - R )[X] YV - Relational Model and Algebra 152 Relational Algebra Competeness There are several combinations of relational algebra operators that define a complete set. Any Query Language equivalent to a complete set of operations is called RELATIONALLY COMPLETE – NOTE: This does not imply that the language is adequate to do all database operations (e.g., a good language must support aggregates, many forms of joins, built-in functions,...) An interesting operator -which goes beyond the expressive power of the relational set of operators as defined by Codd- is that of transitive closure. This is a form of recursion in relational databases and is very useful in many applications. YV - Relational Model and Algebra 153