* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download cs610a - Arms-A-Flailing | Contain Yourself!
Survey
Document related concepts
Oracle Database wikipedia , lookup
Concurrency control wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Functional Database Model wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Clusterpoint wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Relational algebra wikipedia , lookup
Transcript
CS610 / CS710 Database Systems I Daisy Wong 1 © D. Wong 2003 2002 Week 1 2 © D. Wong 2003 2002 Objectives Modeling and design of databases. Programming: queries and DB operations 3 © D. Wong 2003 2002 Database A large collection of data: –stored in mass storage –exists over a long period of time –Can take on a variety of appearances depending on the requirements at the time –Can serve as the data source for a variety of applications 4 © D. Wong 2003 2002 Database Examples Examples: – Wal-Mart : records every item purchased in every store – L. L. Bean: records detail information of each customer and their purchases – Hospitals: record patient demographics, conditions and progress, test results, etc –... This course, refer to a collection of data managed by a Database Management System (DBMS) 5 © D. Wong 2003 2002 DBMS A combination of software, data, and structure of the data that support: – Users to create databases and specify their schema – Users to query and modify the data – Storage of very large amount of data, secure the data from accident or unauthorized use, and allow efficient access – Control concurrent access from many users, presenting correct data to each user, and prevent accidental corruption of the data from simultaneous accesses 6 © D. Wong 2003 2002 DBMS lingo Schema – logical structure of the data. Use Data Definition Language (DDL) Query – A question about the data. Use Data Manipulation Language (DML) Transaction (or atomic transaction) – A logical unit of work that must be completed as a whole or not at all. 7 © D. Wong 2003 2002 Three levels of database schemas External View1 External View2 External View3 External level Logical to external mappings Logical Schema Logical level Internal to Logical mappings Disk Internal schema 8 Internal level © D. Wong 2003 2002 DBMS User Roles End users – access the database for information to do their jobs. Casual (use simple user interfaces) vs. Sophisticated ( use DML) Database designers – specify information content (use DDL) to create database systems Application developers – design and develop applications that extend the functionality of the dbms. E.g. user interface, data analysis and data mining, various business services Database administrators (DBAs) – administer databases: control access, maintain data accuracy and integrity, monitor and improve database 9 © D. Wong 2003 2002 performance A little history First attempt – file systems Hierarchical model (tree based) Network model (graph base) Relational model – Proposed by E. F. Codd (1970) – Data should be presented to user as tables (relations) – Queries expressed in a very high-level language – SQL (Structured Query Language) – most important language based on relational model 10 © D. Wong 2003 2002 Relational Model A conceptual model that represents data as relations. Relations – tables of data Accounts accountNo 12345 34567 … name Sally Sue … balance 1000.21 285.48 … Query using SQL: SELECT balance FROM Accounts WHERE accountNo = 34567; Relational DBMS finds an efficient way to answer the query 11 © D. Wong 2003 2002 Major components of a DBMS (Simplified. Figure 1.1) External level Logical level Users / applications DBA DDL commands DDL compiler Queries/updates Query Manager Transaction Commands Transaction Manager Buffer/File Storage Manager Internal level Data Metadata 12 © D. Wong 2003 2002 Figure 1.1: Database management system components Ref. FCDS 2ed. by Ullman 13 © D. Wong 2003 2002 Storage Manager • Obtains the requested information from data storage • Modifies the information if requested and re-store • Indexes are used (data structures that help us find data items quickly given a part of their value). Advanced data structures such as B-tree are used for efficient access • Indexes are part of the data, their description is part of the metadata • Consists of 2 components: file manger and buffer manager 1. File manager keeps track of the location of files on disk and obtains the blocks containing the requested data 2. Buffer manager handles main memory. It manages the memory blocks, obtaining disk blocks from the file manager, trying to optimize the access to data 14 © D. Wong 2003 2002 Query Manager Parse and optimize the query using a query compiler Execute the resulting query plan (sequence of actions for the DBMS to perform) – Issues a sequence of requests to storage manager for small pieces of data Return the result to the requester 15 © D. Wong 2003 2002 Transaction Manager Assure all transactions are executed properly ACID properties of “proper” execution: – Atomicity : All of the updates of a transaction are successful, or no update take place – Consistency: Each transaction should leave the database in a consistent state – Isolation: Each transaction, when executed concurrently with other transactions, should have the same effect as if it had been executed by itself – Durability: Once a transaction has completed successfully, its changes to the database should be permanent. Even serious failures should not affect the permanence of a transaction. 16 © D. Wong 2003 2002 Techniques to enforce ACID Locking – granularity of locks is important. Logging – write a log to nonvolatile storage. Assure durability. Transaction Commitment – for durability and atomicity, transactions are computed “tentatively”, recorded, but no changes are made to the db until the transaction gets committed. Changes copied to the log, then copied to db. 17 © D. Wong 2003 2002 Trends Object Oriented DB – Richer data types – Classes and class hierarchy enable share or reuse of sw and schemas – Protect misuse through abstract data types Object Relational DB Constraints and triggers handling Multimedia data Data Integration to support advance data analysis such as data mining. E.g. data warehouses, data marts Multi-tier Client-Server architecture, move more processing to the client Parallel processing Support Web sites 18 © D. Wong 2003 2002 Knowledge Discovery in Databases (KDD) Data Mining Interpretation / Evaluation Transformation Preprocessing Knowledge Selection Patterns Preprocessed Data Data Transformed Data Target Data Reference: Fayyad; Smyth: "From Data Mining to Knowledge Discovery: An Overview" 1996 19 © D. Wong 2003 2002 Ch. 2 Entity-Relationship Data Model Data models Entity-Relationship diagrams Design Principles Modeling of constraints Weak entity sets 20 © D. Wong 2003 2002 Data Modeling Used for conceptual database design Object-oriented Classes / Objects DBMS ODL Ideas Relational Schema Relational DBMS E/R 21 © D. Wong 2003 2002 Data Models Value-Oriented Examples Object-Oriented –Relational –Object-oriented –Logic –Network –Hierarchical Distinct objects by Data values Redundancy handling Reduced by Design Use pointers to capabilities objects Many-to-many relationships Handled the same 22 Object identity Only binary relationships © D. Wong 2003 2002 Relational Model Based on mathematically defined relations of entities Consists of: – Attributes (fields) – Domain legitimate values of attributes (data range) – Views of data presented in table format – Create new views by projections of the database – Records are n-tuples, where n = # of attributes 23 © D. Wong 2003 2002 Entity-Relationship (E/R) Diagrams Represents the schematic of a database Useful for designing the conceptual model Entity set : classes of objects Entity : member of an entity set (data) Attributes : properties of the entities in an entity set Relationship – describe how entities relate to each other 24 © D. Wong 2003 2002 Entity-Relationship Diagrams Symbols attributes relationships entity sets one - one Cardinality: m m many - one n 25 many - many © D. Wong 2003 2002 Entity-Relationship Diagrams (continue) Entity – Has a set of attributes – The key is underlined SS# Key – Attributes which uniquely identify an entity in an entity set – An inherent property of the data (e.g. movie title and year) – Serve as a constraint 26 © D. Wong 2003 2002 Week 2 27 © D. Wong 2003 2002 Ch. 2 Entity-Relationship Data Model (continue) Data models Entity-Relationship diagrams Design Principles Modeling of constraints Weak entity sets 28 © D. Wong 2003 2002 Entity-Relationship Diagrams Symbols attributes relationships entity sets Cardinality (Multiplicity): one - one m m many - one n 29 many - many © D. Wong 2003 2002 Entity-Relationship Diagrams (continue 1) Entity – Has a set of attributes – The key is underlined SS# Key – Attributes which uniquely identify an entity in an entity set – An inherent property of the data (e.g. movie title and year) – Serve as a constraint 30 © D. Wong 2003 2002 Entity-Relationship Diagrams (continue 2) Multiway relationships Relationships with roles Attributes on relationships – E.g. salary of a star for a movie – Not necessary by adding new entity set, whose entities have the attributes 31 © D. Wong 2003 2002 E/R Diagrams Multiway relationships Can be converted to a collection of binary, manyone relationships – Introduce new entity set (connecting entity set) Entities are tuples of the relationship set for the multiway relationship – Introduce many-one relationships from the connecting entity set to each of the entity sets that provide components of tuples in the original, multiway relationship 32 © D. Wong 2003 2002 ER Diagrams –Inheritance Subclasses – special-case entity sets with own special attributes and/or relationships isa relationship : One-One relationship to handle inheritance, to connect entity set to its subclasses. – E.g. every Cartoon is a Movie Super class ER symbol for isa : (arrows not drawn) isa Subclass 33 © D. Wong 2003 2002 Design Principles Faithfulness : Reflects the reality of the data and the problem with the appropriate: – attributes – data ranges – relationships – cardinality E.g. Stars Stars -in 34 Movies © D. Wong 2003 2002 Design Principles (2) Avoid Redundancy – Redundancy : duplicate representation of data, lead to Inconsistency, Waste violate data integrity space E.g. Studios Owns Studio_name Movies Studio_name 35 © D. Wong 2003 2002 Design Principles (3) Keep it simple : Avoid more elements than needed Pick the right kind of element – attributes or entity set – what should the relationships be 36 © D. Wong 2003 2002 DB design Centers on the entity sets and relationships making up the database. Users and designers must work together to: – Specify the entity sets and relationships and their meanings – Specify the attributes and the meaning of it – Specify the constraints – Identify key attributes of entity sets – Specify how the data would be viewed – Specify who can view and modify what and when Better to do a good job in the original design – Need collaboration of users and designers Reminder: Changing database design later usually very 37 © D. Wong 2003 2002 costly. Constraints Constraints are part of the schema, not of an instance of the database. Classification of Constraints – Keys – Single-value constraints – Referential integrity constraints – Domain constraints – General constrains 38 © D. Wong 2003 2002 Keys in the E/R Model Key for an entity set E is a set K of one or more attributes such that, given any two distinct entities e1 and e2 in E, e1 and e2 cannot have identical values for each of the attributes in the key K. (Ref. 2.3.2) – E.g. Movie(title, year, length, filmtype), E1 = (titanic, 1960, 120, color), E2 = (titanic, 1997, 180, color) – A key can consist of more than one attribute – There can be more than one possible key (candidate keys), but designate one as the “primary key”. – Attributes that constitute the primary key cannot be null – If the entity set is in an isa-hierarchy, require the root entity set have all the attributes needed for a key. – In E/R diagram, attributes of a key are underlined 39 © D. Wong 2003 2002 Single-Value Constraints Assertion: at most one value exists in a given context Ways to express single-value constraints in E/R: – Each attribute of an entity set has a single value Some attributes may allow null Key attributes should not be null No formal representation in E/R diagram, make a notation beside the attribute – Many-one relationship E R 40 F © D. Wong 2003 2002 Referential Integrity Assertion: exactly one value exists in a given role Enforced at database implementation – Forbid the deletion of a referenced entity – Require that if referenced entity is deleted, then all entities that reference it are deleted as well. Use a rounded arrow in the diagram Movies Owned by 41 Studios © D. Wong 2003 2002 Other kinds of constraints Domain constraints : restrict the value of an attribute to be in a limited set. – No formal notation in E/R, place a notation next to the attribute – E.g. declaring the type of an attribute – E.g specify range for the attribute value General constraints – E.g. degree of the relationships Movies Stars-in 42 <=10 Stars © D. Wong 2003 2002 Weak Entity Sets Definition: An entity set whose key is composed of attributes some or all of which belong to another entity set. Causes: 1. entity sets in a hierarchy based on classification Crew Unit-of number Studios name addr 2. Connecting entity sets to eliminate a muliway relationship. 43 © D. Wong 2003 2002 Weak Entity Sets Requirements 1. 2. 3. Key consists of zero or more of its own attributes, and Key attributes from entity sets that are reached by supporting relationships (many-one) R is a supporting relationship for a weak entity set E to some entity set F, R must: – Be a binary, many-one relationship from E to F – R must have referential integrity from E to F – The attributes that F supplies for the key of E must be key attributes of F – If F is weak, then key attributes of F supplied to E may be attributes of some entity set to which F is connected by a supporting relationship (recursive) – For multiple supporting relationship from E to F, each relationship is used to supply a copy of the key attributes of F 44 © D. Wong 2003 2002 Week 3 45 © D. Wong 2003 2002 Ch. 3 (part 1) Relational Model basics From E/R diagram to Relations 46 © D. Wong 2003 2002 Relational Model Based on mathematically defined relations of entities Consists of: – Attributes (fields) – Domain legitimate values of attributes (data range) – Views of data presented in table format – Create new views by projections of the database – Records are n-tuples, where n = # of attributes (arity) Use relational algebra operations to compute data Result of the operations are also relations 47 © D. Wong 2003 2002 Relations Sets of tuples presented in a table (order of tuples immaterial) Attributes are the column header Schema of a relation: name and the set of attributes. E.g. Movies (title, year, length, filmType) Database schema – the set of schemas for the relations in a design Domain – elementary data type of each attribute Relation instances Movies title year length filmType Star Wars 1977 124 color Mighty Ducks 1991 104 color Wayne’s World 1992 48 95 color © D. Wong 2003 2002 E/R Diagram to Relational Designs 1. Turn each entity set into a relation with the same set of attributes. E.g. Star (name, address) 2. Represent relationships by relations 3. Combining relations when appropriate 4. Handle weak entity sets 5. Convert subclass structures to relations 49 © D. Wong 2003 2002 E/R Relationship to Relations Replace a relationship by a relation (say R): 1. For each entity set involved in relationship R, take its key attribute(s) as part of the schema of the relation for R. 2. If the relationship has attributes, add to R’s attribute set 3. If one entity set has multiple roles in a relationship, then its key attributes appear as many times as there are roles. Need to rename the attributes to avoid name duplication. e.g. Contracts (starName, title, year, studioOfStar, producingStudio) Ref. Fig. 3.6 50 © D. Wong 2003 2002 Discovering Keys for Relations If relation R is constructed from: Entity set – key attributes of the entity set Relationship R, key of R is : – many-many : composed of keys of both connected entity sets – many-one from E1 to E2 : consisted of key of E1 – one-one between E1 and E2: either the key of E1 or E2 51 © D. Wong 2003 2002 Combining Relations In general, if 2 or more relations has exactly the same keys, combine them. For many-one relationship: – E (ea1, ea2) – F (fa1, fa2, fa3) – R (ea1, fa2, ra1) -- key is from the many relation – Result: E(ea1, ea2, fa2, ra1) E F R 52 © D. Wong 2003 2002 Handling weak entity set 1. The relation for the weak entity set includes it’s own attributes and the attritubes of the other entity sets that help form it’s key e.g. 1: Contracts (starName, studioName, title, year, salary) e.g. 2: Crew (number, studioName) 2. 3. The relation for any relationship create as usual (rename attributes to avoid confusion) Don’t need to create a relation for supporting relationships 53 © D. Wong 2003 2002 Weak entity set handling example Studios(name, addr) Crews(number, studioName) Unit-of(number, studioName, name) – but not needed because it’s a supporting relationship Crew Uint-of number Studios name 54 addr © D. Wong 2003 2002 Converting Subclass Structures to Relations Strategies: 1. E/R approach 2. OO approach 3. Use null values 55 © D. Wong 2003 2002 E/R approach For each entity set E in the hierarchy, create a relation that includes the key attributes from the root and any attributes belonging to E Do not create a relation for isa relationship Example: – Movies(title, year, length, filmType) – MurderMysteris(title, year, weapon) – Cartoons(title, year) 56 © D. Wong 2003 2002 OO approach Treat entities as objects belonging to a single class – for each possible subtree including the root, create one relation, whose schema includes all the attributes of all the entity sets in the subtree Example: – Movies(title, year, length, filmType) – MoviesC(title, year, length, filmType) – MoviesMM( title, year, length, filmType, weapon) – MoviesCMM(title, year, length, filmType, weapon) – Voices(title, year, starName) Then combine relations that have the same set of attributes. 57 © D. Wong 2003 2002 Use null values Create one relation with all the attributes of all the entity sets in the hierarchy. Each entity is represented by one tuple, and that tuple has a null value for whatever attributes the entity does not have. Example: – Movie(title, year, length, filmType, weapon) 58 © D. Wong 2003 2002 Comparison of the 3 approaches Pros and Cons in each approach. Factors to consider: – Number of relations involved in a query (depends on the query of interest) – Total number of relations in the schema – Space usage 59 © D. Wong 2003 2002 Operations in the Relational Data Model Algebraic notation (Relational Algebra) sets bags Constraints on relations Extensions to the relational model 60 © D. Wong 2003 2002 Ch. 5 Relational Algebra Relational Algebra expressions are composed of: – Operands - relations represented by: relation’s name or the list of tuples in a relation – Operators Results of the operations are also relations Build complex expressions from simple ones Queries – relational algebra expressions 61 © D. Wong 2003 2002 Classes of Relational Algebra Operations Classes of relational algebra operations: 1. Set operations: union, difference, intersection 2. Remove parts of a relation: – Selection – eliminates rows (tuples) – Projection – eliminates columns 3. Combine tuples of two relations : cartesian product and joins 4. Renaming: changes relation schema (i.e. relation name, and/or attribute names) 62 © D. Wong 2003 2002 Example relations 1 R S A B C a b c d a f c b d A B C b g a d a f 63 © D. Wong 2003 2002 Set Operations Union, Difference, Intersection Given relations R and S: – R and S have schemas with identical set of attributes, and domain types for each attributes (=> same arity) – Columns of R and S must be ordered so that the order of attributes is the same for both relations 64 © D. Wong 2003 2002 Union RS – Result: a set of tuples that are in R or S or both. – A tuple appears only once in the result even if it is present in both R and S – Reminder: R and S has same arity Example: A a d c b B b a b g 65 C c f d a © D. Wong 2003 2002 Difference R–S – Result: set of tuples in R but not in S – Reminder: Not commutative: R – S ≠ S – R – Reminder: R and S has same arity Example: A B C a b c c b d 66 © D. Wong 2003 2002 Intersection RS – Result: set of tuples in both R and S, i.e. R-(R-S) – Reminder: R and S has same arity Example: A B C d a f 67 © D. Wong 2003 2002 Projection A1, A2, …, An( R ) Result: – a new relation that has only the columns for attributes A1, A2, … An of R – Schema of the new relation is {A1, A2, … An} Example 1: A, C( R ) Example 2: B( R ) A C a c d f c d B b a 68 © D. Wong 2003 2002 Selection sc( R ) , where C is a conditional expression: – operands are constants or attributes of R – Operators are : <, =, >, , , , and, or, not Result: – a new relation with a subset of R’s tuples that satisfy condition C – Schema is the same as R’s schema Example: sB=b(R) A B C a b c c b d 69 © D. Wong 2003 2002 Cartesian Product RxS Let arity of R = k1 and arity of S = k2 Result: the set of all possible (k1 + k2)-tuples whose first k1 components form a tuple in R, and whose last k2 components form a tuple in S Example: R.A R.B R.C S.A S.B S.C a b c b g a a b c d a f d a f b g a d a f d a f c b d b g a c b d d a f 70 © D. Wong 2003 2002 Example Relation 2 R A B C 1 2 4 7 D E 3 3 1 5 6 6 2 8 9 71 S © D. Wong 2003 2002 Theta-Joins R Result: tuples in R x S such that C is satisfied, i.e. C S where C is an arbitrary condition sC(R x S) To construct the result: 1. Take the product of R and S 2. Select from the product only those tuples that satisfy the condition C Arity of result = arity of R + arity of S Example: R A 1 1 4 B 2 2 5 C S where C is B < D C 3 3 6 72 D 3 6 6 E 1 2 2 © D. Wong 2003 2002 Example Relation 3 R A B C a b d S B C D c b c d b c b c e b b f a d b c a d 73 © D. Wong 2003 2002 Natural Joins R S Result: i1, i2, …, im sR.A1=S.A1 … R.Ak=S.Ak (R x S) where i1, i2, …, im is the list of all components of R x S in order, except the components S.A1, … S.Ak Example: A a B b C c D d a d b b c c e d d c b a c d e b 74 © D. Wong 2003 2002 Renaming S(A1, A2, …, An)(T) or S(T) Result: renames relation T to S Example: R x S(X, Y, Z)(S) A a a B b b C c c X b d Y g a Z a f d d c a a b f f d b d b g a g a f a c b d d a f 75 © D. Wong 2003 2002 Week 4 76 © D. Wong 2003 2002 The rest of Ch. 5 – Relational Algebra Ch. 10 – Logical query languages 77 © D. Wong 2003 2002 Combining Operations to Form Queries Construct more complex expressions by applying operators to sub-expressions Use parentheses to indicate operands grouping Multiple ways to write equivalent queries Expression tree for visualizing complex expression Query optimizer Example (ref. Fig. 5.8): title,year(slength≥100(Movie) sstudioName=‘Fox’(Movie)) or title,year(slength≥100 AND studioName=‘Fox’(Movie)) 78 © D. Wong 2003 2002 Algebraic Laws R (S T) = (R S) T Associative: e.g. Commutative: e.g. R S = S R 79 © D. Wong 2003 2002 Dependent and Independent Operations 1. Set operations: union, difference, intersection 2. Remove parts of a relation: – Selection – eliminates rows (tuples) – Projection – eliminates columns 3. Combine tuples of two relations : cartesian product and joins 4. Renaming: changes relation schema (i.e. relation name, and/or attribute names) 80 © D. Wong 2003 2002 Constraints on Relations Two ways to express constraints in relational algebra: 1. R=Ø // R Ø 2. RS // R - S = Ø where R and S are relational algebra expressions 81 © D. Wong 2003 2002 Referential Integrity Constraints Assertion: a value appearing in one context also appears in another, related context. Example: Movie(title, year, length, inColor, studioName, producerC#) MovieExec(name, address, cert#, networth) Constraint: the producer of every movie is a certified movie executive, i.e. appear in the MovieExec relation producerC#(Movie) cert#(MovieExec) or producerC#(Movie) - cert#(MovieExec) = Ø 82 © D. Wong 2003 2002 Other Constraints Domain constraints example: MovieStars(name, address, gender, birthdate) Constraint: acceptable values for the “gender” attribute are ‘F’ or ‘M’ sgender’F’ AND gender‘M’(MovieStar) = Ø Other constraints example: MovieExec(name, address, cert#, networth) Studio(name, address, presC#) Constraint: president of a movie studio must have a net worth of at least $10,000,000 snetworth<10000000(Studio presC#=cert# MovieExec) = Ø Functional dependency constraints 83 © D. Wong 2003 2002 Relational Operations on Bags Bag – a “set” that is allowed to have more than one occurrence of an element – => duplicate tuples in a relation – Constraint representations work with bags Reason: – For implementation efficiency when duplication is acceptable – When actual no. of tuples is needed for aggregate Example: A B 1 2 3 4 1 2 1 2 84 © D. Wong 2003 2002 Relational Operations of Bags (continue) Given: R and S are bags, and tuple t appears in R n times, and in S m times R S : contains n + m tuple t R – S : contains max(0, n-m) tuple t R S : contains min(n, m) tuple t A,B(R) : each tuple is processed independently, resulting duplicate tuples are not eliminated sc(R) : apply the selection condition to each tuple independently, resulting duplicate tuples are not eliminated 85 © D. Wong 2003 2002 Product and Joins of Bags Given: R and S are bags, and tuple r appears in R m times, and tuple s appears in S n times R x S : the resulting tuple rs will appear mn times. R S : each tuple of R is compared to each tuple of S to decide if the pair tuples joins successfully, do not eliminate duplicates R CS : each tuple of R is compared to each tuple of S to decide if the condition C is met, do not eliminate duplicates 86 © D. Wong 2003 2002 Extended Operation to Relational Algebra Duplicate elimination : to convert a bag to a set Aggregation: count, sum, max, min, average Grouping Extended Projection Sorting Outerjoins 87 © D. Wong 2003 2002 OUTERJOINS Dangling tuples: tuples that failed to match any tuple of the other relation in the common attributes. An operator to augment the result of a join by the dangling tuples, padded with null values. R S : Full outerjoin of R1 and R2 is a join that includes all rows from R1 and R2 matched or not. Unmatched rows are padded with special null symbols . LEFT outerjoin of R1 and R2 is a join that includes all rows from R1, matched or not, plus the matching values from R2. Unmatched rows are padded with . RIGHT outerjoin of R1 and R2 is a join that includes all rows from R2, matched or not, plus the matching values from R1. Unmatched rows are padded with . The joining may be NATURAL or theta join 88 © D. Wong 2003 2002 Outer Join Example R A B C a b d S B C D c b c d b c b c e b b f a d b c a d c d b Full Outer Join Natural join A B C D A B C D a b c d a b c d a b c e a b c e d b c d d b c d d b c e d b c e b b f c a d b c a d b c d b 89 © D. Wong 2003 2002 Extensions to the Relational Model Modifications : insert, delete, update Views : relational expression with a name to be applied real relations to produce the relation defined by the expression. Views can used as arguments to other expressions. Null values : common interpretations: – Value unknown – Value inapplicable – Value withheld 90 © D. Wong 2003 2002 Ch. 10 Logical Query Languages Motivation Datalog Relational Algebra to Datalog Recursion in Datalog Negation in Recursive Rules 91 © D. Wong 2003 2002 Motivation Logical rules is more natural in representing recursive queries Logical rules form the basis of many informationintegration applications 92 © D. Wong 2003 2002 A Datalog rule example Relation: Movie(title, year, length, inColor, studioName, producerC#) LongMovie(t, y) Movie(t, y, l, c, s, p) AND l 100 head subgoals body LongMovie = title, year(slength≥100(Movie)) 93 © D. Wong 2003 2002 Datalog rule Relational Atoms : predicate followed by arguments Arithmetic Atoms : comparison between two arithmetic expressions (e.g. x ≠ Y) Predicate = relation name or arithmetic comparison predicates (e.g. =, <, ≠, etc) Head – a relational atom Body – one or more atoms (subgoals) connected by AND Subgoals (not head) may be optionally negated by NOT Local variables – variables in body, not in head 94 © D. Wong 2003 2002 Datalog A logic based data model The underlying mathematical model of data is essentially that of the relational model Predicate symbols denote relations Relational algebra operations are described by rules Query : a collection of one or more rules. The relation in the rule head is the answer to the query 95 © D. Wong 2003 2002 Extensional and Intensional Predicates Extensional Predicates (EDB) The set of relations which ARE defined as part of the actual database (i.e. physically stored). e.g. R = {1} Intensional Predicates (IDB) The set of relations which are NOT defined as part of the actual database but are instead abstracted from logical rules. e.g. P (x) Q (x) Q (x) R (x) A predicate must be IDB or EDB but not both. –IDB predicate can appear in the body or head of a rule –EDB predicate can appear in the rule body only 96 © D. Wong 2003 2002 3 Different Interpretations of Logical Rules Proof-Theoretic Interpretaton Model-Theoretic Interpretation Computational Interpretation The following discussions will use the EDB: R = {1} 97 © D. Wong 2003 2002 Proof-Theoretic Interpretation As axioms to be used in a proof. –From the facts in the database, see what other facts can be proved using the rules in all possible ways. –All facts derivable using the rules are derivable by applying the rules in the forward direction only –Example: P = {1}, Q = {1} 98 © D. Wong 2003 2002 Model-Theoretic Interpretation As definition of possible worlds or models. – To be a model, an interpretation must make the rules true, no matter what assignment of values is made for the variables in each rule. – Multiple models are possible – With no negations, a unique minimal model exists that gives the same result as the proof-theoretic interpretation. – Minimal model : cannot make any true fact false and still have a model consistent with the EDB – Example: 1. P = {1, 2, 3}, Q = {1, 2} 2. P = {1}, Q={1} 99 © D. Wong 2003 2002 Computational Interpretation By providing an algorithm for “executing” the rules to determine whether a potential fact is true or false. – E.g. Prolog – uses a particular algorithm that involves searching for proofs of the potential fact. – Drawback: the set of facts Prolog finds a proof is not necessarily the same as the set of all facts for which a proof exists The set of facts Prolog finds true is not necessarily a model. 100 © D. Wong 2003 2002 Week 5 101 © D. Wong 2003 2002 Example 1. 2. 3. sibling(X, Y) parent(X, Z) AND parent(Y, Z) AND X ≠ Y. cousin(X, Y) parent (X, Xp) AND parent (Y, Yp) AND sibling(Xp, Yp). cousin(X, Y) parent (X, Xp) AND parent (Y, Yp) AND cousin((Xp, Yp). 4. related(X, Y) sibling(X, Y). 5. related(X, Y) related(X, Z) AND parent(Y, Z). 6. related(X, Y) related(Z, Y) AND parent(X, Z). 102 © D. Wong 2003 2002 Meaning of Logical Rules Head is true of its arguments if there exist values for local variables that make all the subgoals true If rule contains only non-negated relational predicates, then value of head variables is the result of natural join the subgoals and project onto the head variables Example: Cousin(X, Y) = X,Y(P(X, Xp) Yp)) 103 P(Y, Yp) S(Xp, © D. Wong 2003 2002 Evaluation of Rules – Variable based Consider all possible assignments of values to the variables. If all subgoals are true, add the head to the result relation. Example: – Only assignments that make the first subgoal true: Case 1: x = 1, z = 2 Case 2: x = 2, z = 3 – In case 1, y = 3 makes second subgoal true. Since (1, 3) is not in R, so the third subgoal is also true. So, add (x, y) = (1, 3) to S – In case 2, no value of y makes the second subgoal true – So, S = {(1, 3)} 104 © D. Wong 2003 2002 Evaluation of Rules – Tuple based Consider all assignments of tuples to subgoals that make each subgoal true. If the variables are assigned consistent values, add the head to the result. Start with non-negated relational subgoals only Example: – Start with R(x,z) and R(z, y). Possible tuple assignments to subgoals: R(x, z) R(z, y) – Only 1 assignment gives consistent value to z (1, 2) (1, 2) (1, 2) (2, 3) – It also makes the third subgoal true – So, S = {(1, 3)} (2, 3) (1, 2) (2, 3) (2, 3) 105 © D. Wong 2003 2002 Safe rules A rule is safe if all of its variables are limited. Variable, X, in a rule is limited if: 1. Any variable that appears as an argument in a relational predicate of the body is limited 2. Any variable that appears in a subgoal X=a, or a=X, where a is a constant, is limited 3.Variable x is limited if it appears in a subgoal X=Y or Y=X, when Y is a variable already known to be limited Rules must be safe to make sense (i.e. to produce finite relation as the result) 106 © D. Wong 2003 2002 Safe and unsafe rule Examples 1. biggerThan(X, Y) X > Y. unsafe 2. S(X) R(Y). unsafe 3. S(X) R(Y) AND X < Y unsafe 4. P(X, Y) Q(X, Z) & W=a & Y=W. safe 107 © D. Wong 2003 2002 Datalog Rules and Relational Algebra summary Intersection : AND Union: multiple rules with same IDB predicate as rule head Difference: NOT Projection: rule head atom has arguments of the projected attributes Selection: condition represented by arithmetic subgoals Product: AND two subgoals, with head atom contain all attributes of the two subgoals Joins 108 © D. Wong 2003 2002 A Datalog rule example Relation: parent(C,P), where P is a parent of C sibling(X, Y) parent(X, Z) AND parent(Y, Z) AND X ≠ Y. head subgoals body S(X,Y) = X,Y(sX≠Y(P(X, Z) 109 P(Y, Z))) © D. Wong 2003 2002 Recursion P Q P depends on Q Draw a graph: node = IDB predicates, arc P -> Q means P depends on Q Cycles means recursion Example: cousin(X, Y) parent (X, Xp) AND parent (Y, Yp) AND cousin((Xp, Yp). 110 © D. Wong 2003 2002 Dependency Graph related cousin sibling parent 111 © D. Wong 2003 2002 Least Fixedpoint The smallest IDB relations that contain all tuples that the rules require us to follow Iterative Fixed-point Evaluation of recursive rules: Start IDB = yes Apply rules to IDB, EDB Changes to IDB? 112 No Done © D. Wong 2003 2002 Fixedpoint Computation Example (Ref. Text) 1. FollowOn(x, y) SequelOf(x, y) 2. FollowOn(x, y) SequelOf(x, z) AND FollowOn(z, y) Evaluation note: At round i, the only new tuples added to any IDB relation comes from match to tuples added in the round i-1 113 © D. Wong 2003 2002 Negation in Recursive Rules We can compute a minimal fixedpoint only for stratifiable recursion with negation. To determine stratifiability: 1. Draw a graph whose nodes are the IDB predicates 2. Draw an arc (labeled -) from node A to node B for rule A not B 3. Draw an arc (no label) from node A to node B for rule A B 114 © D. Wong 2003 2002 Negation in Recursive Rules (continued) If graph has a cycle containing one or more negative arcs, the the recursion is NOT stratified. (When negation appears in the least fixedpoint operation.) Stratum of IDB predicate A = the largest no. of negative arcs on a path beginning from A. If no negative arcs on any paths from predicate A, stratum number of A = 0 Evaluate IDB predicates in the order of their strata, starting with the lowest. 115 © D. Wong 2003 2002 Week 6 116 © D. Wong 2003 2002 Ch. 3 (continued) Database design problems Functional Dependency Keys of relations Decompositions based on Functional Dependency (normalization) – Boyce-Codd Normal Form (BCNF) – Third Normal Form – Recovering information from a decomposition Multivalued Dependencies and decomposition – Fourth Normal Form Relationships Among Normal Forms 117 © D. Wong 2003 2002 Database Design Problems Principal kinds of anomalies that constitute bad db design: – Redundancy – Update Anomalies – Deletion Anomalies – Insertion Anomalies Want to find a good relation schema design for the relational model 118 © D. Wong 2003 2002 Redundancy Information repeated unnecessarily in several tuples. Example: – length and filmType for Movie 119 © D. Wong 2003 2002 Update Anomalies Change information in one tuple but leave the same information unchanged in another A consequence of redundancy Cause potential inconsistency Example: – Change length of Star Wars in one tuple but not the others 120 © D. Wong 2003 2002 Deletion Anomalies A set of values becomes empty (deleted), may lose other information as side effect Example: – Deleting the only star listed for the movie Mighty Ducks 121 © D. Wong 2003 2002 Insertion Anomalies Cannot insert a tuple because some of the data not yet available Inverse to deletion anomalies Problem of using null value to fill the missing / unavailable data: – When the data becomes available, will we remember to delete the one with nulls – If the missing data is part of a key, then can’t use null Example: – Cannot start to keep track of the information of a new movie when the cast is not yet determined 122 © D. Wong 2003 2002 Dependencies Dependency is an assertion that only a subset of all possible relations are ‘legal’. An assertion about the real world, cannot be proved. It’s a form of constraints. Dependencies in a relation means some sort of redundancy in the legal relations. Example: title, year length title year length filmType studioName starName Star wars 1977 124 Color Fox CarrieFisher Star wars 1977 ??? Color Fox Mark Hamil Functional dependencies (FD) Multivalued dependencies (MVD) 123 © D. Wong 2003 2002 Functional Dependencies (FD) Given: relation schema R(A1, …, An), and X and Y be subsets of (A1, … An). FD : X Y means X functionally determines Y e.g. A1A2…An B1B2…Bm X Y is an assertion about R that whenever two tuples agree on all the attributes of X, then they must also agree on attributes of Y. It’s a constraint on the data that may appear within a relation. (i.e. schema level control of data) It’s a restriction on relations that depend only on the equality or inequality of values, i.e. value-oblivious It’s the most important type of value-oblivious constraints. Value-oblivious constraints have the greatest impact for designing database schemas 124 © D. Wong 2003 2002 Significance of Functional Dependencies Key: 1. Entity set R (A1, …, An), and X is a subset of A1, …, An that form a key for R, then may assert: X Y where Y is any subset of {A1, …, An} 2. R(A1, …, An) is a many-one relationship from entity set E1 to entity set E2, and among the Ai’s are attributes that form a key X for E1 and a key Y for E2, then may assert: X Y Tool for explaining the process of normalization 125 © D. Wong 2003 2002 Representing FD using Relational Algebra Example 5.31 MovieStar (name, address, gender, birthdate) FD: name address sMS1.name=MS2.name AND MS1.address MS2.address (MS1(MovieStar) x MS2(MovieStar)) = 126 © D. Wong 2003 2002 Keys in the E/R Model key for an entity set E is a set K of one or more attributes such that, given any two distinct entities e1 and e2 in E, e1 and e2 cannot have identical values for each of the attributes in the key K. – A key can consist of more than one attribute – There can be more than one possible key (candidate keys), but designate one as the “primary key”. – Attributes that constitute the primary key cannot be null – If the entity set is in an isa-hierarchy, require the root entity set have all the attributes needed for a key. – In E/R diagram, attributes of a key are underlined 127 © D. Wong 2003 2002 Keys of Relations K is a key for relation R if: 1. K all attributes of R 2. For no proper subset of K is (1) true (i.e. a key must be minimal) 3. If K at least satisfies (1), then K is a superkey Example: {title, year, starName) forms a key for the Movie relation Superkeys: A set of attributes that contains a key is called a superkey 128 © D. Wong 2003 2002 Discovering Keys for relations Consider relation R: 1. If R comes from an entity set, then the key for the relation is the key attributes of this entity set 2. If R is from a many-many relationship, then the keys of both connected entity sets are the key attributes for R 3. If R is from a many-one relationship from entity set E1 to entity set E2, then the keys attributes of E1 are the key attributes for R 4. If R is from a one-one relationship, then the key attributes for either of the connected entity sets are key attributes of R. More than 1 candidate key in this case. 129 © D. Wong 2003 2002 FD Rules Splitting / Combining rule Trivial Dependencies Armstrong’s Axioms: 1. Reflexivity 2. Augmentatioin 3. Transitivity Computing closure of attributes Finding all implied FD’s ( Ref. section: 3.5.7) 130 © D. Wong 2003 2002 Splitting / Combining rule FD: A1A2…An B1B2…Bm I vs. A1A2…An B1 II ... A1A2…An Bm Splitting rule: replace FD I by the set of FDs II Combining rule: replace the set of FDs II by FD I 131 © D. Wong 2003 2002 Trivial Dependencies FD A1A2…An B is trivial if B is one of the A’s Every trivial dependency holds in every relation For FD A1A2…An B1B2…Bm – Trivial if the B’s are a subset of the A’s e.g. title year title – Nontrivial if at least one of the B’s is not among the A’s – Completely nontrivial if none of the B’s is also one of the A’s Trivial-dependency rule: – A1A2…An B1B2…Bm A1A2…An C1C2…Ck where the C’s are all those B’s that are not also A’s 132 © D. Wong 2003 2002 Armstrong’s Axioms: A set of inference rules: (Pg. 135, 2nd ed. Pg.99) 1. Reflexivity If {B1, B2, …, Bm} {A1, A2, …, An}, then A1A2…An B1B2…Bm. // trivial dependencies 2. Augmentatioin If A1A2…An B1B2…Bm, then A1A2…AnC1…Ck B1B2…BmC1…Ck for any set of attributes C1…Ck 3. Transitivity If A1A2…An B1B2…Bm and B1B2…Bm C1…Ck then A1A2…An C1…Ck 133 © D. Wong 2003 2002 Week 7 134 © D. Wong 2003 2002 Functional Dependencies (FD) Given: relation schema R(A1, …, An), and X and Y be subsets of (A1, … An). FD : X Y means X functionally determines Y e.g. A1A2…An B1B2…Bm Significance: – Key: 1. Entity set R (A1, …, An), and X is a subset of A1, …, An that form a key for R, then may assert: X Y where Y is any subset of {A1, …, An} 2. R(A1, …, An) is a many-one relationship from entity set E1 to entity set E2, and among the Ai’s are attributes that form a key X for E1 and a key Y for E2, then may assert: X Y 135 © D. Wong 2003 2002 Keys of Relations K is a key for relation R if: 1. K all attributes of R 2. For no proper subset of K is (1) true (i.e. a key must be minimal) 3. If K at least satisfies (1), then K is a superkey Example: {title, year, starName) forms a key for the Movie relation Superkeys: A set of attributes that contains a key is called a superkey 136 © D. Wong 2003 2002 FD Rules Splitting / Combining rule Trivial Dependencies Armstrong’s Axioms: 1. Reflexivity 2. Augmentatioin 3. Transitivity Computing closure of attributes Finding all implied FD’s ( Ref. section: 3.5.7) 137 © D. Wong 2003 2002 Given vs. Derived FD’s Given FD’s: stated initially for a relation. FD’s These are known to hold for a relation R. Derived FD’s are FD’s that follow logically from the given FD’s (using the inference rules) or by the closure algorithm of attributes. Basis : any set of given FD’s from which all FD’s for a relation can be inferred. Minimal basis: if no proper subset of the FD’s in a basis can also derive the complete set of FD’s. Example: 3.22 – pp. 98 138 © D. Wong 2003 2002 Closure of attributes Let {A1,A2,…,An} is a set of attributes, S is a set of FD’s Closure of {A1,A2,…,An} under the dependencies in S = set of attributes B such that every relation that satisfies all the dependencies in set S also satisfies A1A2…An B. I.e. B is the set of attributes functionally determined by {A1,A2,…,An} Notation: {A1,A2,…,An}+ Allow trivial dependencies, so A1A2…An are always in {A1,A2,…,An}+ 139 © D. Wong 2003 2002 Closure Algorithm Let X be {A1,A2,…,An}+ 1. X = {A1,A2,…,An} // initialize X 2. Search for some FD B1B2…Bm C where B1B2…Bm X, but C is not in X. If found, add C to X 3. Repeat step 2 until no more attributes can be added to X. 4. The resulting X is the correct value of {A1,A2,…,An}+ 140 © D. Wong 2003 2002 Attributes Closure Example. 3.19 pp. 93 Given: R(A, B, C, D, E, F) and FD’s AB C BC AD DE CF B To find: {A, B}+ Procedure: 1. Let X = {A, B} 2. Add C to X 3. Add D to X // AB C, X = {A, B, C} // BC AD, X = {A, B, C, D} 4. Add E to X // D E, X = {A, B, C, D, E} 5. End, {A, B}+ = {A, B, C, D, E} 141 © D. Wong 2003 2002 To infer if a FD follows a set of FD in a relation AB D follows from dependencies in example 3.19? – Compute {A, B}+ , if D ends up in the closure, then AB D follows from the dependencies – Since {A, B}+ = {A, B, C, D, E}, AB D follows. D A follows from dependencies in example 3.19? – Compute {D}+ , if A ends up in the closure, then D A follows from the dependencies – But {D}+ = {D, E}, D A does NOT follows. 142 © D. Wong 2003 2002 Attributes Closures and Keys To test if A1,A2,…,An is a key for a relation R: Compute {A1,A2,…,An}+ 1. Check if {A1,A2,…,An}+ is the set of all attributes of R 2. If yes, then check no subset of A1,A2,…,An, say S, such that S+ is the set of all attributes of R If A1,A2,…,An only satisfies 1 but not 2, then A1,A2,…,An is a superkey 143 © D. Wong 2003 2002 Finding all implied FD’s Motivation: Given relation R with FDs F. When projecting R to form new relation S, want to know what FD’s hold in S Method: compute all FD’s that 1. Follow from F 2. Involve only attributes of S Problem: no. of FDs may be large (could be exponential in the number of attributes of S) So, make simplifications in computing attribute closure: – Ignore empty set and set of all attributes (trivial FD’s) – – Drop XY A if X A holds. If closure of some set X is all attributes, then ignore supersets of X. 144 © D. Wong 2003 2002 Algorithm to find all implied FD’s 1. 2. 3. For each set of attributes X compute X+. Start with closures of singleton set, and then move onto doubleton if necessary. Add X A for each A in X+ - X Ignore “obvious” dependencies that follow from others as described in the simplification guidelines. 145 © D. Wong 2003 2002 Finding all implied FD’s Example Given: R(ABCD), and FD’s F: AB C, C D, D A What FD’s follow from F? • A+ = A; B+ = B // nothing • C+ = ACD // add C A • D+ = DA // nothing new • (AB)+ = ABCD // add AB D; skip all supersets of AB • (BC)+ = ABCD // nothing new; skip all supersets of BC • (BD)+ = ABCD // add BD C; skip all supersets of BD • (AC)+ = ACD ; (AD)+ = (AD); (CD)+ = ACD // nothing new • (ACD)+ = ACD • All other sets contain AB, BC, or BD, so skip • the only interesting FD’s that follow from F are : // nothing new; C A, AB D, BD C 146 © D. Wong 2003 2002 Normalization Purpose: process to eliminate redundancy in relations due to functional or multi-valued dependencies. Normal forms: – Boyce-Codd Normal Form (BCNF) – Third Normal Form (3NF) – Fourth Normal Form (4NF) 147 © D. Wong 2003 2002 Boyce-Codd Normal Form (BCNF) Definition: A relation R is in BCNF iff whenever there is a nontrivial dependency A1A2…An B holds for R, {A1,A2,…,An} must be a superkey for R Alternative definition after applying the combining rule to all FD’s with a common L.S.: A relation R is in BCNF iff whenever nontrivial dependency A1A2…An B1B2…Bm holds for R, {A1,A2,…,An} must be a superkey for R Guarantees no redundancy, prevents update, deletion, and insertion anomalies E.g. title year length filmType studioName // violates BCNF because key of Movie is {title, year, starName}, so Movie is not in BCNF 148 © D. Wong 2003 2002 Decomposition to BCNF Given: relation R, FD’s F. Decomposition strategy: 1. look for a non-trivial FD A1A2…An B1B2…Bm that violates BCNF. (Heuristic: add to the RS as many attributes as are functionally determined by A1A2…An) 2. Compute {A1,A2,…,An}+ // all the attributes involved in the violating dependency Others 3. Decompose R into 1. 2. A’s {A1,A2,…,An}+ , and (R - {A1,A2,…,An}+) {A1,A2,…,An} B’s // LS + all attributes not involved in the violating dependency 149 © D. Wong 2003 2002 BCNF Decomposition Example 3.24 pp 104 Relation: Movie(title, year, length, filmType, studioName, starName) Key: {title, year, starName} FD’s: title year length filmType studioName is a BCNF violation, so Movie not in BCNF Decomposition: Schema 1: {title, year, length, filmType, studioName} Schema 2: {title, year, starName} To obtain the new relations, project the schemas onto Movie To recover information (I.e. Movie) from the new relations: natural join the new relations. Does not lose information. 150 © D. Wong 2003 2002 Additional points about BCNF A relation may have more than 1 keys Need some key in the LHS of any nontrivial FD Only nontrivial FD’s are potential BCNF violation candidates Any 2-attributes relations is in BCNF Decomposition must be based on FD that holds in the relation, otherwise can’t recover the original relation from the new relations. Eliminates all redundancies Some decomposition may not preserve the FD’s (e.g. 3.32 pp 114) 151 © D. Wong 2003 2002 Third Normal Form (3NF) Definition: A relation R is in 3NF if: whenever there is a nontrivial dependency A1A2…An B holds for R, either {A1,A2,…,An} is a superkey for R, or B is a member of some key (i.e. B is prime). BCNF with a relaxed condition Some redundancy might be left in 3NF if resulting relations are not in BCNF Preserves all the FD’s Process similar to BCNF with the added condition To recover information (I.e. Movie) from the new relations: natural join the new relations. Does not lose information. 152 © D. Wong 2003 2002 Week 9 153 © D. Wong 2003 2002 Normalization Purpose: process to eliminate redundancy in relations due to functional or multi-valued dependencies. Decompose relation schema into Normal forms: – Boyce-Codd Normal Form (BCNF) – Third Normal Form (3NF) – Fourth Normal Form (4NF) To obtain the new relations, project the schemas onto the original relation schema (e.g. Movie) To recover information (I.e. Movie) from the new relations: natural join the new relations. 154 © D. Wong 2003 2002 BCNF Decomposition Example 3.24 pp 104 Relation: Movie(title, year, length, filmType, studioName, starName) Key: {title, year, starName} FD’s: title year length filmType studioName is a BCNF violation, so Movie not in BCNF Decomposition: Schema 1: {title, year, length, filmType, studioName} Schema 2: {title, year, starName} To obtain the new relations, project the schemas onto Movie To recover information (I.e. Movie) from the new relations: natural join the new relations. Does not lose information. 155 © D. Wong 2003 2002 Functional Dependencies (FD) Given: relation schema R(A1, …, An), and X and Y be subsets of (A1, … An). FD : X Y means X functionally determines Y e.g. A1A2…An B1B2…Bm A1A2…An B1B2…Bm is an assertion about R that two attributes or sets of attributes in R are dependent of one another. 156 © D. Wong 2003 2002 Mutivalued Dependencies (MVD) relation schema R, and A1A2…An and B1B2…Bm be subsets of attributes of R. Given: MVD : A1A2…An B1B2…Bm holds in R if : For each pair of tuples t and u of relation R that agree on all the A’s, we can find in R some tuple v that agrees: 1. With both t and u on the A’s, 2. With t on the B’s, and 3. With u on all attributes of R that are not among the A’s or B’s A1A2…An B1B2…Bm is an assertion about R that two attributes or sets of attributes in R are independent of one another. Cause redundancy not related to FD’s in a BCNF schema. Most common source: putting 2 or more many-many relationships in a single relation. 157 © D. Wong 2003 2002 MVD Rules Trivial dependencies rule If A1A2…An B1B2…Bm holds for R, then A1A2…An C1C2…Ck holds where the C’s are the B’s + one or more of the A’s. The converse also hold. Transitive rule If A1A2…An B1B2…Bm and B1B2…Bm C1C2…Ck then A1A2…An C1C2…Ck Splitting rule does not hold E.g. name street city, but not name street So, always start with set of attributes on the R.S. because splitting rule does not hold. 158 © D. Wong 2003 2002 More MVD Rules Every FD is an MVD Because If FD A1A2…An B1B2…Bm, then swapping B’s between tuples that agree on A’s doesn’t create new tuples. A’s B’s t u Complementation rule If X Y, then X Z, where Z is all attributes not in X or Y e.g. Star_Star_In {name, street, city, title, year} name street city name title year 159 © D. Wong 2003 2002 Nontrivial MVD A1A2…An B1B2…Bm for a relation R is nontrivial if: 1. B1B2…Bm is not a subset of A1A2…An 2. A1A2…An B1B2…Bm is not all attributes of R 160 © D. Wong 2003 2002 Fourth Normal Form (4NF) Decompose relations that has MVD’s into 4NF to eliminate MVD’s. Definition: R is in 4NF if A1A2…An B1B2…Bm is a nontrivial MVD, {A1A2…An} is a superkey. Since every FD is an MVD, so 4NF is more stringent than BCNF Only nontrivial MVD’s has the potential to violate 4NF 161 © D. Wong 2003 2002 4NF Decomposition Given: relation R, and nontrivial MVD X Y that violate 4NF 1. Decompose X Y into XY and X (R-Y) R X Y 2. Produce the relations by projecting R onto XY and (R-Y) X 3. Reconstruct R from the new relations using natural join e.g. Star_Star_In {name, street, city, title, year} and name street city Decompose Star_Star_In using name street city into {name, street, city} and {name, title, year} 162 © D. Wong 2003 2002 Relationships among normal forms 4NF is the most stringent 4NF BCNF 3NF 163 © D. Wong 2003 2002 Lossless-join decomposition Given: Relation R, decomposed into schemes R1, R2, … Rk, and D is a set of dependencies. Definition: R1, R2, … Rk is a lossless-join (w.r.t. D) if for every relation r for R satisfying D: r = R1(r) R2(r) … Rk(r) i.e. Every relation r for R is the natural join of its projections onto the Ri’s. The lossless-join property is necessary if the decomposed relation is to be recoverable from its decomposition. However, joins are expensive. So, don’t over decompose! 164 © D. Wong 2003 2002 Structured Query Language (SQL) A DDL and DML for relational DBMSs History: ANSI SQL, , SQL-92 (SQL2), SQL-99 (SQL3) SQL-99 extends SQL2 with object-relational features and other new features Most DBMS vendors implements the core, and then add bells and whistles and variations Query capability is close to relational algebra, with lots of extensions. Case insensitive except characters inside quoted strings ' ' e.g. 'Smith' 'SMITH' ; as statement delimiter 165 © D. Wong 2003 2002 Example database schema Movie(title, year, length, inColor, studioName, producerC#) StartIn(movieTitle, movieYear, starName) MovieStar(name, address, gender, birthdate) MovieExec(name, address, cert#, netWorth) Studio(name, address, presC#) 166 © D. Wong 2003 2002 SQL Quries – basic form SELECT attribute/s FROM relations / views /subqury WHERE conditional expression; 167 © D. Wong 2003 2002 SQL query examples 1. Example 1: SELECT * FROM Movie; 2. -- * => all attributes of Movie Example 2: SELECT * FROM Movie WHERE studioName = 'Disney' AND year = 1990; 3. Example 3: SELECT title, length FROM Movie WHERE studioName = 'Disney' AND year = 1990; 168 © D. Wong 2003 2002 Duplicates SQL generally operates using bags instead of sets Exception: UNION, INTERSECT, EXCEPT operation To eliminate duplicates, add keyword DISTINCT to the SELECT clause e.g. SELECT DISTINCT starName FROM StarsIn; Duplicate elimination is costly. Use judiciously. 169 © D. Wong 2003 2002 SQL Correspondence to Relational Algebra SELECT L -- R.A. project FROM R -- R.A. operands WHERE C ; -- R.A. select R.A. expression: L(sC(R)) When reading and writing queries: 1. FROM -- what relations are involved 2. WHERE -- what's the tuples selection criteria 3. SELECT -- what columns to output 170 © D. Wong 2003 2002 Union, Intersection, Difference of Queries UNION : R1 UNION R2 or (Q1) UNION (Q2) e.g. (SELECT title, year FROM Movie) UNION (SELECT movieTitle AS title, movieYear AS year FROM StarsIn); INTERSECT : R1 INTERSECT R2 or (Q1) INTERSECT (Q2) EXCEPT: R1 EXCEPT R2 -- difference (Q1) EXCEPT (Q2) 171 © D. Wong 2003 2002 Union, Intersection, Difference of Queries (continued) Q1 and Q2 are queries that produce relations R1 and R2, or results of Q1 and Q2 should have the same list of attributes and attribute types. Rename if necessary. Duplicates are eliminated automatically Add the keyword ALL after UNION, INTERSECT, or EXCEPT to prevent duplicates elimination 172 © D. Wong 2003 2002 SQL and Relational Algebra The six independent operations are implemented by SQL SQL is relational complete 173 © D. Wong 2003 2002 Some data values in SQL 1. Strings 2. Dates and Times 3. Null values 4. Truth value of Unknown 174 © D. Wong 2003 2002 1. Strings Comparison operators (according to lexicographical order) <, >, <=, >= = LIKE -- pattern matching % -- matches any sequence of 0 or more characters _ -- matches any one character E.g.: title LIKE 'Star _ _ _ _' E.g.: title LIKE '%''s%' Can specify escape character E.g. title LIKE 'x%%x%' ESCAPE 'x' 175 © D. Wong 2003 2002 2. Dates and Times Date constant: DATE '2002-10-01' Time constant: TIME '15:00:02.5' Timestamp (combines dates and times): TIMESTAMP '2002-10-01 15:00:02.5‘ (beware of implementation differences!) Comparison operators apply 176 © D. Wong 2003 2002 3. Null Values NULL to represent: 1. Value unknown 2. Value inapplicable 3. Value withheld Operations involving NULL 1. Arithmetic operation: result is NULL 2. Comparison: result is UNKNOWN NULL is not a constant, therefore NULL cannot be used explicitly as an operand. IS NULL and IS NOT NULL checks Read "Pitfalls Regarding Nulls" pp. 250 177 © D. Wong 2003 2002 4. UNKNOWN Consider TRUE = 1, FALSE = 0, UNKNOWN = 0.5 1. AND of 2 truth-value = min. of the 2 values 2. OR of 2 truth-value = max. of the 2 values 3. Negation of v = 1-v Refer Fig. 6.2 pp. 250 for truth table for 3-valued logic 178 © D. Wong 2003 2002 The Six Clauses in SQL Queries 1. SELECT -- required 2. FROM -- required 3. WHERE 4. GROUP BY 5. HAVING 6. ORDER BY Subqueries may appear in the FROM clause and the WHERE clause Comments begins with ‘--’ -- if used, must follows a group by clause 179 © D. Wong 2003 2002 Table level SQL (ref. 6.6, pp. 292) Create table – to define the schema of a base table (Ref. 6.6.1 for data types syntax) E.g. create table EMP ( empno int not null, lastName varchar(30) not null, firstName varchar(30) not null, num_of_children int, constraint pk_EMP primary key (empno) ); Drop table – to destroy a base table e.g. drop table EMP; 180 © D. Wong 2003 2002 Tuple Modification Statements (ref. 6.5, pp. 286) Insert – to add a row Syntax: insert into R(A1..An) values (v1…vn) – E.g. insert into emp(empno, lastName, firstName, num_of_children) values (12345, ‘Doe’, ‘John’, 1) – Or insert into emp values (12345, ‘Doe’, ‘John’, 1) Delete – to remove a row Syntax: delete from R where <condition> – E.g. delete from emp where empno = 12345 Update – to modify the contents of a row Syntax: update R set Ai = value where Aj = targetValue – E.g. update emp set num_of_children = 2 where empno = 12345 181 © D. Wong 2003 2002 Some JOINS in SQL. (ref. pp. 270) CROSS JOIN -- R.A. cartesian product e.g. Movie CROSS JOIN StarsIn; JOIN … ON -- R.A. theta-join e.g. Movie JOIN StarsIn ON title = movieTitle AND year = movieYear; [NATURAL] JOIN -- R.A. natural join e.g. MovieStar NATURAL JOIN MovieExec; or MovieStar JOIN MovieExec; OUTERJOINS -- joins that include dangling tuples 182 © D. Wong 2003 2002 OUTERJOINS An operator to augment the result of a join by the dangling tuples, padded with null values. Full outerjoin of R1 and R2 is a join that includes all rows from R1 and R2 matched or not. Unmatched rows are padded with NULLs. LEFT outerjoin of R1 and R2 is a join that includes all rows from R1, matched or not, plus the matching values from R2. Unmatched rows are padded with NULLs. RIGHT outerjoin of R1 and R2 is a join that includes all rows from R2, matched or not, plus the matching values from R1. Unmatched rows are padded with NULLs. The joining may be NATURAL or theta join 183 © D. Wong 2003 2002 Outerjoins Syntax 1. R1 NATURAL {FULL | LEFT | RIGHT} OUTER JOIN R2; E.g. 1. MovieStar NATURAL FULL OUTER JOIN MovieExec; E.g. 2. MovieStar NATURAL LEFT OUTER JOIN MovieExec; E.g. 3. MovieStar NATURAL RIGHT OUTER JOIN MovieExec; 184 © D. Wong 2003 2002 Outerjoins Syntax (continued) 1. R1 {FULL | LEFT | RIGHT} OUTER JOIN R2 ON conditional expression; E.g. 1. Movie FULL OUTER JOIN StarsIn ON title = movieTitle AND year = movieYear; E.g. 2. MovieStar LEFT OUTER JOIN StarsIn ON title = movieTitle AND year = movieYear; E.g. 3. MovieStar RIGHT OUTER JOIN StarsIn ON title = movieTitle AND year = movieYear; 185 © D. Wong 2003 2002 Use result of joins as subqueries in queries E.g. SELECT title, year, length, inColor, studioName, producerC#, starName FROM Movie JOIN StarsIn ON title = movieTitle AND year = movieYear; 186 © D. Wong 2003 2002 Week 10 187 © D. Wong 2003 2002 SQL (continued) SQL continued Oracle Sequence Views 188 © D. Wong 2003 2002 SQL Correspondence to Relational Algebra SELECT L -- R.A. project FROM R -- R.A. operands WHERE C ; -- R.A. select R.A. expression: L(sC(R)) When reading and writing queries: 1. FROM -- what relations are involved 2. WHERE -- what's the tuples selection criteria 3. SELECT -- what columns to output 189 © D. Wong 2003 2002 SQL and Relational Algebra The six independent operations are implemented by SQL SQL is relational complete 190 © D. Wong 2003 2002 Some data values in SQL 1. Strings 2. Dates and Times 3. Null values 4. Truth value of Unknown 191 © D. Wong 2003 2002 4. UNKNOWN Consider TRUE = 1, FALSE = 0, UNKNOWN = 0.5 1. AND of 2 truth-value = min. of the 2 values 2. OR of 2 truth-value = max. of the 2 values 3. Negation of v = 1-v Refer Fig. 6.2 pp. 250 for truth table for 3-valued logic 192 © D. Wong 2003 2002 SQL data types (pp. 292, 6.6.1) Character strings: char(n) -- fixed length varchar(n) -- varying length Integer int or integer Floating-point numbers float, real dec (m, n), or decimal(m, n) -- floating point number with fixed decimal places, where m > n. e.g. dec(7, 2) Dates and times date, time, timestamp, datetime Refer to handout for SQL more data types description. 193 © D. Wong 2003 2002 SQL syntax: the Six Clauses in SQL Queries 1. 2. 3. 4. 5. 6. SELECT FROM WHERE -- required -- required GROUP BY HAVING -- if used, must follows a group by clause, to choose the groups based on some aggregate properties of the group itself ORDER BY -- to sort the output E.g. Select name, sum(length) from MovieExec, Movie Where producerC# = cert# group by name having min(year) < 1930 Order by name desc 194 © D. Wong 2003 2002 SQL syntax (continued) Subqueries may appear in the FROM clause and the WHERE clause Comments begins with ‘--’ Tuple variables : an alias to a Relation, defined in the from clause – E.g. Select Star1.name, Star2.name From MovieStar Star1, MovieStar Star2 Where Star1.address = Star2.address and Star1.name < Star2.name 195 © D. Wong 2003 2002 Interpreting Multi-relation Queries in SQL To define the meaning of select-from-where MovieStar(name, addr, gender, bDate) name addr gender bDate J. Anniston Addr ABC F 8/88/88 H. Ford Addr LMN M 7/7/77 B. Pitt Addr ABC M 8/1/88 SELECT Star1.name, Star2.name FROM MovieStar Star1, MovieStar Star2 WHERE Star1.addr = Star2.addr AND Star1.name < Star2.name; 196 © D. Wong 2003 2002 3 Equivalent ways of interpretataion 1. 2. Nested Loops Parallel assignment – No explicitly created nested loops – – – – Consider all possible assignments of tuples from the relations to the tuple variables Each assignment that produce a true value from the WHERE clause contribute a tuple to the answer Construct the resulting tuples using the attributes of the SELECT clause Similar to the tuple based Datalog rule evaluation, the WHERE clause serves the purpose of the arithmetic subgoals 197 © D. Wong 2003 2002 Evaluation of Rules – Tuple based Consider all assignments of tuples to subgoals that make each subgoal true. If the variables are assigned consistent values, add the head to the result. Start with non-negated relational subgoals only Example: – Start with R(x,z) and R(z, y). Possible tuple assignments to subgoals: R(x, z) R(z, y) – Only 1 assignment gives consistent value to z (1, 2) (1, 2) (1, 2) (2, 3) – It also makes the third subgoal true – So, S = {(1, 3)} (2, 3) (1, 2) (2, 3) (2, 3) 198 © D. Wong 2003 2002 3. Conversion to R.A FROM : Cartesian product (rename if necessary) WHERE : as the selection criteria SELECT : specify the projection operation attributes 199 © D. Wong 2003 2002 Unintuitive Result of SQL Interpretation Given: unary relations R, S, T To compute: R (ST) Consider query: SELECT R.A FROM R, S, T WHERE R.A = S.A OR R.A = T.A; Consider the case if T is empty. 200 © D. Wong 2003 2002 Some JOINS in SQL. (ref. pp. 270) CROSS JOIN -- R.A. cartesian product e.g. Movie CROSS JOIN StarsIn; JOIN … ON -- R.A. theta-join e.g. Movie JOIN StarsIn ON title = movieTitle AND year = movieYear; [NATURAL] JOIN -- R.A. natural join e.g. MovieStar NATURAL JOIN MovieExec; or MovieStar JOIN MovieExec; OUTERJOINS -- joins that include dangling tuples 201 © D. Wong 2003 2002 OUTERJOINS An operator to augment the result of a join by the dangling tuples, padded with null values. Full outerjoin of R1 and R2 is a join that includes all rows from R1 and R2 matched or not. Unmatched rows are padded with NULLs. LEFT outerjoin of R1 and R2 is a join that includes all rows from R1, matched or not, plus the matching values from R2. Unmatched rows are padded with NULLs. RIGHT outerjoin of R1 and R2 is a join that includes all rows from R2, matched or not, plus the matching values from R1. Unmatched rows are padded with NULLs. The joining may be NATURAL or theta join 202 © D. Wong 2003 2002 Outerjoins Syntax 1. R1 [NATURAL] {FULL | LEFT | RIGHT} OUTER JOIN R2; E.g. 1. MovieStar NATURAL FULL OUTER JOIN MovieExec; E.g. 2. MovieStar NATURAL LEFT OUTER JOIN MovieExec; E.g. 3. MovieStar NATURAL RIGHT OUTER JOIN MovieExec; 203 © D. Wong 2003 2002 Outerjoins Syntax (continued) 2. R1 {FULL | LEFT | RIGHT} OUTER JOIN R2 ON conditional expression; E.g. 1. Movie FULL OUTER JOIN StarsIn ON title = movieTitle AND year = movieYear; E.g. 2. MovieStar LEFT OUTER JOIN StarsIn ON title = movieTitle AND year = movieYear; E.g. 3. MovieStar RIGHT OUTER JOIN StarsIn ON title = movieTitle AND year = movieYear; 204 © D. Wong 2003 2002 Use result of joins as subqueries in queries E.g. SELECT title, year, length, inColor, studioName, producerC#, starName FROM Movie JOIN StarsIn ON title = movieTitle AND year = movieYear; 205 © D. Wong 2003 2002 Conditions involving Relations 1. 2. 3. 4. EXIST R – true iff R is not empty S IN R – true iff S = one of the values in R S NOT IN R S > ALL R – true iff S > every value in R (> can be the other comparison operators). Operator NOT applies. S > ANY R – true iff S > at least 1 value in R (> can be the other comparison operators). Operator NOT applies. Case 1: S is scalar, R is unary Case 2: S is a tuple, R is a multi-attribute relation (arity S must = arity R) When compare tuples, the first value has highest priority, and subsequent value the next highest. E.g. (1, 7, 8) < (2, 0, 1) Ref: 6.3-6.4 . 206 © D. Wong 2003 2002 Table level SQL (ref. 6.6, pp. 292) Create table – to define the schema of a base table (Ref. 6.6.1 for data types syntax) E.g. create table EMP ( empno int not null, lastName varchar(30) not null, firstName varchar(30) not null, num_of_children int, constraint pk_EMP primary key (empno) ); Drop table – to destroy a base table e.g. drop table EMP; 207 © D. Wong 2003 2002 Tuple Modification Statements (ref. 6.5, pp. 286) Insert – to add a row Syntax: insert into R(A1..An) values (v1…vn) – E.g. insert into emp(empno, lastName, firstName, num_of_children) values (12345, ‘Doe’, ‘John’, 1) – Or insert into emp values (12345, ‘Doe’, ‘John’, 1) Read: P. 288 – Timing of Insertion. Delete – to remove a row Syntax: delete from R where <condition> – E.g. delete from emp where empno = 12345 Update – to modify the contents of a row Syntax: update R set Ai = value where Aj = targetValue – E.g. update emp set num_of_children = 2 where empno = 12345 208 © D. Wong 2003 2002 Modifying Relation Schema DROP TABLE – eliminate all data Alternative is to ALTER TABLE: 1. To add a column ALTER TABLE tablename ADD columnName typespec; 2. To delete a column ALTER TABLE tablename DROP columnName; 3. To change an attribute size ALTER TABLE tablename MODIFY (columnName typespec); 209 © D. Wong 2003 2002 Default Value Used when a value is not supplied for a tuple when create or modify. System assume null when none specified except for NOT NULL attributes. Defined with data declarations: e.g. Phone CHAR(16) DEFAULT ‘unlisted’ 210 © D. Wong 2003 2002 Views Relations that are defined by an expression like a query. Views can be queried, and can even be modified in some cases Declaring views in SQL: CREATE VIEW <viewName> AS <view definition>; Use view to present what the user is allowed to see A view exists only within the environment of a session. 211 © D. Wong 2003 2002 View Example Given: base table Movie(title, year, length, inColor, studioName, presidentC#) CREATE VIEW ParamountMovie AS SELECT title, year FROM Movie WHERE StudioName = 'Paramount'; 212 © D. Wong 2003 2002 SQL queries to View 1. 2. 3. Has the same syntax as a stored relation E.g. Select title From ParamountMovie Where year = 1979; When query to view, the tuples are obtained from the base table. Equivalent to the following query without view: Select title From Movie Where studioName = 'Paramount' and ear = 1979; Ref. Pp.308, 6.7.5 (Interpreting queries involving views) 213 © D. Wong 2003 2002 Update views A view update is translated to modify the base table. Only updatable views can be modified. Views that can be updated: – – Simple Created by SELECT , but not SELECT DISTINCT, some attributes from R : 1. The WHERE clause must not involve R in a subquery 2. The select clause must include enough attributes that for every tuple inserted into the view, the other attributes can be filled out with NULL of default value, and have a tuple of the base relation that will yield the inserted tuple of the view. 214 © D. Wong 2003 2002 Update view problem e.g. E.g. INSERT into ParamountMovie VALUES ('Star Trek', 1979); Does not meet criteria 2 because: The base relation Movie would have NULL rather than 'Paramount' as StudioName 215 © D. Wong 2003 2002 Other operations on Views DELETE tuple from a view = delete from base table, with the condition defining the view added to the condition of the delete statement UPDATE view = update the base table, with the condition defining the view added to the condition of the update statement DELETE view: does not affect the base table DROP VIEW <viewName>; 216 © D. Wong 2003 2002 Oracle's Sequences Sequential lists of numbers that are generated automatically by the database. Useful for creating unique surrogate key values for primary key attribute when no desirable attribute exist to use as the primary key Data type of attributes using sequences is NUMBER 217 © D. Wong 2003 2002 Oracle's Sequences (continued 1) To create a sequence (SIMPLE FORM): CREATE SEQUENCE <sequence name> [INCREMENT BY <number>] [START WITH <start value number>]; e.g. CREATE SEQUENCE ClassSequence INCREMENT BY 1 START WITH 1; To access the next sequence value: <sequence name>.NEXTVAL To use <sequence name>.NEXTVAL in an insert statement: INSERT INTO <table name> VALUES (<sequence name>.NEXTVAL, <field2> … ); e.g INSERT INTO class (classkey, classid, description, section, location) VALUES (ClassSequence.NEXTVAL, ?, ?, ?, ?); 218 © D. Wong 2003 2002 Oracle's Sequences (continued 2) To access the current sequence value: <sequence name>.CURRVAL may use in select, insert statements May be used immediately after using the NEXTVAL in the same database session. To view property of a sequence using the "user_sequences" system table: SELECT * FROM user_sequences; To delete a sequence: DROP SEQUENCE <sequence name>; 219 © D. Wong 2003 2002 Week 11 220 © D. Wong 2003 2002 Indexes JDBC JDBC in J2EE (Java 2 Enterprise Edition) 221 © D. Wong 2003 2002 Indexes An index on an attribute A is a data structure to improve query performance efficiency Reason: not efficient to scan all tuples (for large relations) in order to find the few that meet a given condition E.g. SELECT * FROM Movie WHERE studioName = ‘Disney’ AND year = 1990; Not part of SQL standard 222 © D. Wong 2003 2002 Indexes Typical Syntax To create index CREATE INDEX indexName ON R(A1,…An) E.g. 1. CREATE INDEX YearIndex ON Movie(year); 2. CREATE INDEX KeyIndex ON Movie(title, year); Delete Index DELETE INDEX yearIndex; 223 © D. Wong 2003 2002 Selection of Indexes Index design requires an estimate of the typical mix of queries and other operations on the db Example of good use of indexes: 1. An attribute frequently compared to constant in a where clause of a query 2. Attribute that appear frequently in join operations e.g. SELECT name FROM Movie, MovieExec WHERE title = ‘status’ AND producerC# = cert#; 224 © D. Wong 2003 2002 Decision factors Important to strike a balance. Factors: 1. Given attribute A, and index on A will: – – Greatly speed up queries with a condition on that attribute May speed up joins involving A 2. Index make insertion, deletion, and updates more complex and time-consuming 225 © D. Wong 2003 2002 Indexes (continued) Techniques to execute SQL queries are intimately associated with storage structures. Typically, a relation is stored in many disk blocks. An index is an auxiliary structure, perhaps stored in a separate file, that support fast access to the rows of a table. Main cost of a query or modification is I/O: No. of disk blocks to be read into memory and write onto disk 226 © D. Wong 2003 2002 Index Selection Example Given Relation: StarsIn(movieTitle, movieYear, starName) 3 operations to perform: 1. Q1: select movieTitle, movieYear From StarsIn where starName = s; 2. Q2: select starName from StarsIn where movieTitle = t and movieYear = y; 3. I: insert into StarsIn values(t, y, s); where s, t, y are some constants 227 © D. Wong 2003 2002 Example's assumptions 1. Cost for examining 1 disk block = 1 unit 2. StarsIn is stored in 10 disk blocks 3. Average no. of stars in a movie = 3 4. Average no. of movies that a star appeared in = 3 5. Tuples for a given star or movie are likely to be spread over the 10 disk blocks of StarsIn 6. One disk access is needed to read a block of the index every time when the index is used to locate tuples with a given value for the indexd attribute(s) 228 © D. Wong 2003 2002 Example's Estimated Cost of actions Action No Index Star Index Movie Index Both Indexes Q1 10 4 10 4 Q2 10 10 4 4 I 2 4 4 6 Cost of 3 actions 2+8p1+8p2 4+6p2 4+6p1 6-2p1-2p2 Costs associated with the three actions, as a function of which indexes are selected (Ref. Fig. 6.17 2nd ed.) Star index is an index on StarName, Movie index is an index on MovieTitle and movieYear. The numbers in rows 2-5 of the table are no. of disk accesses for the action. 229 © D. Wong 2003 2002 Example's usage scenarios The fraction of the time to do Q1 = p1, Q2 = p2, I=1-p1-p2 Consider: – Case 1: p1 = p2 = 0.1 – Case 2: p1 = p2 = 0.4 – Case 3: p1=0.5, p2=0.1 What is the best index strategy for each case? Create only the index that helps the most frequently used query type (e.g. query about stars => create Star index) 230 © D. Wong 2003 2002 JDBC (Java DataBase Connectivity) An API to the database driver manager Provides call-level interface for the execution of SQL statements from a Java language program Developed by SUN Microsystems An integral part of the Java language Java applications use the JDBC dialect of SQL, independent of the DBMS used Support applications to request information about the schema from the DBMS at run time. 231 © D. Wong 2003 2002 JDBC Interaction with SQL DB 1. Make connection to the database 2. Create SQL statement 3. Execute the SQL statement 4. Result table from the SQL "select" statement is returned as a Java object. JDBC provide methods to access the rows and columns. OR 5. SQL statements return simple integer results that represent the number of affected rows (e.g. insert) 232 © D. Wong 2003 2002 Connecting to a db through JDBC Driver 1 Application Driver Manager Driver 2 Driver 3 DBMS server JDBC Modules 233 © D. Wong 2003 2002 JDBC - java.sql package Defines a collection of interfaces and classes that allow programs to interact with db. Interfaces for primary SQL execution: – Driver: supports data connection creation – Connection: represents connection between a java client and and SQL database server – Statement: includes methods for executing text queries – PreparedStatement: represents a precompiled and stored query – CallableStatement: used to execute SQL stored procedures – ResultlSet: contains the results of a query – ResultSetMetaData: information about a ResultSet, including the attribute names and types 234 © D. Wong 2003 2002 Client-server Architectures User Interface / Application User Interface / Application User and Application tier Middleware Middle tier Application Database Server Database Server 235 Database Server tier © D. Wong 2003 2002 JDBC in J2EE J2EE – Java 2 Enterprise Edition, a middle layer server Connection to DBMS using JDBC (e.g. Cloudscape, Oracle, MS SQL) J2EE Platform – services and architecture Enterprise JavaBeans (EJB) – Session Beans vs. Entity Beans EJB access to databases using JDBC – Database connection – Persistence management (Entity Bean e.g.) – Transaction management (Session Bean e.g.) 236 © D. Wong 2003 2002 J2EE Services HTTP - enables Web browsers to access servlets and JavaServer PagesTM (JSP) files EJB - allows clients to invoke methods on enterprise beans Authentication - enforces security by requiring users to log in Naming and Directory - allows programs to locate services and components through the Java Naming and Directory InterfaceTM (JNDI) API 237 © D. Wong 2003 2002 J2EE Architecture Ref. JavaTM 2 Enterprise Edition Developer's Guide, Figure 1-2 238 © D. Wong 2003 2002 Enterprise JavaBeans (EJB) Server-side Java components Contain the business logic of enterprise application Support database access Transactional Multi-user secure Managed by the EJB container Prohibited from a set of operations 239 © D. Wong 2003 2002 Session Bean vs. Entity Bean Session Bean Entity Bean Purpose Performs a task for a Represents a business client entity object that exists in persistent storage. Shared Access May have one client. May be shared by multiple clients. Persistence Not persistent. Persistent. Entity state remains in a database. Ref. JavaTM 2 Enterprise Edition Developer's Guide, Table 1-1 240 © D. Wong 2003 2002 EJB Access to Databases Using JDBC API J2EE uses 1. JDBC 2.0 (java.sql) and 2. JDBC 2.0 Optional package (javax.sql) To make a connection to database in J2EE : 1. Should not hardcode the actual name (URL) of the database in EJB 2. Should refer to the database with a logical name 3. Use a JNDI lookup when obtaining the database connection. 241 © D. Wong 2003 2002 Driver and Data source properties In J2EE configuration file, resource.properties, specify: Driver e.g. 1 Cloudscape that is packaged with Sun’s J2EE jdbcDriver.0.name=COM.cloudscape.core.RmiJdbcDriver e.g. 2 Oracle jdbcDriver.0.name= oracle.jdbc.driver.OracleDriver JDBC URL e.g. 1 Cloudscape jdbcDataSource.0.name=jdbc/Cloudscape jdbcDataSource.0.url=jdbc:cloudscape:rmi:CloudscapeDB;create=true e.g. 2 Oracle jdbcDataSource.0.name=jdbc/Oracle jdbcDataSource.0.url= jdbc:oracle:thin:@bigoh.cis.uab.edu:1521:cs610 242 © D. Wong 2003 2002 Making a connection to database example 1. Specify the logical database name. private String dbName = "java:comp/env/jdbc/AccountDB"; 2. Obtain the DataSource associated with the logical name. InitialContext ic = new InitialContext(); DataSource ds = (DataSource) ic.lookup(dbName); 3. Get the Connection from the DataSource. Connection con = ds.getConnection(username, password); 243 © D. Wong 2003 2002 Specifying JNDI name for deployment Step 1: Enter the code name 244 © D. Wong 2003 2002 Step 2: Map the coded name to the JNDI name 245 © D. Wong 2003 2002 Persistence Management Container-Managed Persistence • Entity bean code does not contain database access calls. • The EJB container generates the SQL statements. Bean-Managed Persistence • Entity bean code contains the database access calls (SQLs) (i.e. you write the code!) 246 © D. Wong 2003 2002 Container Managed example: Product entity bean ProductEJB.java ProductHome.java Product.java ProductClient.java Bean Managed example: Account entity bean AccountEJB.java AccountHome.java Account.java AccountClient.java 247 © D. Wong 2003 2002 248 © D. Wong 2003 2002 249 © D. Wong 2003 2002 250 © D. Wong 2003 2002 Week 12 251 © D. Wong 2003 2002 EJB Access to Databases Using JDBC API J2EE uses 1. JDBC 2.0 (java.sql) and 2. JDBC 2.0 Optional package (javax.sql) To make a connection to database in J2EE : 1. Should not hardcode the actual name (URL) of the database in EJB 2. Should refer to the database with a logical name 3. Use a JNDI lookup when obtaining the database connection. 252 © D. Wong 2003 2002 Driver and Data source properties In J2EE configuration file, resource.properties, specify: Driver e.g. 1 Cloudscape that is packaged with Sun’s J2EE jdbcDriver.0.name=COM.cloudscape.core.RmiJdbcDriver e.g. 2 Oracle jdbcDriver.0.name= oracle.jdbc.driver.OracleDriver JDBC URL e.g. 1 Cloudscape jdbcDataSource.0.name=jdbc/Cloudscape jdbcDataSource.0.url=jdbc:cloudscape:rmi:CloudscapeDB;create=true e.g. 2 Oracle jdbcDataSource.0.name=jdbc/Oracle jdbcDataSource.0.url= jdbc:oracle:thin:@bigoh.cis.uab.edu:1521:cs610 253 © D. Wong 2003 2002 Making a connection to database example 1. Specify the logical database name. private String dbName = "java:comp/env/jdbc/AccountDB"; 2. Obtain the DataSource associated with the logical name. InitialContext ic = new InitialContext(); DataSource ds = (DataSource) ic.lookup(dbName); 3. Get the Connection from the DataSource. Connection con = ds.getConnection(username, password); 254 © D. Wong 2003 2002 Specifying JNDI name for deployment Step 1: Enter the code name 255 © D. Wong 2003 2002 Step 2: Map the coded name to the JNDI name 256 © D. Wong 2003 2002 Persistence Management Container-Managed Persistence • Entity bean code does not contain database access calls. • The EJB container generates the SQL statements. Bean-Managed Persistence • Entity bean code contains the database access calls (SQLs) (i.e. you write the code!) 257 © D. Wong 2003 2002 Container Managed example: Product entity bean ProductEJB.java ProductHome.java Product.java ProductClient.java Bean Managed example: Account entity bean AccountEJB.java AccountHome.java Account.java AccountClient.java 258 © D. Wong 2003 2002 259 © D. Wong 2003 2002 260 © D. Wong 2003 2002 261 © D. Wong 2003 2002 Transaction Management Ref. JavaTM 2 Enterprise Edition Developer's Guide 262 © D. Wong 2003 2002 J2EE Resources • JavaTM 2 SDK, Enterprise Edition Technical Documentation JavaTM 2 Enterprise Edition Developer's Guide • http://java.sun.com/j2ee/j2sdkee/ • http://archives.java.sun.com/archives/j2ee-interest.html • Developing Enterprise Applications with the JavaTM 2 Platform Enterprise Edition http://java.sun.com/j2ee/blueprints/ • Major commercial implementations: •WebLogic - Bea •Websphere - IBM 265 © D. Wong 2003 2002 .Net Data Providers Purpose: – connect, read, and execute commands against data sources – other functions, such as the management of input and output parameters, security, transactions, and database server errors. 266 © D. Wong 2003 2002 .NET data provider classes 1. SqlConnection 2. SqlCommand 3. SqlDataReader - can use for read-only applications from a SQL Server data source 4. SqlDataAdapter - acts as a bridge between a remote SQL Server data source and a DataSet class instance inside a Visual Basic .NET solution 267 © D. Wong 2003 2002 Semi-structured Data Model (Ref. 4.6) Blends class and relation for 1. Flexibility, therefore suitable for integration of database. 2. Serve as a document model in notation such as XML 3. Schemaless – the data is “self-describing”, the schema is attached to the data itself, which can vary arbitrarily over time and within a single db. 268 © D. Wong 2003 2002 Semi-structured Data Model Utility An information integration tool to integrate legacy databases Legacy database problem: The problem of integrating multiple databases designed independently, and used over time for many different applications. 269 © D. Wong 2003 2002 Semi-structured Data Representation A db of semi-structured data is a collection of nodes Node types: – Leaf – have associated data of atomic types (e.g. numbers, strings) – Interior – have 1 or more arcs out – Root – an interior node has no arcs entering it, representing the entire db (I.e. the entry point) Every node must be reachable from the root. 270 © D. Wong 2003 2002 Semistructured Data Representation (continued) Arc – has a label to indicate relationship between 2 nodes. Consider arc L connecting nodes M, N, the 2 roles that an arc serves: 1. If M is an object, N is an attribute, then L represents the name of the attribute 2. If M and N are both objects, L is the name of the relationship from N to M The structure is a graph, not necessarily a tree. E.g. Figure 4.19 271 © D. Wong 2003 2002 XML XML : eXensible Markup Language. Tag-based notation for “marking” documents Plain text Formats of tag: 1. <tag> string of data </tag> e.g. <phone>123-456-7890</phone> 2. <tag/> <!-- empty tag--> e.g. <flag/> Two modes: – Well-formed XML – Valid XML Comment format: <!– your comments --> 272 © D. Wong 2003 2002 HTML vs. XML HTML tags For describing the presentation of the data (e.g. <B> abc </B>) by a web browser XML tags For describing the meaning of the data (e.g. <phone> 123-456-7890 </phone>) 273 © D. Wong 2003 2002 Well-formed XML Requirements: Document begin with a declaration (prologue): <?xml version="1.0"? STANDLONE=“yes”?> It has a root. Every opening tag is followed by a matching closing tag, and the elements are properly nested inside each other. E.g. <BODY> … </BODY> Any attribute can occur at most once in a given opening tag, its value must be provided and quoted. E.g Figure 4.21 274 © D. Wong 2003 2002 Well-formed XML (continued) Invent your own tags. Resembles semi-structure data model: – Schemaless – Self-describing – Object like 275 © D. Wong 2003 2002 Valid XML Uses a Document Type Definition (DTD) to specify: – Allowable tags – Grammar for how the tags may be nested. More flexible than a strict-schema model – E.g. allow optional or missing fields More restrictive than a completely schemaless model 276 © D. Wong 2003 2002 XML elements and Entities/Objects (1) XML evolved from a document markup language (SGML) rather than a database language e.g. <Address> John lives on <Street> Main St </Street> house number <number>123</number> In a remote township </Address> Mixed data and text is a hindrance for database 277 © D. Wong 2003 2002 XML elements and Entities/Objects (2) XML elements are ordered whereas the attributes and the tuples in a relation are not E.g. <Address> <Number>123-4567</Number> <Street>Main St. </Street> <Address> <Address> <Street>Main St. </Street> <Number>123-4567</Number> </Address> But, these tuples are equivalent: Number Street 123-4567 Main St. 278 Street Number Main St. 123-4567 © D. Wong 2003 2002 XML Element types Complex types – Elements that contain sub-elements or carry attributes e.g. <message to=“[email protected]" from=“[email protected]" subject="XML"> <text> XML basics … </text> </message> Simple types – elements that contain numbers (and strings, and dates, etc.) but do not contain any subelements – Built-in or author defined Attributes always have simple types. 279 © D. Wong 2003 2002 Document Type Definition (DTD) It’s set of rules for constructing an XML document. – i.e. a grammar that specifies the schema of a legal XML document Specified by XML authors: – As part of the document itself, or – Stored separate from the document. The document refers to the URL where the DTD is stored. An XML document that conforms to it’s DTD is said to be valid. 280 © D. Wong 2003 2002 DTD Structure Basic DTD structure: <!DOCTYPE root-tag [ <!ELEMENT element-name(components)> more elements ]> The name of the DTD must coincide with the tag name of the root element of the document that conforms to that DTD. One ELEMENT statement for each allowed tag, including the root tag For each tag that can have attributes, the ATTLIST statement specifies the allowed attributes and their type. Component of #PCDATA means simple text (i.e. no tags nested within) 281 © D. Wong 2003 2002 DTD Structure (continue) Component occurrence indicators: – * : zero or more – + : one or more – ? : zero or one time – | : either or e.g. (#PCDATA | (STREET CITY) Empty tags: <!ELEMENT elementName, EMPTY> e.g. <!ELEMENT Flag EMPTY> E.g. Figure 4.22 (2n. Ed.) – a DTD for movie stars Figure 4.23 (2n. Ed.) – a document following DTD in Figure 4.22 282 © D. Wong 2003 2002 Using a DTD Two ways: 1. Include the DTD at the beginning of the document 2. In the document’s opening line, refer to the DTD that is stored separately in a file system accessible to the application that is processing the document. e.g. <?XML VERSION = “1.0” STANDALONE = “no”?> <!DOCTYPE Stars SYSTEM “star.dtd”> 283 © D. Wong 2003 2002 Attribute Lists Format: <!ATTLIST ElementName attributeName type [Modifier] ... > 284 © D. Wong 2003 2002 Some attribute types 1. 2. CDATA <!-- simple text --> ID An attribute of type ID must have a unique value through out the document E.g. if attr1 and attr2 are ID type, then <elt1 attr1="123"> and <elt2 attr2="123"> is illegal Imitate key in relational db 3. IDREF An attribute of type IDREF must refer to a valid ID declared in the same document Imitate foreign key 4. IDREFS Refer to a list of valid IDs declared in the document 285 © D. Wong 2003 2002 Some Modifier Types #IMPLIED – The attribute is optional. – Can remain unspecified #REQUIRED – The attribute is mandatory. – E.g. movieId ID #REQUIRED 286 © D. Wong 2003 2002 Example: Use DTD to alleviate the order problem E.g.: <!ELEMENT Person ((Name, Id, Address) | (Name, Address, Id) | (Id, Address, Name) | (Id, Name, Address) | (Address, Id, Name) | (Address, Name, Id))> But, it's awkward! 287 © D. Wong 2003 2002 XML Schema A DDL for XML documents Purpose: to define a class of XML documents Instance document: an XML document that conforms to a particular schema (schema valid). Instances and schemas may exist as: – documents in files – streams of bytes sent between applications – fields in a database record – collections of XML "Infoset "Information Items 288 © D. Wong 2003 2002 Triggers Available in Oracle and SQL99 Event-Condition-Action rules: define an action the db should take when some db-related events occurs 289 © D. Wong 2003 2002 Trigger vs. other constraints 1. Triggers are awakened when certain events, specified by db programmer, occur. E.g. update, insert, delete to a relation; end of a transaction 2. Other constraints immediately prevent the event if the constraint is violated. For trigger, it's condition is tested when the event occurs, if the condition does not hold, then the action associated with the trigger will not happen (I.e. trigger will not be fired) 3. If the trigger condition is true, the action is performed by the DBMS (I.e. trigger fired). So, it's transparent to the user 290 © D. Wong 2003 2002 Trigger Syntax CREATE [OR REPLACE] TRIGGER <triggerName> {BEFORE | AFTER | INSTEADOF} {DELETE | INSERT | UPDATE [of column, …] } [OR {DELETE | INSERT | UPDATE [of column, …] } ON {tableName | viewName} [REFERENCING { OLD [AS] <oldName> , NEW [AS] <newName> …] FOR EACH {ROW | STATEMENT} [WHEN (condition) ] <action>; 291 © D. Wong 2003 2002 Oracle Triggers <action> is PL/SQL block Simplest is : BEGIN <SQL statement> END; In the PL/SQL action block, variables OLD and NEW are preceded by : e.g. :OLD Follow the create trigger statement with a Dot (.) and then RUN to store the definition in the db The action cannot change the relation that triggers the action, nor to relations connected to the triggering relation by a constraint e.g. FK constraint Read 7.4.2 – 7.4.4 292 © D. Wong 2003 2002 SQL3 Triggers <action> can be: 1. a single SQL statement 2. A SQL statements, separated by ; enclosed in a BEGIN <SQL statements> END; 293 © D. Wong 2003 2002 Constraints Summary 1. Primary Key declaration 2. Foreign Key – referential integrity constraint 3. Constraints within relations: Attribute constraints: 1. NOT NULL; 2. CHECK Tuple based CHECKs 4. 5. Schema level constraints – SQL2 assertions (not in Oracle) Triggers – Oracles's and SQL99's 294 © D. Wong 2003 2002 Week 13 295 © D. Wong 2003 2002 Java API for XML Processing (JAXP) For processing XML data using applications written in Java JAXP APIs - javax.xml.parsers package Leverages 2 parser standards : SAX (Simple API for XML Parsing) parse the data as a stream of events, a serial-access mechanism that does element-by-element processing DOM (Document Object Model) (easier to use) build a tree structure of objects to represent the data org.w3c.dom Detail review of the enrollment example (files posted on the course web page as JAXP example) http://java.sun.com/j2ee/1.4/docs/tutorial/doc/JAXPIntro5.html 296 © D. Wong 2003 2002 Ch. 7 Constraints and Triggers Keys as constraint Foreign key constraints and referential integrity Constraints on attributes Constraints on tuples Assertions Triggers 297 © D. Wong 2003 2002 Keys as Constraint One primary key per relation Declared in CREATE TABLE 2 ways: 1. List with attribute (when only 1 attribute in key) e.g. snum char (5) PRIMARY KEY 2. Add to the list of items declared in the schema. E.g. CREATE TABLE supplier ( snum CHAR(3), sname VARCHAR(15), status INT, city VARCHAR(15) CONSTRAINT pk_supplier PRIMARY KEY(snum) ); 298 © D. Wong 2003 2002 Keys (continued) Primary Key attribute cannot be NULL 2 tuples in R cannot agree on all of the attributes in the primary key. DBMS rejects violating insertions or updates. Can also declare a key with UNIQUE – Can be used instead of PRIMARY KEY – Can have multiple UNIQUE in a table – UNIQUE attributes permit NULL Key constraints are checked at insert and updates Use of index created with key attributes for efficient key enforcement checks 299 © D. Wong 2003 2002 Foreign-Key Constraints To enforce referential integrity constraints E.g. SPJ (type 1), S, P, J (type 2) The reference attribute of the relation of type 2 must be declared PRIMARY or UNIQUE Again 2 ways to declare in CREATE TABLE 1. Inline with declaration 2. Add to the list of items declared in the schema 300 © D. Wong 2003 2002 FK as list of items declared in the schema e.g. CREATE TABLE spj ( snum CHAR(3), pnum CHAR(3), jnum CHAR(3), qty INT, CONSTRAINT fk_spj_1 FOREIGN KEY (snum) REFERENCES S(snum) DEFERRABLE INITIALLY DEFERRED, CONSTRAINT fk_spj_2 FOREIGN KEY (pnum) REFERENCES P(pnum) DEFERRABLE INITIALLY IMMEDIATE, CONSTRAINT fk_spj_3 FOREIGN KEY (jnum) REFERENCES J(jnum) -- this one just use default, i.e. not DEFERRABLE ); 301 © D. Wong 2003 2002 FK – maintaining referential integrity 3 Policies: 1. Default : Reject violating Modifications 2. Cascade : changes to the referenced attributes are mimicked at the foreign key 3. Set Null : set the foreign key to null when the referenced attributes are changed (delete / update) Declare for delete and update separately Declare with the declaration of foreign key Dangling tuples – violate referential integrity constraint for the foreign key. Dealt with the 3 policies. 302 © D. Wong 2003 2002 Maintaining referential integrity - Example CREATE TABLE spj ( snum CHAR(3), pnum CHAR(3), jnum CHAR(3), qty INT, CONSTRAINT fk_spj_1 FOREIGN KEY (snum) REFERENCES S(snum) ON UPDATE CASCADE DEFERRABLE INITIALLY DEFERRED, CONSTRAINT fk_spj_2 FOREIGN KEY (pnum) REFERENCES P(pnum) ON DELETE SET NULL DEFERRABLE INITIALLY IMMEDIATE, CONSTRAINT fk_spj_3 FOREIGN KEY (jnum) REFERENCES J(jnum) -- just use default ); 303 © D. Wong 2003 2002 Constraints on Attributes and Tuples Limits the values of some attributes. Constraints within relations Defined in relation's schema as: 1. Constraints on an attribute 2. Constraints on tuples 304 © D. Wong 2003 2002 Constraints on the attributes Not-null constraints e.g. pnum CHAR(3) NOT NULL, CONSTRAINT fk_spj_2 FOREIGN KEY (pnum) REFERENCES P(pnum) , Attribute-Based CHECK – Checked whenever any tuple gets a new value for this attribute (e.g. update, insert) – Reject violating operations – The check is expressed as a conditional expression e.g. qty INT CHECK (qty <= 10000), – Invisible to other relations (so, can be violated if the relations referenced in the condition are changed) – More examples in text Pp. 328 ex. 7.8, 7.9 305 © D. Wong 2003 2002 Tuple-Based CHECK Constraints Applies to one or more attributes of a table Checked when a tuple is inserted or updated. Reject if it's a violating operation. Invisible to other relations (so, can be violated if the relations referenced in the condition are changed) E.g. CREATE TABLE MovieStar( name CHAR(30) PRIMARY KEY, gender CHAR(1), CHECK (gender = 'F' OR name NOT LIKE'Ms.%') ); 306 © D. Wong 2003 2002 Modification of Constraints Only to constraints that are named. Naming of constraints examples: 1. snum char (5) CONSTRAINT pk_supplier PRIMARY KEY , 2. CONSTRAINT fk_spj_1 FOREIGN KEY (snum) REFERENCES S(snum) DEFERRABLE, 3. Gender CHAR(1) CONSTRAINT NoAndro CHECK (gender in ('F', 'M')), 307 © D. Wong 2003 2002 Modifications Delete constraints ALTER TABLE <tableName> DROP CONSTRAINT <constraintName>; Add constraint ALTER TABLE <tableName> ADD CONSTRAINT <constraintName> <constraint specification>; e.g. ALTER TABLE student ADD CONSTRAINT PK_student PRIMARY KEY (studentkey); Set constraint – to change a deferrable constraint from immediate to deferred and vise versa SET CONSTRAINT <constraintName> DEFERRED; 308 © D. Wong 2003 2002 Assertions An assertion is a schema level CHECK constraint. Exist independent of any particular table. It's condition can refer to any table in the db schema Declaring an assertion: CREATE ASSERTION <constraintName> CHECK <condition> [constraint attributes]; constraint attributes: [NOT] DEFERRABLE; INITIALLY IMMEDIATE; INITIALLY DEFERRED The constraint is violated if the condition is false Any db modifications that causes a violation will be rejected 309 © D. Wong 2003 2002 Triggers Available in Oracle and SQL99 Event-Condition-Action rules: define an action the db should take when some db-related events occurs Triggers are useful for: 1. To maintain complex integrity constraints 2. To audit changes made to table 3. To signal to other programs that changes were made to a table 310 © D. Wong 2003 2002 Trigger vs. other constraints 1. Triggers are awakened when certain events, specified by db programmer, occur. E.g. update, insert, delete to a relation; end of a transaction 2. Other constraints immediately prevent the event if the constraint is violated. For trigger, it's condition is tested when the event occurs, if the condition does not hold, then the action associated with the trigger will not happen (I.e. trigger will not be fired) 3. If the trigger condition is true, the action is performed by the DBMS (I.e. trigger fired). So, it's transparent to the user 311 © D. Wong 2003 2002 Trigger Syntax CREATE [OR REPLACE] TRIGGER <triggerName> {BEFORE | AFTER | INSTEADOF} {DELETE | INSERT | UPDATE [of column, …] } [OR {DELETE | INSERT | UPDATE [of column, …] } ON {tableName | viewName} [REFERENCING { OLD [AS] <oldName> , NEW [AS] <newName> …] FOR EACH {ROW | STATEMENT} [WHEN (condition) ] <action>; 312 © D. Wong 2003 2002 Oracle Triggers <action> is PL/SQL block Simplest is : BEGIN <SQL statement> END; In the PL/SQL action block, variables OLD and NEW are preceded by : e.g. :OLD Follow the create trigger statement with a Dot (.) and then RUN to store the definition in the db The action cannot change the relation that triggers the action, nor to relations connected to the triggering relation by a constraint e.g. FK constraint Read 7.4.2 – 7.4.4 313 © D. Wong 2003 2002 SQL3 Triggers <action> can be: 1. a single SQL statement 2. A SQL statements, separated by ; enclosed in a BEGIN <SQL statements> END; 314 © D. Wong 2003 2002 Constraints Summary 1. Primary Key declaration 2. Foreign Key – referential integrity constraint 3. Constraints within relations: Attribute constraints: 1. NOT NULL; 2. CHECK Tuple based CHECKs 4. 5. Schema level constraints – SQL2 assertions (not in Oracle) Triggers – Oracles's and SQL99's 315 © D. Wong 2003 2002 Ch. 8 - System Aspects of SQL SQL in programming environments – Statement Level Interface Embedded SQL, SQLJ Dynamic SQL – PSM Oracle’s PL/SQL – Call Level Interface SQL environment 316 © D. Wong 2003 2002 Host Language Interface Host language: conventional language that applications are written in. e.g C, Java, C#, Visual Basic .Net Statement level interface – Static: Embedded SQL, SQLJ – Dynamic SQL Call level interface (CLI) Use of CLI (e.g. JDBC) for static transaction is less efficient at run time than statement-level interfaces because CLI’s preparation and execution generally involve separate communications with the DBMS, which is costly. 317 © D. Wong 2003 2002 Embedded SQL Shared variables – Host language variables that can be read/written by SQL statement – Serves as interface between the host language and the SQL execution system – When used in embedded SQL statements, precede the variable with a : e.g. :name Embedded SQL statements are prefixed with EXEC SQL Preprocessor translate the EXEC SQL statements into function calls in host language, then compile, linked with SQL-related library to form executable code. Fig. 8.1 pp 351 318 © D. Wong 2003 2002 Embedded SQL (continued) SQLSTATE – a special variable (an array of 5 char) – Serves to connect the host-language program with the SQL execution system – updated each time a SQL library function is called – Some of the codes are: ‘00000’ = no error ‘02000’ = no tuple found 319 © D. Wong 2003 2002 Embeddable SQL statements Any SQL statement that does not return a result (I.e. not a select-from-where query) can be embedded. E.g. insert, delete, create, update. select-from-where queries are not embeddable directly. For connecting the result of queries with a host-language program, must use one of the following methods: 1. Use single-row select for query that produces a single tuple : select-into-from-where. Ref. fig. 8.3 pp 355 EXEC SQL SELECT A INTO :var FROM R WHERE <cond>; 2. Use CURSOR for queries producing more than one tuple Fig. 8.4 pp 357 320 © D. Wong 2003 2002 CURSOR - Ref. Fig. 7.4 pp.379 (Fig. 8.4 pp 357) An object used to store the output of a query for processing in an application. It provides the mechanism to reference the current position in a result set, and to do positioned updates and deletes. Declare cursor in embedded SQL by: EXEC SQL DECLARE <cursor> CURSOR FOR <query> Open cursor before use: EXEC SQL OPEN <cursor> Use Fetch statement to get the next tuple of the relation over which the cursor range EXEC SQLFETCH FROM <cursor> INTO <list of shared variables> SQLSTATE of ‘02000’ means no more tuples found To close a cursor when done: EXEC SQL CLOSE 321 © D. Wong 2003 2002 Cursor options INSENSITIVE guarantee that changes to underlying relation made between one opening and closing of cursor will not affect the set of tuples already fetched. FOR READ ONLY DBMS will prevent modifications to the underlying relation through this cursor SCROLL The cursor may be accessed with any one of the : FETCH {NEXT / PRIOR/ FIRST / LAST /RELATVE n/ ABSOLUTE n} ResultSet in JDBC is a cursor 322 © D. Wong 2003 2002 SQLJ ANSI standard Statement-level Interface to JAVA A dialect of embedded SQL that can be included in JAVA program Goal: To obtain the run-time efficiency of embedded SQL for Java applications while retaining the advantage of accessing DBMS through JDBC Embedded SQLJ constructs are replaced by calls to an SQLJ run-time package, which access database using JDBC. Benefit: the pre-compiler (translator in Oracle) can check SQL syntax and the number and types of arguments and results. 323 © D. Wong 2003 2002 Differences between SQLJ, Embedded SQL and JDBC 1. SQLJ supports essentially SQL-92, much more portable across DBMS vendors. For embedded SQL, each DBMS vendor supports its own proprietary version of SQL. 2. An SQLJ clause in a JAVA program begin with #SQL instead of EXEC SQL, and can contain select-from-where statement inside { } Any JAVA variables can be included as a parameter in an SQL statement (prefix the variable with : ) , same as in embedded SQL In SQLJ, a query returns an SQLJ interator object instead of a ResultSet object (as in JDBC). But, it’s similar to ResultSet in that they provide a cursor mechanism. Both SQLJ statement and JDBC calls can be included in the same Java program 3. 4. 5. 324 © D. Wong 2003 2002 Oracle’s SQLJ script To use: sqlj file.sqlj Many command lines options are availabe, refer to Oracle developer guide at http://otn.oracle.com/tech/java/sqlj_jdbc/pdf/a96655.pdf Invokes translator to preprocess java programs with SQLJ clauses (suffix of those files can be .sqlj, .java). Translator generates .java file which contains the JDBC calls for the SQL statements Invokes javac to compile Invokes JVM to execute 325 © D. Wong 2003 2002 Dynamic SQL Used when the SQL statement is not known at compile time E.g. an application that prompt the user for an SQL query, read, and then execute the query (can you think of an application?) Host language program must instruct SQL to: 1. Take a string and turn it into an executable SQL statement: EXEC SQL PREPARE sqlvar FROM sharedVar; 2. Execute the prepared statement: EXEC SQL EXECUTE sqlvar; 326 © D. Wong 2003 2002 Dynamic SQL (continued) Steps 1 and 2 can be combined by: EXEC SQL EXECUTE IMMEDIATE sharedVar; The 2 steps approach is beneficial if a prepared statement is execute multiple times. Ref. Fig. 8.7 pp. 368 327 © D. Wong 2003 2002 Week 14 328 © D. Wong 2003 2002 Persistent, Stored Modules (PSM) (Ref. 8.2 ) A way to create and store procedures or functions with a database schema The procedure/functions can be used in SQL queries or other SQL statements to perform within the database computations that cannot be expressed in SQL query language Like a simple, general-purpose language SQL/PSM standard (PSM-96). Commercial DBMS offers such capabilities with variations (e.g. Oracle, MS SQL) 329 © D. Wong 2003 2002 PSM Modules Collection of function and procedure definitions, temporary relation declarations, and other declarations. (PSM modules in Oracle are Packages) PSM Procedure CREATE PROCEDURE <name> (parameters) PSM Function CREATE FUNCTION <name> (parameters) local declarations return <type> procedure body; local declarations procedure body; Parameters: mode-name-type triples Modes: IN, OUT, INOUT 330 © D. Wong 2003 2002 Statement Forms in PSM 1. CALL –statement (for procedures only) CALL <procedure name> (<parameter list>); i. From a host-language program: e.g. EXEC SQL CALL proc(:x, 3); // x is a shared variable ii. As a statement of another PSM function or procedure iii. As an SQL command in an SQL client (e.g. SQLPLUS). E.g CALL proc (1, 3) 331 © D. Wong 2003 2002 Statement Forms in PSM (continued) 2. RETURN statement (for functions only) 3. Declaration of local variables (precede executable statements): DECLARE <name> <type>; 4. Assignment statement: SET <variable> = <expression>; 5. Statement group in BEGIN … END block 6. Statement label - 7. Branching statement (ref. Fig. 8.9, 8.10 ) labelName: 332 © D. Wong 2003 2002 Statement Forms in PSM (continued 2) 8. 9. Queries (select-from-where): Loops a) LOOP <statement list> END LOOP; b) Only to iterate over a cursor: FOR <loop name> AS <cursor name> CURSOR FOR <query> DO <statement list> END FOR c) WHILE <condition> DO <statement list> END WHILE; d) REPEAT <statement list> UNTIL <condition> END REPEAT; 333 © D. Wong 2003 2002 PL/SQL of Oracle Oracle’s procedural language (PL). It’s a superset of SQL. Used to create stored procedures/functions and packages, to trigger database events, or to add programming logic to SQL commands Ref. : Ullman and students’s introduction http://www-db.stanford.edu/~ullman/fcdb/oracle/or-plsql.html 334 © D. Wong 2003 2002 Call-level Interface (CLI) The application is written entirely in the host language. SQL statements are the values of string variables constructed at run time The string variables are passed as arguments to host language procedures provided by the CLI library of functions. No special syntax, no pre-compiler is needed. System independence – in principle! SQL 99's CLI standard is an adaptation of ODBC (Open Database Connectivity) and the host language is C SQL/CLI in Java is JDBC (discussed in week 9) 335 © D. Wong 2003 2002 The SQL Environment It's the framework in which data may exist and SQL operations on data may be executed. I.e. a DBMS running at an installation. SQL standard of elements hierarchy: Fig. 8.15 2nd ed 1. Schema – collection of tables, views, assertions, triggers, PSM, modules, etc. 2. Catalogs – collections of schemas 3. Clusters – collections of catalogs. Each user has an associated cluster. 336 © D. Wong 2003 2002 The SQL Environment (continued) SQL clients and servers – processes that runs of the same or different hosts. – Server : support operations on database elements – Client : allow a user to connect to a server and operate on a database (e.g. SQLPLUS) Connection – a "connection" must be made between the server and the client by executing the CONNECT statement on the client host. Default connection is made by an SQL client (e.g. SQLPLUS). Session – SQL operations that are performed while a connection is active. Each session has a current catalog, schema, and an authorized user. Reminder: temporary elements such as view exists only within the environment of a session. 337 © D. Wong 2003 2002 The SQL Environment (continued 2) Modules – An SQL term for an application program – 3 kinds of modules in SQL standard: 1. 2. 3. – Generic SQL interface, e.g. SQLPLUS Embedded SQL True modules, e.g. PSM SQL agent – a SQL module in execution Ref. Fig. 8.17 pp. 384 338 © D. Wong 2003 2002 SQL Transactions – Example in JDBC Security and User Authorization in SQL Object Oriented Model – ODL – OQL Object Relational Systems Recursion in SQL 339 © D. Wong 2003 2002 Transactions in SQL (8.6 - 2nd ed.) Concerns with enforcing the ACID properties of transactions Review: ACID properties of “proper” execution: – Atomicity : All of the updates of a transaction are successful, or no update take place – Consistency: Each transaction should leave the database in a consistent state – Isolation: Each transaction, when executed concurrently with other transactions, should have the same effect as if it had been executed by itself (serializable behavior) – Durability: Once a transaction has completed successfully, its changes to the database should be permanent. Even serious failures should not affect the permanence of a transaction. 340 © D. Wong 2003 2002 Serializability and Atomicity Serializable function execution: If the function executions behave as if they were run serially, even though their execution may overlap in time. Example of problem with concurrent function executions that are not serialized: Ex. 8.26, Fig. 8.22 and Fig. 8.23 pp. 398 Example of problem when failure (system, network, power) occurs during a database operation: Ex. 8.27, Fig. 8.24 pp. 400 341 © D. Wong 2003 2002 Transactions To solve the serializability and atomicity problem. Def.: A collection of one or more operations on the database that must be executed atomically, (i.e. either all are done or none are) and in a serializable manner (i.e. isolation). End of a SQL transaction is marked by either operations: 1. COMMIT - the database will be permanently updated 2. ROLLBACK – no changes by the transaction appear in the database (abort the transaction) 342 © D. Wong 2003 2002 Transactions (continued) In query interface (e.g. SQLPLUS): Transactions are single queries or modification statements. In program interface (e.g. JDBC): 1. Single SQL statement: use automatic commit mode. A transaction: starts when the statement begins execution ends when an automatic commit or rollback is completed 2. More than 1 SQL statement: – Disable auto-commit – After all SQL statements in a transaction are executed, call commit; or rollback if error occurs 343 © D. Wong 2003 2002 Deferred constraints When the commit method is executed, if there are deferred constraints that need to be checked, and these constraints are now violated, then the transaction is not committed. Instead, the transaction is rolled back and the error is reported via the SQLSTATE variable. In JDBC, this SQLSTATE is reported via the SQLException object's getSQLState() method. 344 © D. Wong 2003 2002 Code segment to show transaction control in JDBC: try { // ... add code here for making connection ... PreparedStatement updateSales = con.prepareStatement("update COFFEES " + "set SALES = ? where COF_NAME like ?"); PreparedStatement updateTotal = con.prepareStatement( "update COFFEES " + "set TOTAL = TOTAL + ? where COF_NAME like ?") ; con.setAutoCommit(false); updateSales.setInt(1, 50); updateSales.setString(2, "Colombian"); updateSales.executeUpdate(); updateTotal.setInt(1, 50); // first update updateTotal.setString(2, "Colombian"); updateTotal.executeUpdate(); // second update con.commit(); con.setAutoCommit(true); } catch(SQLException ex) { if (con != null) { try { System.err.print("Error detected! Transaction is being rolled back!"); con.rollback(); } catch(SQLException excep) { System.err.print("SQLException: " + excep.getMessage()); } } // end if there is a connection } 345 © D. Wong 2003 2002 Transaction Isolation Levels Determine whether and how a transaction will be affected by concurrent transactions – transactions from different SQL sessions that are accessing the same data. SQL isolation levels are to address: 1. Dirty Read: a read of dirty data. Dirty data is data written by a transaction that has not yet committed. 2. Non-repeatable Read: One transaction reads a row. Another deletes or modifies it and COMMIT before the first one does. Now, if the first transaction perform the same read again will get different result. 3. Phantom tuples: tuples that are inserted into the database by another transaction while one's transaction is running. 346 © D. Wong 2003 2002 Isolation levels in JDBC The 5 isolation levels: 1. TRANSACTION_NONE - transactions not supported (This level is not in the SQL standard) 2. TRANSACTION_READ_UNCOMMITTED - allows dirty reads, nonrepeatable reads, and phantom reads 3. TRANSACTION_READ_COMMITTED - prevents dirty reads, but does not prevent nonrepeatable reads, and phantom reads 4. TRANSACTION_REPEATABLE_READ - disallow dirty reads, nonrepeatable reads, but allow phantom reads 5. TRANSACTION_SERIALIZABLE - disallow dirty reads, nonrepeatable reads, and phantom reads (The highest level, the default) Connection methods to get and set them: 1. 2. getTransactionIsolation() - to find out the transaction isolation level the database is set setTransactionIsolation() - to set the appropriate isolation level 347 © D. Wong 2003 2002 Choosing Isolation Levels A choice of compromise between performance and a slightly inconsistent view of data (because data is being updated concurrently by other users). Ref. Ex. 8.32, 8.33 pp. 408 348 © D. Wong 2003 2002 DBMS Techniques to enforce ACID Locking – granularity of locks is important. Locks are obtained at the beginning of a transaction. Locks are released at the end of commit or rollback. Logging – write a log to nonvolatile storage. Assure durability. Transaction Commitment – for durability and atomicity, transactions are computed “tentatively”, recorded, but no changes are made to the db until the transaction gets committed. Changes are copied to the log, then copied to db. 349 © D. Wong 2003 2002 J2EE Transaction Management in Enterprise JavaBeans (EJBs) Ref. JavaTM 2 Enterprise Edition Developer's Guide 350 © D. Wong 2003 2002 Container-Managed Transactions (see note page of this slide) Description • Code does not include statements that begin and end the transaction • Immediately before an EJB method starts - transaction begins • Just before the method exits - commits • Each method can be associated with a single transaction Prohibited methods, e.g.: • commit, setAutoCommit, and rollback methods of java.sql.Connection Limitation: • When a method is executing, it can be associated with either a single transaction or no transaction at all 351 © D. Wong 2003 2002 Bean Managed Transaction (see note page of this slide) • Session bean code invokes methods that mark the boundaries of the transaction - setAutoCommit(); commit(); rollback(); • An entity bean may not have bean-managed transactions Example: public void ship (String productId, String orderId, int quantity) { try { con.setAutoCommit(false); updateOrderItem(productId, orderId); updateInventory(productId, quantity); con.commit(); } catch (Exception ex) { try { con.rollback(); throw new EJBException("Transaction failed: " + ex.getMessage()); } catch (SQLException sqx) { throw new EJBException("Rollback failed: " + sqx.getMessage()); }}} Ref. JavaTM 352 2 Enterprise Edition Developer's Guide, JDBC Transaction © D. Wong 2003 2002 Week 15 353 © D. Wong 2003 2002 Security and User Authorization in SQL 8.7 pp. 410 Authorization ID = user name Special authorization ID: PUBLIC Privileges for: SELECT, INSERT, UPDATE, DELETE, REFERENCE, USAGE, TRIGGER, EXECUTE, UNDER For SELECT, INSERT, UPDATE, may also specify on attribute level Privileges are needed for relations in the subqueries also. e.g. Fig. 8.25 pp 411 354 © D. Wong 2003 2002 Creating privileges Owner of schema or modules has all privileges Establish ownership at: 1. When a schema is created. 2. When a session is initiated by a CONNECT statement. e.g. CONNECT TO ABC_server AS conn1 AUTHORIZATION smith; 3. When a module is created, use an optional AUTHORIZATION clause 355 © D. Wong 2003 2002 Granting privileges Owner of a relation has GRANT privilege. If you have the "GRANT" privilege to a set of privileges, you may grant them to any user. GRANT <privilege list> ON <database element> TO <user list> [WITH GRANT OPTION] e.g. GRANT SELECT, INSERT ON Studio TO kirk, picard WITH GRANT OPTION; -- by Janeway GRANT SELECT, INSERT ON Studio TO sisko; -- by picard GRANT SELECT, INSERT(name) ON Studio TO sisko; -- by kirk Grant diagram e.g. Fig. 8.26 pp. 417 356 © D. Wong 2003 2002 Revoking Privileges Privileges can be revoked: REVOKE [GRANT OPTION FOR] <privilege list> ON <database element> FROM <user list> {CASCADE | RESTRICT} e.g. REVOKE SELECT, INSERT ON Studio FROM picard CASCADE ; If A has been given a privilege by several different people on the same element, then all of them have to revoke in order for A to lose the privilege If A granted privilege P to B, who granted P to C, then A revokes P from B will also revoke P from C. e.g. Fig 8.29 pp 420 357 © D. Wong 2003 2002 Object-Oriented Data Model ODMG – Object Database Management Group – Deals with OO standard for database – Also deals with ORDBMS (Object Relational DBMS) Major parts of ODMG standard: – ODL: Object Definition Language, how to specify the db schema – OQL: the SQL-like Object Query Language – Host language binding: how to use ODL and OQL from within procedural languages. The standard define bindings for C++, SmallTalk, and Java. In ODMG, the host language also serves as the object manipulation language. 358 © D. Wong 2003 2002 ODMG database management system Application is written in a host language e.g. C++, Java In order to access the db, the application must be linked with the ODBMS libraries and with the code that implements its class methods. Much of the code that manipulates objects is part of the database itself. Each class has a set of methods. Method signatures are specified in the schema using ODL. The code for these methods is stored on the database server. ODBMS invokes the appropriate code whenever a method is called. OODMG database data is modified directly in the host language e.g. Stud.Name = "Joe"; // Stud contains the oid of a // persistent Student object 359 © D. Wong 2003 2002 Architecture of an ODMG database Schema Spec. in ODL(Embedded in C++, Java, etc) ODL Preprocessor Source code for class methods in host language (C++, Java, …) Host language compiler ODBMS Software Method Implementation Obj. code ODBMS Libraries Linker Information stored at the Server Metadata Data Access Method Implementation Binaries Stored in DBMS Object Data 360 Ref. "Databases and Transaction Processing" – Lewis, Addison Wesley © D. Wong 2003 2002 Structure of ODMG Applications Application source code in host language ODBMS Host language compiler ODBMS library Application Object code Method implementation binaries stored in DBMS Linker Executable code 361 Ref. "Databases and Transaction Processing" – Lewis, Addison Wesley © D. Wong 2003 2002 Object Definition Language (ODL) Conceptual model to describe the attributes, methods, and relationships of each object type (class), including it's inheritance properties. ODL classes describes 3 kinds of elements: 1. Attributes: values associated with the object 2. Relationship: connection between the object itself and other objects 3. Methods: functions that may be applied to objects of the class. Methods are specified by it's signature: name, arguments (names, order, and type), return value type, name of any exceptions it can raise. e.g. Fig. 4.2 pp137 362 © D. Wong 2003 2002 Object Definition Language (ODL) (continued) Class declaration Class include: 1. Class Name 2. Key declaration(s). Optional. 3. Extent Declaration = name for the set of currently existing objects of a class (I.e. relation instance in relational model) 4. Element declarations: attributes, relationships, methods class <name> [(extent names)] { < list of elements> } 363 © D. Wong 2003 2002 Object Definition Language (ODL) (continued 2) Attribute declaration (non-objects): attribute <type> <name>; e.g. 1 attribute string name; e.g. 2 attribute Struct Addr{ string street, string city} address; Relationship (and inverse relationship) declaration (objects): relationship [rangetype]<className> <name> inverse className::<relationship name>; e.g. relationship Set<Star> stars inverse Star::starredIn; 364 © D. Wong 2003 2002 Method declaration <returnType> <methodName> (arguments) raises (<exception>); e.g. 1: void lengthInhours() raises (noLengthFound); e.g. 2: void starName(out Set<String>) ; Arguments: in : read-only out: for returning values inout: for both 365 © D. Wong 2003 2002 ODL Relationships Only binary relationships supported – Use a connecting class to represent multiway relationships Fig. 2.9 pp. 34. Relationships are defined in inverse pairs. Fig. 4.3 pp 140 1. Many-many: have a set type of class in each direction 2. Many-one: a set type for the one, and a simple class name for the many 3. One-one: simple class name in both 366 © D. Wong 2003 2002 Subclass (S is a subclass of D) Class C extends D { class C's declarations } e.g. class Cartoon extends Movie { relationship Set<Star> voices; } Multiple inheritance (separate the super classes by : in the extend declaration) e.g. class CartoonMurderMystery extends MurderMystery : Cartoon Name conflict resolutions with Multiple inheritance pp. 151 367 © D. Wong 2003 2002 ODL data types Basis: 1. Atomic type: integer, float, characters, string, boolean, enum 2. Class names Structured types: 1. Set: Set<T> // finite sets of elements of type T 2. 3. 4. 5. Bag: Bag<T> // finite bags of element type T List: List<T> // finite lists of 0 or more elements T Array: Array <T, i> // T = type, i = no. of elements Dictionary: Dictionary <T, S>, T is key type, S is range type. Each pair has unique key value. 6. Structures : Struct N {<type1> field1, …} 368 © D. Wong 2003 2002 Keys declaration in ODL Optional because each object is identified by an internal OID May declare one or more keys in the extent declaration e.g. class Movie (extent Movies key (title, year)) { attribute string title; attribute integer year; … } 369 © D. Wong 2003 2002 ODL to Relational Design Invent a new attribute to serve as key when there is no key in the ODL design ODL attributes that are not atomic are converted into relation attributes that usually are redesigned with normalization Methods are not converted to relational design. But can have methods in Object Relational design 370 © D. Wong 2003 2002 Object-Relational DB (ORDB) SQL-99 adopted a limited subset of the object relational model ORDBMS is a conservative extension to the existing RDBMS. In general, ORDB consists of: – A set of relations (which can be viewed as classes) – Each relation consists of a set of tuples (which can be viewed as instances of the class that represents the relation) – Each tuple is of the form (oid, val) where oid is an object id and val is a tuple value whose components can be arbitrary values (e.g. primitive values, sets of tuples, and references to other objects) 371 © D. Wong 2003 2002 ORDB, ODB, RDB Difference between ORDB and ODB – In ORDB, the top-level structure of each object instance is always a tuple. In ODB, top-level structure can be an arbitrary value. Difference between ORDB and RDB: – RDB tuple components must be primitive values – ORDB tuple components can be arbitrary values 372 © D. Wong 2003 2002 Oracle Object example create type ADDRESS_TY as object (Street VARCHAR2(50), City VARCHAR2(25), State CHAR(2), Zip NUMBER); create type PERSON_TY as object (Name VARCHAR2(25), BirthDate DATE; Address ADDRESS_TY member function AGE_DAYS (BirthDate IN DATE) return NUMBER); 373 © D. Wong 2003 2002 Oracle Object example (continued) Defining methods for user defined types using PL/SQL: Create type body PERSON_TY as Member function AGE_DAYS (BirthDate DATE) return NUMBER is begin RETURN ROUND(SysDate – BirthDate); end; -- if there are more methods to the data type, may define here end; / 374 © D. Wong 2003 2002 Oracle Object example (continued 2) Create table with user defined abstract data types: create table CUSTOMER (Customer_ID NUMBER, Person PERSON_TY); Use constructors for inserting data: insert into CUSTOMER values (1, PERSON_TY('Joe Smith', '01JAN-90', ADDRESS_TY('10 Spring ST', 'BHM', 'AL', 35110))); Use path names to access the attributes: SELECT Person.Address.Street FROM CUSTOMER; SELECT Person.AGE_DAYS(Person.BirthDate) FROM CUSTOMER; UPDATE CUSTOMER SET Person.Address.City = 'Birmingham' WHERE Person.Address.City = 'BHM'; 375 © D. Wong 2003 2002 Object-Orient Analysis and Design Normalization in relational model relates each attribute to its primary key e.g. The following is in 3NF: create table CUSTOMER (Customer_ID NUMBER, Name VARCHAR2(25), BirthDate DATE; Street VARCHAR2(50), City VARCHAR2(25), State CHAR(2), ZipNUMBER); For OO, further group related columns into abstract data types (ADT) (e.g. ADDRESS_TY) for reuse. Then look for relationships among ADTs to determine if nesting is appropriate (e.g. PERSON_TY); 376 © D. Wong 2003 2002