* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Overview of Databases I
Microsoft Jet Database Engine wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Clusterpoint wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Versant Object Database wikipedia , lookup
Relational algebra wikipedia , lookup
Databases II (Fall 2009) Professor: Iluju Kiringa [email protected] http://www.site.uottawa.ca/~kiringa SITE 5072 1 Review of Databases I Chapters 2-5 2 What is a DBMS? A centralized database is a large collection of integrated data. Organizations face large quantities of data that must be efficiently managed. Many store GBs, even TBs of data Some scientific applications store PBs of data ! A Database Management System (DBMS) is a software package designed to store and manage databases. 3 Content of « Databases I » Foundations of DBMSs (Chap. 2-5, 19): : Conceptual design - Chap. 2: • Input: requirements of an application • Output: ER diagrams (i.e., ER model) Logical design – Chap. 3: • Input: ER diagrammes • Output: Relational model Normalization – Chap. 19: • Input: Relational model • Output: Relational model in normal forms Relational algebra and calculus – Chap. 4 SQL – Chap. 5 Application development (Chap. 6-7) Classic applications: Embeded SQL, JDBC, SQLJ, stored procedures Internet applications : HTTP, HTML, XML, 3-tier architecture 4 Content of « Databases I » (Cont’d) Storage and Indexing (Chap. 8-11): Disks and files – Chap. 9: • Memory hierarchy • Disk and file management Tree index – Chap. 10: • ISAM tree • B+ tree Hasch index – Chap. 11: • static • extendible • linear 5 Content of « Databases II » Query evaluation (Chap. 12-15) External sort - Chap. 13 Evaluation of relational operators -- Chap. 14 System R: a sample evaluation algorithm – Chap. 15 Transactions (Chap. 16-18) Concurrency control – Chap. 17 Recovery – Chap. 18 Advanced topics (Chap. 22,25, 26, 23 ou 27) Distributed databases – Chap. 22 Data Warehousing – Chap. 25 Data Mining Chap. 26 6 Overview of Database Design Requirements analysis: Find out what users want to do with the database Conceptual design: Use the output of RA to develop a high-level description of the data to be stored, along with their constraints. Output of CD usually is an ER-diagram. Logical design: Choose a DBMS and map the conceptual schema (ER-diagram) into the data model of the DBMS. Output of this step is the logical schema of the data. Schema refinement: Analyze the logical schema, identify potential problems in it, and fix these by refining the logical schema using known normal forms. Physical design: Consider expected workloads that the database will support to further refine the design in order to meet desired performance criteria. Output here is the physical schema. 7 ER Model The model Basic elements: • • • Advanced elements: • • • • Entity: real-world object Relationship: association between entities Attributes: description of an entity or a relationship Constraints: qualification of a relationship Weak entity : Identifiable only via another entity Hierarchy : similar to OO-languages Agregation: sort of macros Conceptual design using the ER model Should a given concept be modelled as an entity or an attribute? Should a given concept be modelled as an entity or a relationship? Should a given concept be modelled as a binary or as a ternary relationship? etc 8 Relational Database Concepts Relation: made up of 2 parts: Instance : a table, with rows and columns. #Rows = cardinality, #fields = degree / arity. Schema : specifies name of relation, plus name and domain (type) of each column (attribute). Can think of a relation as a set of rows or tuples (i.e., all rows are distinct), where each tuple has the same arity as the relation schema. Relational database: a set of relations, each with distinct name. Relational DB schema: set of schemas of relations in the DB. Relational DB instance: set of relation instances in the DB. 9 Sample Relation Schema : Instance : Students(sid: string, name: string, login: string, age: integer, gpa: real). sid 53666 53688 53650 name login Jones jones@cs Smith smith@eecs Smith smith@math age 18 18 19 gpa 3.4 3.2 3.8 Cardinality = 3, arity = 5, all rows distinct. Commercial systems allow duplicates. Order of attributes may or may not matter! Do all columns in a relation instance have to be distinct? Depends on whether they are ordered or not. 10 Creation and Alteration of Relations CREATE TABLE Students (sid: CHAR(20), name: CHAR(20), login: CHAR(10), age: INTEGER, gpa: REAL) DROP TABLE Students CREATE TABLE Enrolled (sid: CHAR(20), cid: CHAR(20), grade: CHAR(2)) ALTER TABLE Students ADD COLUMN firstYear: integer INSERT INTO Students (sid, name, login, age, gpa) VALUES (53688, ‘Smith’, ‘smith@ee’, 18, 3.2) DELETE FROM Students S WHERE S.name = ‘Smith’ 11 Primary Key and Foreign Key in SQL CREATE TABLE Enrolled (sid CHAR(20) cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid), UNIQUE (cid, grade) ) CREATE TABLE Enrolled (sid CHAR(20) cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid,cid) ) CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid,cid), FOREIGN KEY (sid) REFERENCES Students ) 12 Logical DB Design: ER to Relational The ER model represent the initial, high-level database design. The task is to generate a relational database schema that is as close as possible to the ER model. The mapping is approximate since it is hard to translate all the constraints of the ER model into an efficient logical (relational) model. 13 Formal Relational Query Languages Two mathematical Query Languages form the basis for “real” languages (e.g. SQL), and for implementation: Relational Algebra: More operational, very useful for representing execution plans. Relational Calculus: Lets users describe what they want, rather than how to compute it (Nonoperational, declarative). 14 Relational Algebra Basic operations: Additional operations: Selection ( ) Selects a subset of rows from relation. Projection ( ) Deletes unwanted columns from relation. Cross-product ( ) Allows us to combine two relations. Set-difference ( ) Gives tuples in 1st rel., but not in 2nd rel. Union ( ) Gives tuples in rel. 1 and in rel. 2. Intersection, join, division, renaming: Not (theoretically) essential, but (practically) very useful. Since each operation returns a relation, operations can be composed! (Algebra is “closed”.) 15 Cross-Product Each row of S1 is paired with each row of R1. Result schema has one field per field of S1 and R1, with field names `inherited’ if possible. Conflict: Both S1 and R1 have a field called sid. (sid) sname rating age (sid) bid day 22 dustin 7 45.0 22 101 10/10/96 22 dustin 7 45.0 58 103 11/12/96 31 lubber 8 55.5 22 101 10/10/96 31 lubber 8 55.5 58 103 11/12/96 58 rusty 10 35.0 22 101 10/10/96 58 rusty 10 35.0 58 103 11/12/96 Renaming operator: (C(1 sid1, 5 sid 2), S1 R1) 16 Joins Condition Join: (sid) 22 31 R c S c ( R S) sname rating age dustin 7 45.0 lubber 8 55.5 S1 (sid) bid 58 103 58 103 S1. sid R1. sid day 11/12/96 11/12/96 R1 Result schema same as that of cross-product. Fewer tuples than cross-product, might be able to compute more efficiently Sometimes called a theta-join. 17 Joins (Cont’d) Equi-Join: A special case of condition join where the condition c contains only equalities. sid sname rating age bid day 22 dustin 7 45.0 101 10/10/96 58 rusty 10 35.0 103 11/12/96 S1 R1 sid Result schema similar to cross-product, but only one copy of fields for which equality is specified. Natural Join: Equijoin on all fields having the same name in both relations. 18 Relational Calculus Has two flavors: Tuple relational calculus (TRC) Domain relational calculus (DRC). Has variables, constants, comparison ops, logical connectives, and quantifiers. TRC: Variables range over (i.e., get bound to) tuples. DRC: Variables range over domain elements (= field values). Both TRC and DRC are simple subsets of first-order logic. Expressions in the calculus are called formulas. An answer tuple is essentially an assignment of constants to variables that make the formula evaluate to true. 19 Tuple Relational Calculus Query has the form: t| p(t) Answer includes all tuples t that make the formula p(t) be true. Formula is recursively defined, starting with simple atomic formulas (getting tuples from relations or making comparisons of values), and building bigger formulas using the logical connectives. 20 TRC Formulas Atomic formula: RRname, op is one of or R.a op S.b, or R.a op constant , , , ,, Formula: an atomic formula, or p, p q, p q , where p and q are formulas, or R( p(R)), where variable R is free in p(R), or R( p(R)) , where variable R is free in p(R) The use of quantifiers R and R is said to bind R. A variable that is not bound is free. 21 Overview: Features of SQL Data definition language: used to create, destroy, and modify tables and views. Data manipulation language: used to pose queries, and to insert, delete, and modify rows. Triggers and advanced integrity constraints: used to specify actions that the DBMS will execute automatically. Embeddded SQL: allows SQL to be called from a host language, or Dynamic SQL: allows run-time creation and execution of queries . 22 Syntax of Basic SQL Query SELECT FROM WHERE [DISTINCT] target-list relation-list qualification relation-list A list of relation names (possibly with a range-variable after each name). target-list A list of attributes of relations in relation-list qualification Comparisons (Attr op const or Attr1 op Attr2, where op is one of , , , , , ) combined using AND, OR and NOT. DISTINCT is an optional keyword indicating that the answer should not contain duplicates. 23 Sample Query and Conceptual Evaluation SELECT S.sname FROM Sailors S, Reserves R WHERE S.sid=R.sid AND R.bid=103 (sid) sname rating age (sid) bid day 22 dustin 7 45.0 22 101 10/10/96 22 dustin 7 45.0 58 103 11/12/96 31 lubber 8 55.5 22 101 10/10/96 31 lubber 8 55.5 58 103 11/12/96 58 rusty 10 35.0 22 101 10/10/96 58 rusty 10 35.0 58 103 11/12/96 24 Aggregate Operators Significant extension of relational algebra SELECT COUNT (*) FROM Sailors S SELECT AVG (S.age) FROM Sailors S WHERE S.rating=10 COUNT (*) COUNT ( [DISTINCT] A) SUM ( [DISTINCT] A) AVG ( [DISTINCT] A) MAX (A) MIN (A) single column SELECT S.sname FROM Sailors S WHERE S.rating= (SELECT MAX(S2.rating) FROM Sailors S2) SELECT COUNT (DISTINCT S.rating) FROM Sailors S WHERE S.sname=‘Bob’ SELECT AVG ( DISTINCT S.age) FROM Sailors S WHERE S.rating=10 25 Queries With GROUP BY and HAVING SELECT FROM WHERE GROUP BY HAVING [DISTINCT] target-list relation-list qualification grouping-list group-qualification The target-list contains (i) attribute names (ii) terms with aggregate operations (e.g., MIN (S.age)). The attribute list (i) must be a subset of grouping-list. Intuitively, each answer tuple corresponds to a group, and these attributes must have a single value per group. (A group is a set of tuples that have the same value for all attributes in grouping-list.) 26 Find the age of the youngest sailor with age 18, for each rating with at least 2 such sailors SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating HAVING COUNT (*) > 1 Only S.rating and S.age are mentioned in the SELECT, GROUP BY or HAVING clauses; other attributes `unnecessary’. 2nd column of result is unnamed. (Use AS to name it.) sid sname rating age 22 dustin 7 45.0 31 lubber 8 55.5 71 zorba 10 16.0 64 horatio 7 35.0 29 brutus 1 33.0 58 rusty 10 35.0 rating age 1 33.0 7 45.0 rating 7 35.0 7 35.0 8 55.5 Answer relation 10 35.0 27 Summary SQL is more declarative than earlier, procedural query languages. is relationally complete; in fact, significantly more expressive power than relational algebra. In practice, users need to be aware of how queries are optimized and evaluated for best results. has any alternative ways to write a query; optimizer should look for most efficient evaluation plan. allows specification of rich integrity constraints. Provides triggers to respond to changes in the database. 28