SQL and SQAPL

Data Models
• A database models some portion of the real world.
• A data model is the link between the user's view of the world and the bits stored in the computer.
• We will concentrate on the relational model.

Data Models
• A data model is a collection of concepts for describing data.
• A database schema is a description of a particular collection of data, using a given data model.
• The relational model of data is the most widely used model today.
  – Main concept: relation, basically a table with rows and columns.
  – Every relation has a schema, which describes the columns, or fields.

Levels of Abstraction
• Views describe how users see the data.
• The conceptual schema defines the logical structure.
• The physical schema describes the files and indexes used.
• (sometimes called the ANSI/SPARC model)
[Figure: Users → View 1, View 2, View 3 → Conceptual Schema → Physical Schema → DB]

Data Independence
• A simple idea: applications should be insulated from how data is structured and stored.
• Logical data independence: protection from changes in the logical structure of data.
• Physical data independence: protection from changes in the physical structure of data.
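As a minimal sketch of logical data independence, assuming Python's built-in sqlite3 module and a cut-down, hypothetical version of the airline schema (SQLite does not accept hyphens or # in unquoted names, so FLT-SCHEDULE becomes flt_schedule, and the separate airline table is invented for this illustration), the "application" below only ever queries a view. When the base tables are restructured, only the view definition is rewritten, and the application's query keeps returning the same rows.

```python
import sqlite3


def fetch_atl_flights(conn):
    """The 'application': it only knows the view flt_view, not the base tables."""
    return conn.execute(
        "SELECT flt, airline FROM flt_view WHERE from_code = 'ATL' ORDER BY flt"
    ).fetchall()


conn = sqlite3.connect(":memory:")

# Version 1 of the conceptual schema (hypothetical): a single base table.
conn.executescript("""
    CREATE TABLE flt_schedule (flt TEXT PRIMARY KEY, airline TEXT, from_code TEXT);
    INSERT INTO flt_schedule VALUES ('DL212', 'DELTA', 'ATL'), ('AA121', 'AMERICAN', 'BOS');
    CREATE VIEW flt_view AS SELECT flt, airline, from_code FROM flt_schedule;
""")
before = fetch_atl_flights(conn)

# Version 2: the logical structure changes (airline names move to their own table).
# Only the view is redefined; fetch_atl_flights() is left untouched.
conn.executescript("""
    DROP VIEW flt_view;
    CREATE TABLE airline (code TEXT PRIMARY KEY, name TEXT);
    INSERT INTO airline VALUES ('DL', 'DELTA'), ('AA', 'AMERICAN');
    CREATE TABLE flt_schedule2 (flt TEXT PRIMARY KEY, airline_code TEXT, from_code TEXT);
    INSERT INTO flt_schedule2
        SELECT s.flt, a.code, s.from_code
        FROM flt_schedule s JOIN airline a ON a.name = s.airline;
    DROP TABLE flt_schedule;
    CREATE VIEW flt_view AS
        SELECT s.flt, a.name AS airline, s.from_code
        FROM flt_schedule2 s JOIN airline a ON a.code = s.airline_code;
""")
after = fetch_atl_flights(conn)

print(before)  # [('DL212', 'DELTA')]
print(after)   # [('DL212', 'DELTA')]
```

The same idea applies one level down: adding an index with CREATE INDEX changes the physical access path but not the text of the application's query.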
SQL
SQL consists of the following parts:
• Data Definition Language (DDL)
• Interactive Data Manipulation Language (Interactive DML)
• Embedded Data Manipulation Language (Embedded DML)
• Views
• Integrity
• Transaction Control
• Authorization
• Catalog and Dictionary Facilities

Airline database schema
AIRPORT (airportcode, name, city, state)
FLT-SCHEDULE (flt#, airline, dtime, from-airportcode, atime, to-airportcode, miles, price)
FLT-WEEKDAY (flt#, weekday)
FLT-INSTANCE (flt#, date, plane#, #avail-seats)
AIRPLANE (plane#, plane-type, total-#seats)
CUSTOMER (cust#, first, middle, last, phone#, street, city, state, zip)
RESERVATION (flt#, date, cust#, seat#, check-in-status, ticket#)

DDL - Overview
• primitive types
• domains
• schema
• tables

DDL - Primitive Types
• numeric
  – INTEGER (or INT), SMALLINT are subsets of the integers (machine dependent)
  – REAL, DOUBLE PRECISION are floating-point and double-precision floating-point (machine dependent)
  – FLOAT(N) is floating-point with a precision of at least N binary digits
  – DECIMAL(P,D) (or DEC(P,D), or NUMERIC(P,D)) has P digits, of which D are to the right of the decimal point

DDL - Primitive Types (cont.)
• character-string
  – CHAR(N) (or CHARACTER(N)) is a fixed-length character string
  – VARCHAR(N) (or CHAR VARYING(N), or CHARACTER VARYING(N)) is a variable-length character string with at most N characters
• bit-strings
  – BIT(N) is a fixed-length bit string
  – VARBIT(N) (or BIT VARYING(N)) is a bit string with at most N bits

DDL - Primitive Types (cont.)
• time
  – DATE, a date: YYYY-MM-DD
  – TIME, a time of day: HH:MM:SS
  – TIME(I), a time of day with I decimal fractions of a second: HH:MM:SS.F...F
  – TIME WITH TIME ZONE, a time with a time-zone offset added: HH:MM:SS+HH:MM

DDL - Primitive Types (cont.)
  – TIMESTAMP, date, time, fractions of a second and an optional WITH TIME ZONE qualifier: YYYY-MM-DD HH:MM:SS.F...F{+HH:MM}
  – INTERVAL, a relative value used to increment or decrement DATE, TIME, or TIMESTAMP: either a year-month or a day-time interval

DDL - Domains
• a domain can be defined as follows:
  CREATE DOMAIN AIRPORT-CODE CHAR(3);
  CREATE DOMAIN FLIGHTNUMBER CHAR(5);
• using domain definitions makes it easier to see which columns are related
• changing a domain definition in one place changes it consistently everywhere it is used
• default values can be defined for domains
• constraints can be defined for domains (later)

DDL - Domains (cont.)
• all domains contain the value NULL
• to define a different default value:
  CREATE DOMAIN AIRPORT-CODE CHAR(3) DEFAULT <literal>;
  CREATE DOMAIN AIRPORT-CODE CHAR(3) DEFAULT <niladic function>;
• <literal>, such as '???', 'NO-VALUE', ...
• <niladic function>, such as USER, CURRENT_USER, SESSION_USER, SYSTEM_USER, CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP
• a default defined on a column takes precedence over the domain default

DDL - Domains (cont.)
• a domain is dropped as follows:
  DROP DOMAIN AIRPORT-CODE RESTRICT;
  DROP DOMAIN AIRPORT-CODE CASCADE;
• restrict: drop operation fails if the domain is used in column definitions
• cascade: drop operation causes columns to be defined directly on the underlying data type

DDL - Schema
• create a schema:
  CREATE SCHEMA AIRLINE AUTHORIZATION LEO;
• the schema AIRLINE has now been created and is owned by the user "LEO"
• tables can now be created and added to the schema

DDL - Schema
• to drop a schema:
  DROP SCHEMA AIRLINE RESTRICT;
  DROP SCHEMA AIRLINE CASCADE;
• restrict: drop operation fails if the schema is not empty
• cascade: drop operation removes everything in the schema

DDL - Tables
• to create a table in the AIRLINE schema:
  CREATE TABLE AIRLINE.FLT-SCHEDULE
  (FLT# FLIGHTNUMBER NOT NULL,
  AIRLINE VARCHAR(25),
  FROM-AIRPORTCODE AIRPORT-CODE,
  DTIME TIME,
  TO-AIRPORTCODE AIRPORT-CODE,
  ATIME TIME,
  PRIMARY KEY (FLT#),
  FOREIGN KEY (FROM-AIRPORTCODE) REFERENCES AIRPORT(AIRPORTCODE),
  FOREIGN KEY (TO-AIRPORTCODE) REFERENCES AIRPORT(AIRPORTCODE));

DDL - Tables (cont.)
  CREATE TABLE AIRLINE.FLT-WEEKDAY
  (FLT# FLIGHTNUMBER NOT NULL,
  WEEKDAY CHAR(2),
  UNIQUE (FLT#, WEEKDAY),
  FOREIGN KEY (FLT#) REFERENCES FLT-SCHEDULE(FLT#));

  CREATE TABLE AIRLINE.FLT-INSTANCE
  (FLT# FLIGHTNUMBER NOT NULL,
  DATE DATE NOT NULL,
  #AVAIL-SEATS SMALLINT,
  PRIMARY KEY (FLT#, DATE),
  FOREIGN KEY (FLT#) REFERENCES FLT-SCHEDULE(FLT#));

DDL - Tables (cont.)
  CREATE TABLE AIRLINE.RESERVATION
  (FLT# FLIGHTNUMBER NOT NULL,
  DATE DATE NOT NULL,
  CUST# INTEGER NOT NULL,
  SEAT# CHAR(4),
  CHECK-IN-STATUS CHAR,
  UNIQUE (FLT#, DATE, CUST#),
  FOREIGN KEY (FLT#, DATE) REFERENCES FLT-INSTANCE(FLT#, DATE),
  FOREIGN KEY (CUST#) REFERENCES CUSTOMER(CUST#));
• the key of FLT-INSTANCE is the pair (FLT#, DATE), so the foreign key must reference that pair as a whole

DDL - Tables (cont.)
• to drop a table:
  DROP TABLE RESERVATION RESTRICT;
  DROP TABLE RESERVATION CASCADE;
• restrict: drop operation fails if the table is referenced by view/constraint definitions
• cascade: drop operation removes referencing view/constraint definitions

DDL - Tables (cont.)
• to add a column to a table:
  ALTER TABLE AIRLINE.FLT-SCHEDULE ADD PRICE DECIMAL(7,2);
• if no DEFAULT is specified, the new column will have NULL values for all tuples already in the database

DDL - Tables (cont.)
• to drop a column from a table:
  ALTER TABLE AIRLINE.FLT-SCHEDULE DROP PRICE RESTRICT; (or CASCADE)
• restrict: drop operation fails if the column is referenced
• cascade: drop operation removes referencing view/constraint definitions

Case
• A dataset consisting of 5 tables is available as .csv files or as an Excel file
• Let us get this data into APL
• Getdata
• Next, let us create the database using SQAPL
• First you must create a database in Access
• SQAPL

Interactive DML - Overview
• select-from-where
• select clause
• where clause
• from clause
• tuple variables
• string matching
• ordering of rows
• set operations
• built-in functions
• nested subqueries
• joins
• recursive queries
• insert, delete, update

Interactive DML - select-from-where
  SELECT A1, A2, ..., An
  FROM R1, R2, ...,
  Rm
  WHERE P
• the SELECT clause specifies the columns of the result
• the FROM clause specifies the tables to be scanned in the query
• the WHERE clause specifies the condition on the columns of the tables in the FROM clause

Interactive DML - select clause
• "Find the airlines in FLT-SCHEDULE"
  SELECT AIRLINE FROM FLT-SCHEDULE;
  SELECT ALL AIRLINE FROM FLT-SCHEDULE;
• "Find the airlines in FLT-SCHEDULE with duplicates removed"
  SELECT DISTINCT AIRLINE FROM FLT-SCHEDULE;

Interactive DML - select clause
• "Find all columns in FLT-SCHEDULE"
  SELECT * FROM FLT-SCHEDULE;
• "Find FLT# and price raised by 10%"
  SELECT FLT#, PRICE*1.1 FROM FLT-SCHEDULE;

Interactive DML - where clause
• "Find FLT# and price in FLT-SCHEDULE for flights out of Atlanta"
  SELECT FLT#, PRICE FROM FLT-SCHEDULE
  WHERE FROM-AIRPORTCODE = 'ATL';

Interactive DML - from clause
• "Find FLT#, WEEKDAY, and FROM-AIRPORTCODE in FLT-WEEKDAY and FLT-SCHEDULE"
  SELECT FLT-SCHEDULE.FLT#, WEEKDAY, FROM-AIRPORTCODE
  FROM FLT-WEEKDAY, FLT-SCHEDULE
  WHERE FLT-WEEKDAY.FLT# = FLT-SCHEDULE.FLT#;
• dot-notation disambiguates FLT# in FLT-WEEKDAY and FLT-SCHEDULE

Interactive DML - tuple variables
• alias definition:
  SELECT S.FLT#, WEEKDAY, T.FROM-AIRPORTCODE
  FROM FLT-WEEKDAY S, FLT-SCHEDULE T
  WHERE S.FLT# = T.FLT#;
• S and T are tuple variables

Interactive DML - tuple variables
• SQL's heritage as a tuple calculus language shows
• tuple variables are useful when one relation is used "twice" in a query:
  SELECT S.FLT#
  FROM FLT-SCHEDULE S, FLT-SCHEDULE T
  WHERE S.PRICE > T.PRICE AND T.FLT# = 'DL212';

Interactive DML - string matching
• wildcard searches use:
  %: matches any substring (including the empty string)
  _: matches any single character
  SELECT S.FLT#, WEEKDAY
  FROM FLT-WEEKDAY S, FLT-SCHEDULE T
  WHERE S.FLT# = T.FLT# AND T.AIRLINE LIKE '%an%';
• '%an%' matches American, Airtran, Scandinavian, Lufthansa, PanAm, ...
• 'A%' matches American, Airtran, ...
• '___%' matches any string with at least three characters

Interactive DML - ordering of rows
• the ORDER BY clause orders the rows in a query result in ascending (ASC) or descending (DESC) order
• "Find FLT#, airline, and price from FLT-SCHEDULE for flights out of Atlanta ordered by ascending airline and descending price:"
  SELECT FLT#, AIRLINE, PRICE FROM FLT-SCHEDULE
  WHERE FROM-AIRPORTCODE = 'ATL'
  ORDER BY AIRLINE ASC, PRICE DESC;

Interactive DML - set operations
[Figure: Venn diagram of S union T]
• "Find FLT# for flights on Tuesdays in FLT-WEEKDAY and FLT# with more than 100 available seats in FLT-INSTANCE"
  SELECT FLT# FROM FLT-WEEKDAY WHERE WEEKDAY = 'TU'
  UNION
  SELECT FLT# FROM FLT-INSTANCE WHERE #AVAIL-SEATS > 100;
• UNION removes duplicates; UNION ALL preserves them

Interactive DML - set operations
[Figure: Venn diagram of S intersect T]
• "Find FLT# for flights on Tuesdays in FLT-WEEKDAY with more than 100 available seats in FLT-INSTANCE"
  SELECT FLT# FROM FLT-WEEKDAY WHERE WEEKDAY = 'TU'
  INTERSECT
  SELECT FLT# FROM FLT-INSTANCE WHERE #AVAIL-SEATS > 100;
• INTERSECT ALL preserves duplicates

Interactive DML - set operations
[Figure: Venn diagram of S minus T (S\T)]
• "Find FLT# for flights on Tuesdays in FLT-WEEKDAY except FLT# with more than 100 available seats in FLT-INSTANCE"
  SELECT FLT# FROM FLT-WEEKDAY WHERE WEEKDAY = 'TU'
  EXCEPT
  SELECT FLT# FROM FLT-INSTANCE WHERE #AVAIL-SEATS > 100;
• EXCEPT ALL preserves duplicates

Interactive DML - built-in functions
• count (COUNT), sum (SUM), average (AVG), minimum (MIN), maximum (MAX)
• "Count flights scheduled for Tuesdays from FLT-WEEKDAY"
  SELECT COUNT(*) FROM FLT-WEEKDAY WHERE WEEKDAY = 'TU';
• "Find the average ticket price by airline from FLT-SCHEDULE"
  SELECT AIRLINE, AVG(PRICE) FROM FLT-SCHEDULE GROUP BY AIRLINE;

Interactive DML - built-in functions
• "Find the average ticket price by airline for scheduled flights out of Atlanta for airlines with more than 5 scheduled flights out of Atlanta from FLT-SCHEDULE"
  SELECT AIRLINE, AVG(PRICE) FROM FLT-SCHEDULE
  WHERE FROM-AIRPORTCODE = 'ATL'
  GROUP BY AIRLINE
  HAVING COUNT(FLT#)
  > 5;
• "Find the highest priced flight(s) out of Atlanta from FLT-SCHEDULE"
  SELECT FLT#, PRICE FROM FLT-SCHEDULE
  WHERE FROM-AIRPORTCODE = 'ATL'
  AND PRICE = (SELECT MAX(PRICE) FROM FLT-SCHEDULE WHERE FROM-AIRPORTCODE = 'ATL');
• FLT# cannot be listed next to MAX(PRICE) without a GROUP BY, hence the subquery

Interactive DML - nested subqueries
• Set membership: IN, NOT IN
• "Find airlines from FLT-SCHEDULE where FLT# is in the set of FLT#'s for flights on Tuesdays from FLT-WEEKDAY"
  SELECT DISTINCT AIRLINE FROM FLT-SCHEDULE
  WHERE FLT# IN
    (SELECT FLT# FROM FLT-WEEKDAY WHERE WEEKDAY = 'TU');

Interactive DML - nested subqueries
• "Find FLT#'s for flights on Tuesdays or Thursdays from FLT-WEEKDAY"
  SELECT DISTINCT FLT# FROM FLT-WEEKDAY
  WHERE WEEKDAY IN ('TU', 'TH');

Interactive DML - nested subqueries
• "Find FLT# for flights from Atlanta to Chicago with a price that is lower than all flights from Birmingham to Chicago"
  SELECT FLT# FROM FLT-SCHEDULE
  WHERE FROM-AIRPORTCODE = 'ATL'
  AND TO-AIRPORTCODE = 'CHI'
  AND PRICE < ALL
    (SELECT PRICE FROM FLT-SCHEDULE
     WHERE FROM-AIRPORTCODE = 'BIR'
     AND TO-AIRPORTCODE = 'CHI');

Interactive DML - joins
• cross join: Cartesian product
• [inner] join: only keeps rows that satisfy the join condition
• left outer join: keeps all rows from the left table; fills in nulls as needed
• right outer join: keeps all rows from the right table; fills in nulls as needed
• full outer join: keeps all rows from both tables; fills in nulls as needed
• NATURAL or an ON condition must be specified for all inner and outer joins
• natural: equi-join on columns with the same name; one copy of each such column is preserved

Interactive DML - joins
• "Find all two-leg, one-day trips out of Atlanta; show also a leg one even if there is no connecting leg two the same day"
  SELECT X.FLT# LEG-ONE, Y.FLT# LEG-TWO
  FROM ((FLT-SCHEDULE NATURAL JOIN FLT-INSTANCE) X
    LEFT OUTER JOIN
    (FLT-SCHEDULE NATURAL JOIN FLT-INSTANCE) Y
    ON (X.TO-AIRPORTCODE = Y.FROM-AIRPORTCODE
      AND X.DATE = Y.DATE AND X.ATIME < Y.DTIME))
  WHERE X.FROM-AIRPORTCODE = 'ATL';

Interactive DML - recursive queries
• not in SQL2 (SQL-92); recursive queries were added in SQL3 (SQL:1999) as WITH RECURSIVE
• "Find all reachable airports for multi-leg trips out of Atlanta"
  WITH RECURSIVE
    PAIRS(D, A) AS
      (SELECT FROM-AIRPORTCODE, TO-AIRPORTCODE FROM FLT-SCHEDULE),
    REACHES(D, A) AS
      (SELECT D, A FROM PAIRS
       UNION
       SELECT PAIRS.D, REACHES.A
       FROM PAIRS, REACHES
       WHERE PAIRS.A = REACHES.D)
  SELECT A FROM REACHES WHERE D = 'ATL';

Interactive DML - insert, delete, update
  INSERT INTO FLT-SCHEDULE
  VALUES ('DL212', 'DELTA', TIME '11:15:00', 'ATL', TIME '13:05:00', 'CHI', 650, 00351.00);

  INSERT INTO FLT-SCHEDULE(FLT#, AIRLINE)
  VALUES ('DL212', 'DELTA'); /* alternative form: omitted columns get their default, here NULL */
• "Insert into FLT-INSTANCE all flights scheduled for Thursday, 9/10/98"
  INSERT INTO FLT-INSTANCE(FLT#, DATE)
  (SELECT S.FLT#, DATE '1998-09-10'
   FROM FLT-SCHEDULE S, FLT-WEEKDAY D
   WHERE S.FLT# = D.FLT#
   AND D.WEEKDAY = 'TH');

Interactive DML - insert, delete, update
• "Cancel all flight instances for Delta on 9/10/98"
  DELETE FROM FLT-INSTANCE
  WHERE DATE = DATE '1998-09-10'
  AND FLT# IN
    (SELECT FLT# FROM FLT-SCHEDULE WHERE AIRLINE = 'DELTA');

Interactive DML - insert, delete, update
• "Update all reservations for customers on DL212 on 9/10/98 to reservations on AA121 on 9/10/98"
  UPDATE RESERVATION
  SET FLT# = 'AA121'
  WHERE DATE = DATE '1998-09-10'
  AND FLT# = 'DL212';

Embedded DML - Overview
• host languages
• precompilation
• impedance mismatch
• database access
• cursor types
• fetch orientation
• exception handling

Embedded DML - host languages
• SQL doesn't do iteration, recursion, report printing, or user interaction, and SQL doesn't do Windows
• SQL may be embedded in host languages, like COBOL, FORTRAN, MUMPS, PL/I, Pascal, Ada, C, C++, Java
• or used from languages like APL

Embedded DML - impedance mismatch
• SQL is a powerful, set-oriented, declarative language
• SQL queries return sets of rows
• host languages cannot handle large sets of structured data
• cursors resolve the mismatch by letting the host program fetch the result one row at a time
• Demo

Views - definition, use, update
• a view is a virtual table
• how a view is defined:
  CREATE VIEW ATL-FLT
  AS SELECT FLT#, AIRLINE, PRICE
  FROM FLT-SCHEDULE
  WHERE FROM-AIRPORTCODE =
  'ATL';
• how a query on a view is written:
  SELECT * FROM ATL-FLT
  WHERE PRICE <= 00200.00;

Views - definition, use, update
• how a query on a view is computed:
  SELECT FLT#, AIRLINE, PRICE
  FROM FLT-SCHEDULE
  WHERE FROM-AIRPORTCODE = 'ATL'
  AND PRICE <= 00200.00;
• how a view definition is dropped:
  DROP VIEW ATL-FLT [RESTRICT | CASCADE];

Views - definition, use, update
• views inherit the column names of the base tables they are defined from
• columns may be explicitly named in the view definition
• columns must be named explicitly if inheriting their names would cause ambiguity
• views may have computed columns, e.g. from applying built-in functions; these must be named in the view definition

Views - definition, use, update
these views are not updatable:
  CREATE VIEW ATL-PRICES
  AS SELECT AIRLINE, PRICE
  FROM FLT-SCHEDULE
  WHERE FROM-AIRPORTCODE = 'ATL';

  CREATE VIEW AVG-ATL-PRICES (AIRLINE, AVG-PRICE)
  AS SELECT AIRLINE, AVG(PRICE)
  FROM FLT-SCHEDULE
  WHERE FROM-AIRPORTCODE = 'ATL'
  GROUP BY AIRLINE;

this view is theoretically updatable, but cannot be updated in SQL:
  CREATE VIEW FLT-SCHED-AND-DAY
  AS SELECT S.*, D.WEEKDAY
  FROM FLT-SCHEDULE S, FLT-WEEKDAY D
  WHERE D.FLT# = S.FLT#;

Views - definition, use, update
a view is updatable if and only if:
• it does not contain any of the keywords JOIN, UNION, INTERSECT, EXCEPT
• it does not contain the keyword DISTINCT
• every column in the view corresponds to a uniquely identifiable base table column
• the FROM clause references exactly one table, which must be a base table or an updatable view
• the table referenced in the FROM clause cannot be referenced in the FROM clause of a nested WHERE clause
• it does not have a GROUP BY clause
• it does not have a HAVING clause
updatable means that INSERT, DELETE, and UPDATE are all allowed

Views - definition, use, update
  CREATE VIEW LOW-ATL-FARES /* updatable view */
  AS SELECT *
  FROM FLT-SCHEDULE
  WHERE FROM-AIRPORTCODE = 'ATL'
  AND PRICE < 00200.00;

  UPDATE LOW-ATL-FARES /* moves row outside the view */
  SET PRICE = 00250.00
  WHERE TO-AIRPORTCODE = 'BOS';

  INSERT INTO
  LOW-ATL-FARES /* creates row outside the view */
  VALUES ('DL222', 'DELTA', 'BIR', TIME '11:15:00', 'CHI', TIME '13:05:00', 00180.00);
• declaring the view WITH CHECK OPTION makes such an UPDATE or INSERT fail instead of moving or creating rows outside the view

Integrity - constraints
• constraint: a conditional expression required not to evaluate to false
• a constraint cannot be created if it is already violated
• a constraint is enforced from the point of creation forward
• a constraint has a unique name
• if a constraint is violated, its name is made available to the user
• constraints cannot reference parameters or host variables; they are application independent
• data type checking is a primitive form of constraint

Integrity - domain constraints
• associated with a domain; applies to all columns defined on the domain
  CREATE DOMAIN WEEKDAY CHAR(2)
  CONSTRAINT IC-WEEKDAY
  CHECK (VALUE IN ('MO', 'TU', 'WE', 'TH', 'FR', 'SA', 'SU'));

  CREATE DOMAIN PRICE DECIMAL(7,2)
  CONSTRAINT IC-PRICE
  CHECK (VALUE > 00000.00);

  CREATE DOMAIN FLT# CHAR(5)
  CONSTRAINT IC-FLT#
  CHECK (VALUE IS NOT NULL);

Integrity - base table, column constraints
• associated with a specific base table
  CREATE TABLE AIRLINE.FLT-SCHEDULE
  (FLT# FLIGHTNUMBER NOT NULL,
  AIRLINE VARCHAR(25),
  FROM-AIRPORTCODE AIRPORT-CODE,
  DTIME TIME,
  TO-AIRPORTCODE AIRPORT-CODE,
  ATIME TIME,
  CONSTRAINT FLTPK PRIMARY KEY (FLT#),
  CONSTRAINT FROM-AIRPORTCODE-FK
    FOREIGN KEY (FROM-AIRPORTCODE) REFERENCES AIRPORT(AIRPORTCODE)
    ON DELETE SET NULL ON UPDATE CASCADE,
  FOREIGN KEY (TO-AIRPORTCODE) REFERENCES AIRPORT(AIRPORTCODE)
    ON DELETE SET NULL ON UPDATE CASCADE,
  CONSTRAINT IC-DTIME-ATIME CHECK (DTIME < ATIME));

Integrity - general constraints
• applies to an arbitrary combination of columns and tables
• connecting RESERVATIONs for a customer must make sense:
  CREATE ASSERTION IC-CONNECTING-FLIGHTS
  CHECK (NOT EXISTS
    (SELECT *
     FROM FLT-SCHEDULE FS1, FLT-SCHEDULE FS2, RESERVATION R1, RESERVATION R2
     WHERE FS1.FLT# = R1.FLT# AND FS2.FLT# = R2.FLT#
     AND R1.CUST# = R2.CUST#
     AND R1.DATE = R2.DATE
     AND FS1.TO-AIRPORTCODE = FS2.FROM-AIRPORTCODE
     AND FS1.ATIME + INTERVAL '30' MINUTE > FS2.DTIME));

Integrity - (not so) general constraints
• not all constraints can be specified
  CREATE TABLE AIRLINE.FLT-WEEKDAY
  (FLT# FLIGHTNUMBER NOT NULL,
  WEEKDAY CHAR(2),
  ....);

  CREATE TABLE AIRLINE.FLT-INSTANCE
  (FLT# FLIGHTNUMBER NOT NULL,
  DATE DATE NOT NULL,
  ....);

  CREATE ASSERTION DATE-WEEKDAY-CHECK
  CHECK (NOT EXISTS
    (SELECT * FROM FLT-INSTANCE FI
     WHERE NOT EXISTS
       (SELECT * FROM FLT-WEEKDAY FSD
        WHERE FSD.FLT# = FI.FLT#
        AND FSD.WEEKDAY = weekday-of(FI.DATE))));
• weekday-of: DATE → WEEKDAY is not a standard SQL function, so this constraint cannot be written as shown

Transaction Control
• atomic, consistent, isolated, durable (ACID) transactions are supported by:
  – COMMIT and
  – ROLLBACK

  EXEC SQL DECLARE FLT CURSOR FOR
    SELECT FLT#, AIRLINE, PRICE FROM FLT-SCHEDULE;
  EXEC SQL OPEN FLT;
  EXEC SQL FETCH FLT INTO :FLT#, :AIRLINE, :PRICE;
  WHILE SQLCODE = 0 DO            /* 100 = no more rows, < 0 = error */
    DO YOUR THING WITH THE DATA;
    EXEC SQL FETCH FLT INTO :FLT#, :AIRLINE, :PRICE;
  END-WHILE;
  STATUS := SQLCODE;              /* save it: CLOSE resets SQLCODE */
  EXEC SQL CLOSE FLT;
  IF STATUS < 0 THEN EXEC SQL ROLLBACK
  ELSE EXEC SQL COMMIT;

Authorization
• Discretionary Access Control (DAC) is supported by GRANT and REVOKE:
  GRANT <privileges> ON <table> TO <users> [WITH GRANT OPTION];
  REVOKE [GRANT OPTION FOR] <privileges> ON <table> FROM <users> {RESTRICT | CASCADE};
• <privileges>: SELECT, INSERT(X), INSERT, UPDATE(X), UPDATE, DELETE
• CASCADE: revoke cascades through its subtree of granted privileges
• RESTRICT: revoke succeeds only if there is no subtree

Authorization
  GRANT INSERT, DELETE ON FLT-SCHEDULE TO U1, U2 WITH GRANT OPTION;
  GRANT UPDATE(PRICE) ON FLT-SCHEDULE TO U3;
  REVOKE GRANT OPTION FOR DELETE ON FLT-SCHEDULE FROM U2 CASCADE;
  REVOKE DELETE ON FLT-SCHEDULE FROM U2 CASCADE;

Catalog and Dictionary Facilities
• an INFORMATION_SCHEMA contains the following tables (or rather views) for the CURRENT_USER:
  – INFORMATION_SCHEMA_CATALOG_NAME: single-row, single-column table with the name of the catalog in which the INFORMATION_SCHEMA resides
  – SCHEMATA created by CURRENT_USER
  – DOMAINS accessible to CURRENT_USER
  – TABLES accessible to CURRENT_USER
  – VIEWS accessible to CURRENT_USER
  – COLUMNS of tables accessible to CURRENT_USER
  – TABLE_PRIVILEGES granted by or to CURRENT_USER
  – COLUMN_PRIVILEGES granted by or to CURRENT_USER
  – USAGE_PRIVILEGES granted by or to CURRENT_USER
  – DOMAIN_CONSTRAINTS
  – TABLE_CONSTRAINTS
  – REFERENTIAL_CONSTRAINTS
  – CHECK_CONSTRAINTS
  – and 18 others ...

Structure of a DBMS
• A typical DBMS has a layered architecture.
• The figure does not show the concurrency control and recovery components.
• Each system has its own variations.
• The book shows a somewhat more detailed version.
• You will see the "real deal" in PostgreSQL.
  – It's a pretty full-featured example.
• Next class: we will start on this stack, bottom up.
[Figure: DBMS layers from top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. Annotation: these layers must consider concurrency control and recovery]

TRANSACTION CONTROL

Transaction control
• Transaction Concept
• Concurrent Execution
• Conflict Serializability
• Locking
• SQL 92 Consistency Levels
• Oracle multi-version concurrency control
• Transaction Log
• Crash Recovery

Transaction concept
• A sequence of SQL statements that forms a logical unit of work.
• Changes the database from one logically consistent state to another.
• E.g. delete an order:
  – DB consistent
  – delete from order_header where ono = 123;
  – DB inconsistent: RI failure, as order_line rows now reference a missing order_header
  – delete from order_line where ono = 123;
  – DB consistent: no unmatched FKs

Transaction properties
• Atomicity – all SQL statements complete or none are completed.
• Consistency – programmer responsibility.
• Isolation – multiple transactions execute in isolation.
• Durability – successful commits survive system failures.

Transaction State Diagram
[Figure: states Active, Partially Committed, Committed, Failed, Rollback; transitions labelled User Action or System Action]

Transaction states
• Active – state whilst executing
• Partially Committed – SQL commit statement execution started.
• Failed – normal execution cannot proceed
• Committed – changes guaranteed to stay
• Rollback – changes guaranteed undone

SQL Transaction statements
• The first SQL statement starts a transaction, which is terminated by
  – commit;
    • makes all changes permanent
  – rollback;
    • removes all changes
  – a DDL command
  – a system-generated rollback
  – quitting the session

Commit at regular intervals
• This will keep your transactions short:
  – avoids wasting system resources
  – prevents loss of work if a system-generated rollback occurs
  – allows use of rollback to undo user mistakes

Benefits from concurrent execution of transactions
• Interleaving of CPU and I/O
  – Allows different transactions to execute in parallel.
  – Increases system utilisation and throughput.
• Allows long- and short-running transactions to make progress if they are not accessing the same data.
  – The alternative is to serialise, and then the short transactions have long waits.

Database Reads and Writes
• Each transaction can be regarded as a sequence of database reads and writes.
• Concurrent transactions only conflict when accessing the same data.
• Local variables are in a separate address space, so they cannot conflict.
• Computation on local variables only explains how the written value is formed.
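Treating each transaction as a sequence of reads and writes on a shared value, with private local variables, makes it possible to replay a schedule mechanically. The sketch below is a hypothetical illustration (run_schedule and the (user, action) step encoding are not from the slides). It uses the grant example from the Lost Update slides that follow: current starts at 2000, U1 adds 500 and U2 adds 10%.

```python
from typing import Callable, Dict, List, Tuple

# A step is (user, action) where action is "read" or "write".
Step = Tuple[str, str]


def run_schedule(schedule: List[Step], start: int,
                 compute: Dict[str, Callable[[int], int]]) -> int:
    """Replay a schedule of Read/Write steps on one shared data item.

    Each user keeps a private local value (separate address space), so only
    the read and write steps touch the shared database value.
    """
    db = start
    local: Dict[str, int] = {}
    for user, action in schedule:
        if action == "read":
            local[user] = db                    # Read(current)
        elif action == "write":
            db = compute[user](local[user])     # val = f(val); Write(current)
        else:
            raise ValueError(f"unknown action {action!r}")
    return db


# U1 adds 500; U2 raises by 10% (integer arithmetic avoids float rounding).
compute = {"U1": lambda v: v + 500, "U2": lambda v: v * 110 // 100}

interleaved = [("U1", "read"), ("U2", "read"), ("U1", "write"), ("U2", "write")]
serial_1 = [("U1", "read"), ("U1", "write"), ("U2", "read"), ("U2", "write")]
serial_2 = [("U2", "read"), ("U2", "write"), ("U1", "read"), ("U1", "write")]

print(run_schedule(interleaved, 2000, compute))  # 2200: U1's update is lost
print(run_schedule(serial_1, 2000, compute))     # 2750
print(run_schedule(serial_2, 2000, compute))     # 2700
```

Only the interleaved schedule ends at 2200; both serial schedules keep both updates, which is why a correct concurrent execution must be equivalent to some serial order.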
Lost Update (SQL version)
Time  U1                                  U2
t1    select val = current from grant;
      val = val + 500;
t2                                        select val = current from grant;
                                          val = val * 1.1;
t3    update grant set current = val;
t4                                        update grant set current = val;
Database table grant(current): @t1 2000, @t3 2500, @t4 2200

Lost Update (Read/Write)
Time  U1                    U2
t1    Read(current)
      val = val + 500;
t2                          Read(current)
                            val = val * 1.1;
t3    Write(current)
t4                          Write(current)
Database table grant(current): @t1 2000, @t3 2500, @t4 2200

Lost Update Serial Schedule 1
Time  U1                    U2
t1    Read(current)
      val = val + 500;
t2    Write(current)
t3                          Read(current)
                            val = val * 1.1;
t4                          Write(current)
Database table grant(current): @t1 2000, @t2 2500, @t4 2750

Lost Update Serial Schedule 2
[Figure: table with columns Time, U1, U2 and the database table grant(current); contents not recoverable]

Definitions
• A schedule is a sequence of reads/writes of a set of concurrent users that preserves the order of the reads/writes of each individual user. E.g. the Lost Update (Read/Write) slide.
• A serial schedule is a schedule where the reads/writes of each user are executed consecutively, without any interleaving with other users' reads/writes. E.g. the Lost Update Serial Schedule 1 slide.

Database schedules
[Figure: four two-step schedules, U1 then U2: (read A, read A), (read A, write A), (write A, read A), (write A, write A)]
Consider the effect of reversing the interleaved sequence U1 U2. Only the two read A steps may be interchanged without effect; the rest are conflicting actions, i.e. the new sequence changes the value read or the value left after the write. Read A and Write B do not conflict, as they act on different values.

Conflict Serialisability
• Schedule S is conflict serialisable if it is conflict equivalent to a serial schedule.
• Two schedules are conflict equivalent if one can be transformed into the other by a series of swaps of adjacent non-conflicting actions.
Is the Lost Update schedule conflict serialisable?
[Figure: the Lost Update schedule (U1 read A, U2 read A, U1 write A, U2 write A) beside a serial schedule (U1 read A, U1 write A, U2 read A, U2 write A)]

A Conflict Serialisable schedule?
[Figure: an interleaved schedule of U1 and U2, each doing read A, write A, read B, write B, compared with a serial schedule of the same actions]

Conflict Serialisable schedules via locking
• Each transaction must obtain a lock before accessing a data value.
  – A Shared lock, once granted, allows read access.
  – An eXclusive lock, once granted, allows write access.
• Locks are requested from and granted by the Lock Manager process.
• This ensures that any interleaving of concurrent transactions produces a conflict serialisable schedule.

Lock Compatibility Matrix
The lock manager grants lock requests by reference to:

                       Lock already granted on data A
Lock requested on A    Shared    eXclusive
Shared                 Yes       No
eXclusive              No        No

If the table entry is Yes, the lock types are compatible and the requested lock is granted.
If the table entry is No, the requesting transaction is placed into a wait state until the lock-holding transaction(s) have completed and released the lock.

Rigorous Two Phase Locking
• Growing phase = Active state.
  – Locks may be requested but not dropped.
• Shrinking phase = Partially committed or failed.
  – Locks are dropped but not requested.
• Thus all locks are held until the end of the transaction, signalled by rollback or commit.

Lost Update Locking solution
T1                  Lock Manager          T2
request S(A)        granted
read A
                    granted               request S(A)
                                          read A
request X(A)        waiting for lock
                    waiting for lock      request X(A)

Lost Update Locking solution (continued from previous slide)
T1                  Lock Manager                   T2
request X(A)                                       request X(A)
                    deadlock: system aborts T2     drops S(A) lock
write A             granted                        restarted by programmer
commit              locks dropped
Transactions have been serialised.

Summary of Lock Requests
[Figure: T1 and T2 each hold S(A), then each requests W(A); each waits for the lock held by the other (arrow = waits for grant of lock held by the transaction pointed to)]
Why use W locks on reads for values about to be updated?
[Figure: T1 and T2 both request W(A) for the read; T2 waits for the W lock while T1 continues, which avoids deadlocking (arrow = waits for grant of lock held by the transaction pointed to)]

Effect of locking
• The lock manager prevents writers and readers from accessing the same data item concurrently.
• Thus conflicting actions from concurrent transactions cannot overlap: one transaction must wait for the other.
• Thus all allowed schedules are conflict serialisable.

Lock starvation
• Writers may not make progress when many readers are accessing the same data item.
• Prevent this by not granting further Shared lock requests when an eXclusive request is waiting.
• Then process waiting requests in FIFO order.

Readers Wait for Writers
[Figure: timeline of T1 to T4: T1 and T2 read A; T3 requests write A and starts waiting; T4 requests read A and also starts waiting, behind T3; T1 and T2 commit, T3's wait ends and it commits; then T4's wait ends]

Deadlock Detection
• The lock manager builds a wait-for graph for all the transactions in process.
• A deadlock exists if the graph contains a cycle. The detection algorithm is run frequently.
• Lost Update wait-for graph showing a cycle: T1 → T2 → T1

Effect of locking on database performance
• Readers and writers concurrently accessing the same data item under two phase locking will be forced to serialise, i.e. wait until exclusive access is possible.
• Users will experience delayed responses.
• To improve performance, reduced levels of locking (and hence of consistency of final results) are available. Use with caution.

SQL 92 Consistency Levels (from highest to lowest level)
• Serializable, i.e. two phase locking
• Repeatable Read
  – phantom reads possible
• Read Committed
  – read locks dropped after data has been read
  – non-repeatable and phantom reads possible
• Read Uncommitted (lowest level)
  – reads data that still has write locks in place
  – dirty read possible

Transaction interactions between T1 and T2
• dirty read
  – T1 can read uncommitted data from T2
• non-repeatable read
  – T1 re-reads data committed by T2 and now sees the new data value
• phantom read
  – T1 re-executes a query and discovers new data inserted or changed by committed T2

Non Repeatable Read
emp(id, dept_id)       dept(dept_id, name): (d1, Sales)
T1: insert into emp select 'e1', dept_id from dept where name = 'Sales';
T2: update dept set dept_id = 'd2' where dept_id = 'd1'; commit;
T1: insert into emp select 'e2', dept_id from dept where name = 'Sales'; commit;

Non Repeatable Read - locking at Read Committed level
T1: acquire locks X(emp) S(dept); insert ...; drop lock S(dept)
T2: acquire lock X(dept); update ...; commit; drop lock X(dept)
T1: acquire lock S(dept); insert ...; commit; drop locks X(emp) S(dept)
Is this problem possible under Rigorous Two Phase Locking?
[Figure: T1 and T2 wait-for diagram (arrow = waits for grant of lock held by the transaction pointed to)]

Inconsistent Analysis / Phantom Read
stock(pno, qty) before T1: (1, 2) (2, 4) (3, 1) (4, 5) (5, 1)
T2: select sum(qty) from stock;  -- scan has read pno 1 (2) and pno 2 (4)
T1: update stock set qty = 2 where pno = 2;
T1: update stock set qty = 3 where pno = 5;
T1: commit;
T2: scan continues with pno 3 (1), pno 4 (5), pno 5 (3)
Actual total: 13 (before and after T1). Reported total: 15.

Inconsistent Analysis / Phantom Read - locking analysis
• T2: for each row, acquire lock S(stock), read the value, drop lock S(stock).
• T1: this allows lock X(stock) to be acquired on a previously read row; T1 completes its updates and drops its X(stock) lock.
• T2: acquires S(stock) on the remaining rows and so reads both old and new values, causing the problem.
Is this problem possible under Rigorous Two Phase Locking?
[Figure: T1 and T2 wait-for diagram (arrow = waits for grant of lock held by the transaction pointed to)]

Uncommitted / Dirty Read
account(acc_id, balance): (1, -2000)
T1: update account set balance = 10000 where acc_id = 1;
T2: select balance from account where acc_id = 1;  -- sees 10000, passes credit check
T1: rollback;

Uncommitted / Dirty Read - locking analysis
• Locking at the Read Uncommitted level effectively means the lock compatibility matrix is changed.
• The entry for a Shared request against a granted eXclusive lock changes from No to Yes.
• Note: concurrent writes are still not allowed.
Is this problem possible under Rigorous Two Phase Locking?
[Figure: T1 and T2 wait-for diagram (arrow = waits for grant of lock held by the transaction pointed to)]

Lock granularity
• Table locking provides the least concurrency.
• Disk page locking
  – Most common level. The number of rows locked depends on the row size/page size ratio.
• Row locking provides maximum concurrency.
  – Commonly used for online transactions.
  – Highest system cost, as it requires the most locks.

Transaction Locking and Crash Recovery

Transaction Log - Immediate Update of row data on Disk
Transaction                                  Write Ahead Log on Disk
                                             T0 start
update stock set qty = 100 where p# = 1;     T0, 2.0, qty, 200, 100
delete from stock where p# = 2;              T0, 2.1, p#, 2, , qty, 100,
insert into stock values (3, 600);           T0, 2.2, p#, , 3, qty, , 600
commit;                                      T0 commit
Disk page 2 holding stock values:
offset 0: p# 1, qty 200 → 100
offset 1: p# 2, qty 100 (deleted)
offset 2: p# 3, qty 600 (inserted)

Transaction Log sequence
• The Write Ahead protocol means the BI (before image), AI (after image) and other log entries are written before the data row on disk is changed.
• If the transaction ends with a commit, no further action is required.
• If it ends with a rollback, the system must apply the BIs to the row data to restore the original values on disk.

Transaction Log entries
T0: read a; a = a - 50; write a; read b; b = b + 50; write b
T1: read c; c = c - 100; write c
Database values: a = 1000, b = 2000, c = 700
Write out the log, assuming T0 and T1 execute in the order T0 then T1, and show the database values, assuming immediate update of database values.
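The exercise can be worked mechanically. The sketch below is a hypothetical illustration, not the slides' own notation: run_with_wal writes a log record holding the before image (BI) and after image (AI) before each immediate update, and recover applies the undo/redo procedure described in the Log Based Recovery slides (undo uncommitted transactions backwards using BIs, then redo committed transactions forwards using AIs). The record layout and function names are assumptions, and a truncated list stands in for "the log as it was on disk at the crash". Engines such as those based on ARIES order and optimise these passes differently.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical log record: (txn, kind, item, before_image, after_image).
# kind is "start", "write" or "commit"; item/BI/AI are None unless kind is "write".
LogRecord = Tuple[str, str, Optional[str], Optional[int], Optional[int]]


def run_with_wal(db: Dict[str, int],
                 txns: List[Tuple[str, List[Tuple[str, int]]]],
                 log: List[LogRecord]) -> None:
    """Run transactions serially with immediate update.

    Each transaction is a list of (item, delta) updates. Write-ahead rule:
    the log record (with BI and AI) is appended before the value changes.
    """
    for txn, updates in txns:
        log.append((txn, "start", None, None, None))
        for item, delta in updates:
            before = db[item]
            after = before + delta
            log.append((txn, "write", item, before, after))  # log first ...
            db[item] = after                                 # ... then update
        log.append((txn, "commit", None, None, None))


def recover(db: Dict[str, int], log: List[LogRecord]) -> None:
    """Undo/redo recovery from the log records that reached disk."""
    committed = {rec[0] for rec in log if rec[1] == "commit"}
    # Undo backwards, using before images, for transactions with no commit.
    for txn, kind, item, before, _after in reversed(log):
        if kind == "write" and txn not in committed:
            db[item] = before
    # Redo forwards, using after images, for committed transactions.
    for txn, kind, item, _before, after in log:
        if kind == "write" and txn in committed:
            db[item] = after


db = {"a": 1000, "b": 2000, "c": 700}
log: List[LogRecord] = []
txns = [("T0", [("a", -50), ("b", +50)]), ("T1", [("c", -100)])]
run_with_wal(db, txns, log)
print(db)  # {'a': 950, 'b': 2050, 'c': 600}

# (i) Crash after "write b": only T0 start, T0 a, T0 b reached the log.
after_b = {"a": 950, "b": 2050, "c": 700}
recover(after_b, log[:3])
print(after_b)  # {'a': 1000, 'b': 2000, 'c': 700}

# (ii) Crash after "write c": T0 committed, T1 did not.
after_c = {"a": 950, "b": 2050, "c": 600}
recover(after_c, log[:6])
print(after_c)  # {'a': 950, 'b': 2050, 'c': 700}
```

The two crash cases reproduce the answers worked on the recovery slides: undo everything of T0 in case (i); redo T0 and undo T1 in case (ii).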
Transaction log entries
Database Values before any entry: a = 1000, b = 2000, c = 700
Log entry → Database Values (a, b, c) after it
• T0 start → 1000, 2000, 700
• T0 a 1000 950 → 950, 2000, 700
• T0 b 2000 2050 → 950, 2050, 700
• T0 commit
• T1 start
• T1 c 700 600 → 950, 2050, 600
• T1 commit
Immediate Update of row data in memory
T1: update stock set qty = qty * 2;
Memory
• Log Buffer Page: T1, 9.1, qty, 10, 20 and T1, 9.2, qty, 40, 80
• Row Data Buffer Page (p#, qty): (1, 20), was 10; (2, 80), was 40
Disk
• Log File Page: T1, 9.1, qty, 10, 20 and T1, 9.2, qty, 40, 80; flushed on commit or when the page is full, and always before the row data buffer is flushed
• Stock Database Page (p#, qty): (1, 10), (2, 40), still holding the old values
Log Based Recovery
• A system crash loses the memory buffer pages.
• Committed changes may not be on disk.
– Their row data buffer pages have not yet been flushed out.
• Uncommitted changes may be on disk.
– Their row data buffer pages have already been flushed out.
• WAL ensures that the log pages are already on disk (on a different disk from the table data!) before the row data buffer pages are flushed to disk.
Recovery Processing
• Read the log forwards from the start, putting every transaction found onto one of two lists:
– Undo: transactions with no commit terminator
– Redo: transactions with a commit terminator
• Go backwards from the crash point, ‘undoing’
– using Before Images
• Go forwards from the start, ‘redoing’
– using After Images
Database recovery
Using the data from the Transaction Log slide, show the recovered database values.
Assume the failure occurred (i) after write b; (ii) after write c.
Database recovery after write b
Log: T0 start; T0 a 1000 950; T0 b 2000 2050
Database Values at the crash: a = 950, b = 2050, c = 700
Recovery starts by reading the log from the start:
• Undo list: T0
• Redo list: no entries, as no transaction has ended
Undo T0 backwards by applying BIs:
• Undo T0 b 2000 2050 → b = 2000
• Undo T0 a 1000 950 → a = 1000
Database Values as at the start: a = 1000, b = 2000, c = 700
Database recovery after write c
Log: T0 start; T0 a 1000 950; T0 b 2000 2050; T0 commit; T1 start; T1 c 700 600
Database Values at the crash: a = 950, b = 2050, c = 600
Recovery starts by reading the log from the start:
• Undo list: T1
• Redo list: T0
Recovered Database Values: a = 950 (redo), b = 2050 (redo), c = 700 (undo)
Why undo backwards and redo forwards through the log?
Log (item, BI, AI): T0 a 10 20, then T0 a 20 10; Database Value of a: 10, then 20, then 10
• If undoing backwards using BI, a is set to 20 and then 10, its value before T0 started.
• If redoing forwards using AI, a is set to 20 and then 10, the value T0 left behind.
• Processing in the opposite direction would leave a = 20 in both cases.
Checkpoint
• Flushes all log buffer pages to disk.
• Flushes all modified data buffer pages to disk.
• Writes a log record with the list of open transactions.
• Recovery takes place from the latest checkpoint record in the log.
– This reduces recovery time.
Disk Crash Recovery
• Assumes database dumps are taken periodically.
– Each dump flushes all buffers, copies the database files and writes out a dump record.
• Load the latest database dump onto disk.
• Using the log started immediately after the loaded dump, roll forward, ‘redoing’ all committed transactions.
The End
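The Recovery Processing steps can be written as one short function and checked against both failure cases. This is a sketch that makes several assumptions. It ignores checkpoints and reads the whole log. Log entries are tuples: (txn, "start"), (txn, "commit") or (txn, item, BI, AI). A dict holds whatever values reached disk before the crash. The name `recover` is hypothetical.

```python
def recover(log, db):
    """Restart recovery without checkpoints; mutates db, returns (undo, redo)."""
    started, committed = [], set()
    # 1. Read the log forwards from the start to build the undo and redo lists.
    for entry in log:
        if len(entry) == 2 and entry[1] == "start":
            started.append(entry[0])
        elif len(entry) == 2 and entry[1] == "commit":
            committed.add(entry[0])
    undo = [t for t in started if t not in committed]
    redo = [t for t in started if t in committed]
    # 2. Go backwards from the crash point, undoing with before images (BI).
    for entry in reversed(log):
        if len(entry) == 4 and entry[0] in undo:
            _, item, before, _ = entry
            db[item] = before
    # 3. Go forwards from the start, redoing with after images (AI).
    for entry in log:
        if len(entry) == 4 and entry[0] in redo:
            _, item, _, after = entry
            db[item] = after
    return undo, redo

# (i) Failure after write b: T0 has no commit terminator.
log_b = [("T0", "start"), ("T0", "a", 1000, 950), ("T0", "b", 2000, 2050)]
db_b = {"a": 950, "b": 2050, "c": 700}   # values on disk at the crash
undo_b, redo_b = recover(log_b, db_b)

# (ii) Failure after write c: T0 committed, T1 did not.
log_c = log_b + [("T0", "commit"), ("T1", "start"), ("T1", "c", 700, 600)]
db_c = {"a": 950, "b": 2050, "c": 600}
undo_c, redo_c = recover(log_c, db_c)

print(undo_b, redo_b, db_b)
print(undo_c, redo_c, db_c)
```

Running the undo pass before the redo pass mirrors the slide's order. Each pass walks the log in the direction the "why undo backwards and redo forwards" example requires. With checkpoints, the scan would start at the latest checkpoint record instead of the beginning of the log.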