* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download 03/22/13
Survey
Document related concepts
Microsoft SQL Server wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Functional Database Model wikipedia , lookup
Ingres (database) wikipedia , lookup
Commitment ordering wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Clusterpoint wikipedia , lookup
Relational algebra wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Database model wikipedia , lookup
Serializability wikipedia , lookup
Transcript
+ Constraints and Transactions and ACID CSCI 2141 – W2013 Slides from Ullman (A First Course in Database Systems: http://infolab.stanford.edu/~ullman/fcdb/aut07/slides/cons.ppt http://infolab.stanford.edu/~ullman/fcdb/aut07/slides/trans-view-index.ppt www.cs.sunysb.edu/~cse515/Fall07/slides/ch21.ppt 1 + 2 Housekeeping Missed 2 classes this week so are on a sprint to the finish Today will focus on Transactions and ACID, Included in today’s slide deck are details on constraints, triggers, and transaction implementation I will cover the high level and leave it to you to learn the implementation (db dependent) Assignment 5 is out and due on March 27th (Wed) Practice more complex SQL and implementation of checks/constraints Quiz on March 27th – same as assignment + ACID + concept of transactions Monday 25th: noSQL April 1-3: Query optimization April 5: DB security April 8: Quiz (query optimization) + general review + 3 Constraints and Triggers A constraint is a relationship among data elements that the DBMS is required to enforce. Example: key constraints. Triggers are only executed when a specified condition occurs, e.g., insertion of a tuple. Easier to implement than complex constraints. + 4 Kinds of Constraints Keys. Foreign-key, or referential-integrity. Value-based constraints. Tuple-based constraints. Constrain values of a particular attribute. Relationship among components. Assertions: any SQL boolean expression. + Single-Attribute Keys Place PRIMARY KEY or UNIQUE after the type in the declaration of the attribute. Example: CREATE TABLE Product ( ); name CHAR(20) UNIQUE, manf CHAR(20) 5 + 6 Multi-attribute Key The canteen and product together are the key for Sells: CREATE TABLE Sells ( canteen CHAR(20), product VARCHAR(20), priceREAL, PRIMARY KEY (canteen, product) ); + 7 Foreign Keys Values appearing in attributes of one relation must appear together in certain attributes of another relation. Example: in Sells(canteen, product, price), we might expect that a product value also appears in Product.name . + 8 Expressing Foreign Keys Use keyword REFERENCES, either: 1. After an attribute (for one-attribute keys). 2. As an element of the schema: FOREIGN KEY (<list of attributes>) REFERENCES <relation> (<attributes>) Referenced attributes must be declared PRIMARY KEY or UNIQUE. + 9 Example: With Attribute CREATE TABLE products ( name CHAR(20) PRIMARY KEY, manf CHAR(20) ); CREATE TABLE Sells ( canteen CHAR(20), product CHAR(20) REFERENCES products(name), price REAL ); + Example: As Schema Element CREATE TABLE products ( name CHAR(20) PRIMARY KEY, manf CHAR(20) ); CREATE TABLE Sells ( canteen CHAR(20), product CHAR(20), price REAL, FOREIGN KEY(product) REFERENCES products(name)); 10 + Enforcing Foreign-Key Constraints If there is a foreign-key constraint from relation R to relation S, two violations are possible: 1. An insert or update to R introduces values not found in S. 2. A deletion or update to S causes some tuples of R to “dangle.” 11 + 12 How should we handle this? Example: suppose R = Sells, S = products. An insert or update to Sells that introduces a nonexistent product must be rejected. A deletion or update to products that removes a product value found in some tuples of Sells can be handled in three ways (next slide). + 13 3 possible actions 1. Default : Reject the modification. 2. Cascade : Make the same changes in Sells. 3. Deleted product: delete Sells tuple. Updated product: change value in Sells. Set NULL : Change the product to NULL. + 14 Example: Cascade Delete the Coke tuple from products: Then delete all tuples from Sells that have product = ’Coke’. Update the Coke tuple by changing ’Coke’ to ’CokeClassic’: Then change all Sells tuples with product = ’Coke’ to product = ’CokeClassic’. + 15 Example: Set NULL Delete the Coke tuple from products: Change all tuples of Sells that have product = ’Coke’ to have product = NULL. Update the Coke tuple by changing ’Coke’ to ’CokeClassic’: Same change as for deletion. + 16 Choosing a Policy When we declare a foreign key, we may choose policies SET NULL or CASCADE independently for deletions and updates. Follow the foreign-key declaration by: ON [UPDATE, DELETE][SET NULL CASCADE] Two such clauses may be used. Otherwise, the default (reject) is used. + Example: Setting Policy CREATE TABLE Sells ( canteen CHAR(20), product CHAR(20), price REAL, FOREIGN KEY(product) REFERENCES products(name) ON DELETE SET NULL ON UPDATE CASCADE ); 17 + 18 Attribute-Based Checks Constraints on the value of a particular attribute. Add CHECK(<condition>) to the declaration for the attribute. The condition may use the name of the attribute, but any other relation or attribute name must be in a subquery. + Example: Attribute-Based Check CREATE TABLE Sells ( canteen CHAR(20), product CHAR(20) CHECK ( product IN (SELECT name FROM products)), price ); REAL CHECK ( price <= 5.00 ) 19 + 20 Timing of Checks Attribute-based checks are performed only when a value for that attribute is inserted or updated. Example: CHECK (price <= 5.00) checks every new price and rejects the modification (for that tuple) if the price is more than $5. Example: CHECK (product IN (SELECT name FROM products)) not checked if a product is deleted from products (unlike foreign-keys). + 21 Tuple-Based Checks CHECK (<condition>) may be added as a relation-schema element. The condition may refer to any attribute of the relation. But other attributes or relations require a subquery. Checked on insert or update only. + 22 Example: Tuple-Based Check Only Joe’s canteen can sell product for more than $5: CREATE TABLE Sells ( canteen CHAR(20), product CHAR(20), priceREAL, CHECK (canteen = ‘Joes canteen’ OR price <= 5.00) ); + 23 Assertions These are database-schema elements, like relations or views. Defined by: CREATE ASSERTION <name> CHECK (<condition>); Condition may refer to any relation or attribute in the database schema. + 24 Example: Assertion In Sells(canteen, product, price), no canteen may charge an average of more than $5. CREATE ASSERTION NoRipoffcanteens CHECK ( NOT EXISTS ( SELECT canteen FROM Sells GROUP BY canteen HAVING 5.00 < AVG(price) )); canteens with a average price above $5 + 25 Example: Assertion In Customers(name, addr, phone) and canteens(name, addr, license), there cannot be more canteens than Customers. CREATE ASSERTION FewCanteen CHECK ( (SELECT COUNT(*) FROM canteens) <= (SELECT COUNT(*) FROM Customers) ); + 26 Timing of Assertion Checks In principle, we must check every assertion after every modification to any relation of the database. A clever system can observe that only certain changes could cause a given assertion to be violated. Example: No change to products can affect FewCanteen. Neither can an insertion to Customers. + 27 Triggers: Motivation Assertions are powerful, but the DBMS often can’t tell when they need to be checked. Attribute- and tuple-based checks are checked at known times, but are not powerful. Triggers let the user decide when to check for any condition. + 28 Event-Condition-Action Rules Another name for “trigger” is ECA rule, or event-conditionaction rule. Event : typically a type of database modification, e.g., “insert on Sells.” Condition : Any SQL boolean-valued expression. Action : Any SQL statements. + Preliminary Example: A Trigger Instead of using a foreign-key constraint and rejecting insertions into Sells(canteen, product, price) with unknown products, a trigger can add that product to products, with a NULL manufacturer. 29 + 30 Example: Trigger Definition CREATE TRIGGER productTrig The event AFTER INSERT ON Sells REFERENCING NEW ROW AS NewTuple FOR EACH ROW WHEN (NewTuple.product NOT IN The condition (SELECT name FROM products)) INSERT INTO products(name) VALUES(NewTuple.product); The action + 31 Options: CREATE TRIGGER CREATE TRIGGER <name> Or: CREATE OR REPLACE TRIGGER <name> Useful if there is a trigger with that name and you want to modify the trigger. + 32 Options: The Event AFTER can be BEFORE. Also, INSTEAD OF, if the relation is a view. A clever way to execute view modifications: have triggers translate them to appropriate modifications on the base tables. INSERT can be DELETE or UPDATE. And UPDATE can be UPDATE . . . ON a particular attribute. + 33 Options: FOR EACH ROW Triggers are either “row-level” or “statement-level.” FOR EACH ROW indicates row-level; its absence indicates statement-level. Row level triggers : execute once for each modified tuple. Statement-level triggers : execute once for a SQL statement, regardless of how many tuples are modified. + 34 Options: REFERENCING INSERT statements imply a new tuple (for row-level) or new table (for statement-level). The “table” is the set of inserted tuples. DELETE implies an old tuple or table. UPDATE implies both. Refer to these by [NEW OLD][TUPLE TABLE] AS <name> + 35 Options: The Condition Any boolean-valued condition. Evaluated on the database as it would exist before or after the triggering event, depending on whether BEFORE or AFTER is used. But always before the changes take effect. Access the new/old tuple/table through the names in the REFERENCING clause. + 36 Options: The Action There can be more than one SQL statement in the action. Surround by BEGIN . . . END if there is more than one. But queries make no sense in an action, so we are really limited to modifications. + 37 Another Example Using Sells(canteen, product, price) and a unary relation Ripoffcanteens(canteen), maintain a list of canteens that raise the price of any product by more than $1. + 38 The Trigger CREATE TRIGGER PriceTrig AFTER UPDATE OF price ON Sells The event – only changes to prices REFERENCING Updates let us OLD ROW AS ooo talk about old and new tuples NEW ROW AS nnn We need to consider FOR EACH ROW each price change WHEN(nnn.price > ooo.price + 1.00) INSERT INTO Ripoffcanteens VALUES(nnn.canteen); Condition: a raise in price > $1 When the price change is great enough, add the canteen to Ripoffcanteen + Transactions, Views, Indexes Controlling Concurrent Behavior Virtual and Materialized Views Speeding Accesses to Data 39 + 40 Why Transactions? Database systems are normally being accessed by many users or processes at the same time. Both queries and modifications. Unlike operating systems, which support interaction of processes, a DMBS needs to keep processes from troublesome interactions. + 41 Example: Bad Interaction Two people with a joint back account each take $100 from different ATMs at about the same time. The DBMS better make sure one account deduction doesn’t get lost. Compare: An OS allows two people to edit a document at the same time. If both write, one’s changes get lost. + 42 Transactions Transaction = process involving database queries and/or modification. Normally with some strong properties regarding concurrency. Formed in SQL from single statements or explicit programmer control. + Transactions 43 The execution of each transaction must maintain the relationship between the database state and the enterprise state Therefore additional requirements are placed on the execution of transactions beyond those placed on ordinary programs: Atomicity Consistency Isolation Durability ACID properties + 44 Database Consistency Enterprise (Business) Rules limit the occurrence of certain real-world events Student cannot register for a course if the current number of registrants equals the maximum allowed Correspondingly, allowable database states are restricted cur_reg <= max_reg These limitations are called (static) integrity constraints: assertions that must be satisfied by all database states (state invariants). + 45 Database Consistency (state invariants) Other static consistency requirements are related to the fact that the database might store the same information in different ways cur_reg = |list_of_registered_students| Such limitations are also expressed as integrity constraints Database is consistent if all static integrity constraints are satisfied + Transaction Consistency 46 A consistent database state does not necessarily model the actual state of the enterprise A deposit transaction that increments the balance by the wrong amount maintains the integrity constraint balance 0, but does not maintain the relation between the enterprise and database states A consistent transaction maintains database consistency and the correspondence between the database state and the enterprise state (implements its specification) Specification of deposit transaction includes balance = balance + amt_deposit , (balance is the next value of balance) + 47 Dynamic Integrity Constraints (transition invariants) Some constraints restrict allowable state transitions A transaction might transform the database from one consistent state to another, but the transition might not be permissible Example: A letter grade in a course (A, B, C, D, F) cannot be changed to an incomplete (I) Dynamic constraints cannot be checked by examining the database state + 48 Transaction Consistency Consistent transaction: if DB is in consistent state initially, when the transaction completes: All static integrity constraints are satisfied (but constraints might be violated in intermediate states) New No state satisfies specifications of transaction dynamic constraints have been violated + Checking Integrity Constraints Automatic: Embed constraint in schema. CHECK, ASSERTION for static constraints TRIGGER for dynamic constraints Increases confidence in correctness and decreases maintenance costs Not always desirable since unnecessary checking (overhead) might result Deposit transaction modifies balance but cannot violate constraint balance 0 Manual: Perform check in application code. Only necessary checks are performed Scatters references to constraint throughout application Difficult to maintain as transactions are modified/added 49 + 50 Atomicity A real-world event either happens or does not happen Student either registers or does not register Similarly, the system must ensure that either the corresponding transaction runs to completion or, if not, it has no effect at all Not true of ordinary programs. A crash could leave files partially updated on recovery + 51 Commit and Abort If the transaction successfully completes it is said to commit The system is responsible for ensuring that all changes to the database have been saved If the transaction does not successfully complete, it is said to abort The system is responsible for undoing, or rolling back, all changes the transaction has made + 52 Reasons for Abort System crash Transaction Execution aborted by system cannot be made atomic (a site is down) Execution did not maintain database consistency (integrity constraint is violated) Execution was not isolated Resources not available (deadlock) Transaction requests to roll back + 53 API for Transactions DBMS and Transaction Processing (TP) monitor provide commands for setting transaction boundaries. Example: begin transaction commit rollback The commit command is a request The system might commit the transaction, or it might abort it for one of the reasons on the previous slide The rollback command is always satisfied + 54 Durability The system must ensure that once a transaction commits, its effect on the database state is not lost in spite of subsequent failures Not true of ordinary programs. A media failure after a program successfully terminates could cause the file system to be restored to a state that preceded the program’s execution + Implementing Durability Database stored redundantly on mass storage devices to protect against media failure Architecture of mass storage devices affects type of media failures that can be tolerated Related to Availability: extent to which a (possibly distributed) system can provide service despite failure Non-stop DBMS (mirrored disks) Recovery based DBMS (log) 55 + Isolation Serial Execution: transactions execute in sequence Each one starts after the previous one completes. Execution of one transaction is not affected by the operations of another since they do not overlap in time The execution of each transaction is isolated from all others. If the initial database state and all transactions are consistent, then the final database state will be consistent and will accurately reflect the real-world state, but Serial execution is inadequate from a performance perspective 56 + 57 Isolation Concurrent execution offers performance benefits: A computer system has multiple resources capable of executing independently (e.g., cpu’s, I/O devices), but A transaction typically uses only one resource at a time Hence, only concurrently executing transactions can make effective use of the system Concurrently executing transactions yield interleaved schedules + begin trans .. op1,1 .. op1,2 .. commit Concurrent Execution T1 op1,1 op1.2 sequence of db operations output by T1 local computation op1,1 op2,1 op2.2 op1.2 T2 op2,1 op2.2 DBMS interleaved sequence of db operations input to DBMS local variables 58 + Isolation 59 Interleaved execution of a set of consistent transactions offers performance benefits, but might not be correct Example: course registration; cur_reg is number local computation of current registrants not seen by DBMS T1: r(cur_reg : 29)…………w(cur_reg : 30) commit T2: r(cur_reg : 29)……………..…w(cur_reg : 30) commit time Result: Database state no longer corresponds to real-world state, integrity constraint violated cur_reg <> |list_of_registered_students| + Interaction of Atomicity and Isolation T1: r(bal:10) w(bal:1000010) abort T2: r(bal:1000010) w(yes!!!) commit time T1 deposits $1000000 T2 grants credit and commits before T1 completes T1 aborts and rolls balance back to $10 T1 has had an effect even though it aborted! 60 + 61 Isolation An interleaved schedule of transactions is isolated if its effect is the same as if the transactions had executed serially in some order (serializable) T1: r(x) w(x) T2: r(y) w(y) It follows that serializable schedules are always correct (for any application) Serializable is better than serial from a performance point of view DBMS uses locking to ensure that concurrent schedules are serializable + 62 Isolation in the Real World SQL supports SERIALIZABLE isolation level, which guarantees serializability and hence correctness for all applications Performance of applications running at SERIALIZABLE is often not adequate SQL also supports weaker levels of isolation with better performance characteristics But beware! -- a particular application might not run correctly at a weaker level + 63 Summary Application programmer is responsible for creating consistent transactions and choosing appropriate isolation level The system is responsible for creating the abstractions of atomicity, durability, and isolation Greatly simplifies programmer’s task since she does not have to be concerned with failures or concurrency + Up next: Slides from FCDB More examples and details of transactions Also, views and indexes 64 + ACID Transactions ACID transactions are: Atomic : Whole transaction or none is done. Consistent : Database constraints preserved. Isolated : It appears to the user as if only one process executes at a time. Durable : Effects of a process survive a crash. Optional: weaker forms of transactions are often supported as well. 65 + 66 COMMIT The SQL statement COMMIT causes a transaction to complete. The database modifications are now permanent in the database. + 67 ROLLBACK The SQL statement ROLLBACK also causes the transaction to end, but by aborting. No effects on the database. Failures like division by 0 or a constraint violation can also cause rollback, even if the programmer does not request it. + Example: Interacting Processes Assume the usual Sells(canteen,product,price) relation, and suppose that Joe’s canteen sells only Coke for $2.50 and OJ for $3.00. Sally is querying Sells for the highest and lowest price Joe charges. Joe decides to stop selling Coke and OJ, but to sell only Chocolate Milk at $3.50. 68 + 69 Sally’s Program Sally executes the following two SQL statements called (min) and (max) to help us remember what they do. (max) SELECT MAX(price) FROM Sells WHERE canteen = ‘Joes canteen’; (min) SELECT MIN(price) FROM Sells WHERE canteen = ‘Joes canteen’; + 70 Joe’s Program At about the same time, Joe executes the following steps: (del) and (ins). (del) DELETE FROM Sells WHERE canteen = ’Joes canteen’; (ins) INSERT INTO Sells VALUES(’Joes canteen’, ’ChocolateMilk’, 3.50); + 71 Interleaving of Statements Although (max) must come before (min), and (del) must come before (ins), there are no other constraints on the order of these statements, unless we group Sally’s and/or Joe’s statements into transactions. + 72 Example: Strange Interleaving Suppose the steps execute in the order (max)(del)(ins)(min). Joe’s Prices: Statement: {2.50,3.00} {2.50,3.00} Result: (max) 3.00 Sally sees MAX < MIN! (del) {3.50} (ins) (min) 3.50 + Fixing the Problem With Transactions If we group Sally’s statements (max)(min) into one transaction, then she cannot see this inconsistency. She sees Joe’s prices at some fixed time. Either before or after he changes prices, or in the middle, but the MAX and MIN are computed from the same prices. 73 + 74 Another Problem: Rollback Suppose Joe executes (del)(ins), not as a transaction, but after executing these statements, thinks better of it and issues a ROLLBACK statement. If Sally executes her statements after (ins) but before the rollback, she sees a value, 3.50, that never existed in the database. + 75 Solution If Joe executes (del)(ins) as a transaction, its effect cannot be seen by others until the transaction executes COMMIT. If the transaction executes ROLLBACK instead, then its effects can never be seen. + 76 Isolation Levels SQL defines four isolation levels = choices about what interactions are allowed by transactions that execute at about the same time. Only one level (“serializable”) = ACID transactions. Each DBMS implements transactions in its own way. + 77 Choosing the Isolation Level Within a transaction, we can say: SET TRANSACTION ISOLATION LEVEL X where X = 1. SERIALIZABLE 2. REPEATABLE READ 3. READ COMMITTED 4. READ UNCOMMITTED + 78 Serializable Transactions If Sally = (max)(min) and Joe = (del)(ins) are each transactions, and Sally runs with isolation level SERIALIZABLE, then she will see the database either before or after Joe runs, but not in the middle. + Isolation Level Is Personal Choice Your choice, e.g., run serializable, affects only how you see the database, not how others see it. Example: If Joe Runs serializable, but Sally doesn’t, then Sally might see no prices for Joe’s canteen. i.e., it looks to Sally as if she ran in the middle of Joe’s transaction. 79 + 80 Read-Commited Transactions If Sally runs with isolation level READ COMMITTED, then she can see only committed data, but not necessarily the same data each time. Example: Under READ COMMITTED, the interleaving (max)(del)(ins)(min) is allowed, as long as Joe commits. Sally sees MAX < MIN. + 81 Repeatable-Read Transactions Requirement is like read-committed, plus: if data is read again, then everything seen the first time will be seen the second time. But the second and subsequent reads may see more tuples as well. + 82 Example: Repeatable Read Suppose Sally runs under REPEATABLE READ, and the order of execution is (max)(del)(ins)(min). (max) sees prices 2.50 and 3.00. (min) can see 3.50, but must also see 2.50 and 3.00, because they were seen on the earlier read by (max). + 83 Read Uncommitted A transaction running under READ UNCOMMITTED can see data in the database, even if it was written by a transaction that has not committed (and may never). Example: If Sally runs under READ UNCOMMITTED, she could see a price 3.50 even if Joe later aborts. + Views A view is a relation defined in terms of stored tables (called base tables ) and other views. Two kinds: 1. Virtual = not stored in the database; just a query for constructing the relation. 2. Materialized = actually constructed and stored. 84 + Declaring Views Declare by: CREATE [MATERIALIZED] VIEW <name> AS <query>; Default is virtual. 85 + 86 Example: View Definition CanDrink(drinker, product) is a view “containing” the drinker-product pairs such that the drinker frequents at least one canteen that serves the product: CREATE VIEW CanDrink AS SELECT drinker, product FROM Frequents, Sells WHERE Frequents.canteen = Sells.canteen; + 87 Example: Accessing a View Query a view as if it were a base table. Also: a limited ability to modify views if it makes sense as a modification of one underlying base table. Example query: SELECT product FROM CanDrink WHERE drinker = ’Sally’; + 88 Triggers on Views Generally, it is impossible to modify a virtual view, because it doesn’t exist. But an INSTEAD OF trigger lets us interpret view modifications in a way that makes sense. Example: View Synergy has (drinker, product, canteen) triples such that the canteen serves the product, the drinker frequents the canteen and likes the product. + 89 Example: The View CREATE VIEW Synergy AS Pick one copy of each attribute SELECT Likes.drinker, Likes.product, Sells.canteen FROM Likes, Sells, Frequents WHERE Likes.drinker = Frequents.drinker AND Likes.product = Sells.product AND Sells.canteen = Frequents.canteen; Natural join of Likes, Sells, and Frequents + 90 Interpreting a View Insertion We cannot insert into Synergy --- it is a virtual view. But we can use an INSTEAD OF trigger to turn a (drinker, product, canteen) triple into three insertions of projected pairs, one for each of Likes, Sells, and Frequents. Sells.price will have to be NULL. + 91 The Trigger CREATE TRIGGER ViewTrig INSTEAD OF INSERT ON Synergy REFERENCING NEW ROW AS n FOR EACH ROW BEGIN INSERT INTO LIKES VALUES(n.drinker, n.product); INSERT INTO SELLS(canteen, product) VALUES(n.canteen, n.product); INSERT INTO FREQUENTS VALUES(n.drinker, n.canteen); END; + 92 Materialized Views Problem: each time a base table changes, the materialized view may change. Cannot afford to recompute the view with each change. Solution: Periodic reconstruction of the materialized view, which is otherwise “out of date.” + Example: Class Mailing List The class mailing list all-2141 could be a materialized view of the class enrollment in Banner. Could be* updated at periodic intervals Possible enroll in the class and miss an email sent out after you enroll but before the materialized view is updated 93 + 94 Example: A Data Warehouse CanteensRus stores every sale at every canteen in a database. Overnight, the sales for the day are used to update a data warehouse = materialized views of the sales. The warehouse is used by analysts to predict trends and move goods to where they are selling best. + 95 Typical Data Warehouse architecture + 96 Database Index A data structure that improves the speed of data retrieval operations on a database table at the cost of slower writes and the use of more storage space. Indices can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records. In a relational database, an index is a copy of one part of a table. Some databases extend the power of indexing by allowing indices to be created on functions or expressions. For example, an index could be created on upper(last_name), which would only store the upper case versions of the last_name field in the index. Another option sometimes supported is the use of "filtered" indices, where index entries are created only for those records that satisfy some conditional expression. + 97 Indexes Index = data structure used to speed access to tuples of a relation, given values of one or more attributes. Could be a hash table, but in a DBMS it is always a balanced search tree with giant nodes (a full disk page) called a B-tree. + 98 Declaring Indexes No standard! Typical syntax: CREATE INDEX productIndex ON products(manf); CREATE INDEX SellIndex ON Sells(canteen, product); + 99 Using Indexes Given a value v, the index takes us to only those tuples that have v in the attribute(s) of the index. Example: use productIndex and SellIndex to find the prices of products manufactured by Pete and sold by Joe. (next slide) + Example of Using Indexes SELECT price FROM products, Sells WHERE manf = ‘Pete’ AND products.name = Sells.product AND canteen = ’Joes canteen’; 1. Use productIndex to get all the products made by Pete. 2. Then use SellIndex to get prices of those products, with canteen = ’Joes canteen’ 100 + 101 Database Tuning A major problem in making a database run fast is deciding which indexes to create. Pro: An index speeds up queries that can use it. Con: An index slows down all modifications on its relation because the index must be modified too. + 102 Example: Tuning Suppose the only things we did with our products database was: 1. Insert new facts into a relation (10%). 2. Find the price of a given product at a given canteen (90%). Then SellInd on Sells(canteen, product) would be wonderful, but productInd on products(manf) would be harmful. + 103 Tuning Advisors A major research thrust. Because hand tuning is so hard. An advisor gets a query load, e.g.: 1. Choose random queries from the history of queries run on the database, or 2. Designer provides a sample workload. + 104 Tuning Advisors The advisor generates candidate indexes and evaluates each on the workload. Feed each sample query to the query optimizer, which assumes only this one index is available. Measure the improvement/degradation in the average running time of the queries.