REPORT: CS 257
Ashish Sharma, Class ID 118, SJSU ID: 006497452
Submitted to: Prof. Dr. T. Y. Lin
* Changes are marked in italics and bold

13.1.1 MEMORY HIERARCHY
- Data storage capacities vary across the levels of the hierarchy, and so does the cost per byte of storing data.
- The devices with the smallest capacity offer the fastest speed, at the highest cost per bit.

[Memory hierarchy diagram: cache at the bottom, then main memory, disk (secondary storage), and tertiary storage; the levels are used by programs, the file system, main-memory DBMSs, and virtual memory.]

CACHE
- The lowest level of the hierarchy. Data items in the cache are copies of certain locations of main memory.
- Sometimes values in the cache are changed, and the corresponding changes to main memory are delayed.
- The machine looks in the cache for instructions as well as for the data those instructions use.
- The cache holds only a limited amount of data.
- In a single-processor computer there is no need to update main memory immediately; with multiple processors, data is written through to main memory immediately, which is called write-through.

MAIN MEMORY
- Refers to physical memory that is internal to the computer; the word "main" distinguishes it from external mass-storage devices such as disk drives.
- Everything happens in main memory: instruction execution and data manipulation work on information that is resident there.
- Main memories are random access: one can obtain any byte in the same amount of time.

SECONDARY STORAGE
- Used to store data and programs when they are not being processed.
- More permanent than main memory: data and programs are retained when the power is turned off.
- Even a personal computer requires many gigabytes of secondary storage.
- Examples: magnetic disks, hard disks.

TERTIARY STORAGE
- Consists of anywhere from one to several storage drives. It is a comprehensive storage system that is usually very slow, so it is typically used to archive data that is not accessed frequently.
- Holds data volumes measured in terabytes.
- Used for databases much larger than what can be stored on disk.

13.1.2 TRANSFER OF DATA BETWEEN LEVELS
- Data moves between adjacent levels of the hierarchy.
- At the secondary and tertiary levels, accessing the desired data or finding the desired place to store it takes a lot of time.
- A disk is organized into blocks; entire blocks are moved to and from a region of main memory called a buffer.
- A key technique for speeding up database operations is to arrange the data so that, when one piece of a block is needed, it is likely that other data on the same block will be needed at the same time. The same idea applies to the other levels of the hierarchy.

13.1.3 VOLATILE AND NONVOLATILE STORAGE
- A volatile device forgets the data stored on it when the power is turned off; a nonvolatile device holds its data even when turned off.
- Secondary and tertiary devices are nonvolatile; main memory is volatile.

13.1.4 VIRTUAL MEMORY
- A technique that gives an application program the impression that it has contiguous working memory (an address space), while in fact the memory may be physically fragmented and may even overflow onto disk storage.
- Makes programming of large applications easier and uses real physical memory (e.g., RAM) more efficiently.
- Typical software executes in virtual memory; the address space is typically 32 bits, i.e., 2^32 bytes or 4 GB.
- Transfer between memory and disk is in units of blocks.
13.2.1 MECHANICS OF DISKS
- The use of secondary storage is one of the important characteristics of a DBMS.
- A disk consists of two moving pieces: 1. the disk assembly and 2. the head assembly.
- The disk assembly consists of one or more platters, which rotate around a central spindle; bits are stored on the upper and lower surfaces of the platters.

DOCUMENT RETRIEVAL AND THE INVERTED INDEX
- With the advent of the WWW and the practice of keeping documents online, document retrieval has become one of the largest database problems.
- The simplest approach to document retrieval is to create a separate index for each word (problem: wasted storage space). The better approach is to use an inverted index.

INVERTED INDEX
- The records are the documents of a collection.
- The inverted index itself consists of a set of word-pointer pairs; the pointers refer to positions in a bucket file.
- Pointers in the bucket file can be: pointers to a document, or pointers to an occurrence of a word (perhaps a pair of the first block of the document and an integer indicating which word it is).
- When we point to an occurrence, we can include extra information in the bucket array. E.g., for documents in HTML or XML we can record the markup associated with a word, so we can distinguish titles, headers, anchors, and so on.

MORE INFORMATION-RETRIEVAL TECHNIQUES
To improve effectiveness:
1. Stemming: remove suffixes to find the stem of each word; e.g., plurals can be treated as their singular versions.
2. Stop words: words such as "the" or "and" are excluded from the inverted index.
Example: suppose we want to find documents about dogs that compare them with cats. This is difficult to solve without understanding the text, but we get a good hint if we search for documents that (1) mention dogs in the title, and (2) mention cats in an anchor.
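The idea can be made concrete with a short Python sketch (illustrative only; the STOP_WORDS set and the crude stem() rule are assumptions, not taken from the text):

from collections import defaultdict

STOP_WORDS = {"the", "and", "a", "of"}

def stem(word):
    # Crude stemming: treat plurals as their singular versions.
    return word[:-1] if word.endswith("s") else word

def build_inverted_index(documents):
    """documents: dict doc_id -> text. Returns word -> list of (doc_id, position)."""
    index = defaultdict(list)                        # word -> bucket of pointers
    for doc_id, text in documents.items():
        for pos, token in enumerate(text.lower().split()):
            word = stem(token)
            if word not in STOP_WORDS:
                index[word].append((doc_id, pos))    # bucket-file entry
    return index

docs = {1: "dogs compare with cats", 2: "the cat and the dog"}
idx = build_inverted_index(docs)
print(idx["dog"])    # [(1, 0), (2, 4)]: occurrences of the stem "dog"

A real bucket file would also carry the markup information (title, header, anchor) with each occurrence, as described above.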
B-TREES
The most commonly used index structure in commercial systems.
Advantages:
- A B-tree automatically maintains as many levels of index as is appropriate for the size of the file.
- B-trees manage the space on the blocks they use, so no overflow blocks are needed.
The structure of a B-tree:
- The tree is balanced: all paths from the root to a leaf have the same length.
- There are typically three layers: the root, an intermediate layer, and the leaves.
Important rules about what can appear in the blocks of a B-tree:
- Keys are distributed among the leaves in sorted order, from left to right.
- The root has at least two used pointers (with the exception of a tree holding a single record).
- At a leaf, the last pointer points to the next leaf block to the right.
- At an interior node, all n+1 pointers can be used, but at least ceil((n+1)/2) are actually used.

EFFICIENCY OF B-TREES
B-trees allow lookup, insertion, and deletion of records using very few disk I/Os:
1. If the number of keys per block is reasonably large, we rarely need to split or merge blocks; even when we do, the operation is almost always limited to the leaves and their parents. Thus we can neglect the cost of B-tree reorganization.
2. The number of disk I/Os needed is normally the number of levels of the B-tree plus one (for lookup) or plus two (for insert or delete).
Example: suppose 340 key-pointer pairs fit in one block, and the average block is occupied about midway between the minimum and maximum, so a typical block has 255 pointers. A root with 255 children gives 255^2 = 65,025 leaves, and among these leaves we can have 255^3, or about 16.6 million, records. That is, files with up to 16.6 million records can be accommodated by a 3-level B-tree. The number of disk I/Os can be reduced further by keeping the upper levels of the B-tree in main memory.

TYPES OF ERRORS
- Intermittent error: a read or write is unsuccessful, but repeated tries succeed.
- Media decay: a bit or bits become permanently corrupted.
- Write failure: we can neither write the sector nor retrieve its previous contents.
- Disk crash: the entire disk becomes unreadable.

INTERMITTENT FAILURES
- If we try to read a sector, the correct contents of that sector may not be delivered to the disk controller; the controller checks whether the sector read is good or bad and retries if necessary.
- To check that a write was correct, a read is performed afterward; the read operation tells us whether the sector is good or bad.
- This is the most common form of failure. Parity checks can be used to detect it.

CHECKSUMS
- Each sector has some additional bits, called the checksum, set depending on the values of the data bits stored in that sector.
- With checksums, the probability that a bad sector is accepted as good is small.
- Even-parity rule: if the sequence has an odd number of 1's, the parity bit is 1; if it has an even number of 1's, the parity bit is 0. Either way, the total number of 1's, including the parity bit, is always even.
Examples:
1. Sequence 01101000 has an odd number of 1's; parity bit 1 gives 011010001.
2. Sequence 111011100 has an even number of 1's; parity bit 0 gives 1110111000.

STABLE STORAGE
- Used to recover from the failure mode known as media decay, in which, after we overwrite a sector, the new data is not read back correctly.
- Sectors are paired; each pair, representing one value X, has a left copy XL and a right copy XR. We check the parity of each copy, substituting the spare sector for a bad XL or XR, until a good value is returned.

RECOVERY FROM DISK CRASHES
- The most serious mode of failure is the "head crash", in which data is permanently destroyed.
- To reduce the risk of data loss from disk crashes, there are a number of schemes known as RAID (Redundant Arrays of Independent Disks).
- Each scheme starts with one or more disks holding the data and adds one or more disks holding information that is completely determined by the contents of the data disks; the added disks are called redundant disks.

MIRRORING AS A REDUNDANCY TECHNIQUE
- The mirroring scheme, referred to as RAID level 1, protects against data loss by mirroring each disk: one disk is the data disk and the other is the redundant disk.
- Mirroring means making two or more copies of the data on different disks.
- Data can be lost only if a second disk crashes while the first crash is being repaired.

PARITY BLOCKS
- The RAID level 4 scheme uses only one redundant disk, no matter how many data disks there are.
- In the redundant disk, the ith block consists of parity checks for the ith blocks of all the data disks: the jth bits of the ith blocks of all disks, data and redundant, must have an even number of 1's, and the redundant disk's bit is chosen to make this condition true.
- In other words, the jth bit of the redundant disk is the modulo-2 sum of the jth bits of all the data disks.

PARITY BLOCKS: READING
- Reading a block of a data disk is the same as reading a block from any disk.
- Alternatively, we can read the corresponding block from each of the other disks and compute the block we want as their modulo-2 sum. For example:
    disk 2: 10101010
    disk 3: 00111000
    disk 4: 01100010
  Taking the modulo-2 sum of the bits in each column, we get the block of disk 1: 11110000.
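A minimal Python sketch of both ideas, the even-parity checksum and the RAID level 4 parity block; bit strings stand in for sectors and blocks (an illustrative representation):

def parity_bit(bits):
    """Even-parity checksum: 1 if the number of 1's is odd, else 0,
    so the sector plus its checksum always has an even number of 1's."""
    return bits.count("1") % 2

def parity_block(*data_blocks):
    """jth bit of the redundant block = modulo-2 sum of the jth bits
    of all the data blocks."""
    return "".join(str(sum(int(blk[j]) for blk in data_blocks) % 2)
                   for j in range(len(data_blocks[0])))

print(parity_bit("01101000"))                            # 1 -> store 011010001
print(parity_block("10101010", "00111000", "01100010"))  # 11110000, as in the text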
PARITY BLOCKS: WRITING
- When we write a new block on a data disk, we must also update the corresponding block of the redundant disk.
- One approach is to read the corresponding blocks of all the other data disks, compute the modulo-2 sum with the new block, and write it to the redundant disk. For n data disks, this requires n-1 block reads plus the write of the data block and the write of the redundant block, a total of n+1 disk I/Os.
- A better way: read the old version of the block being rewritten and take the modulo-2 sum of its old and new versions; then flip exactly those bits of the redundant block where this modulo-2 sum has a 1.

PARITY BLOCKS: FAILURE RECOVERY
- If any disk crashes, we can recompute each of its blocks as the modulo-2 sum of the corresponding blocks of all the other disks.
- Suppose disk 2 fails. We recompute each block of the replacement disk from the corresponding blocks of the first and third data disks and the redundant disk:
    disk 1: 11110000
    disk 2: ????????
    disk 3: 00111000
    disk 4: 01100010
  Taking the modulo-2 sum of each column, we deduce that the missing block of disk 2 is 10101010.

AN IMPROVEMENT: RAID 5
- RAID 4 is effective in preserving data unless there are two simultaneous disk crashes. But whatever scheme we use for updating, we must read and write the redundant disk's block on every write; if there are n data disks, the number of writes to the redundant disk is n times the average number of writes to any one data disk.
- However, we do not have to treat one fixed disk as the redundant disk and the others as data disks. Rather, we can treat each disk as the redundant disk for some of the blocks. This improvement is called RAID level 5.
- For instance, if there are n+1 disks numbered 0 through n, we can treat the ith cylinder of disk j as redundant if j is the remainder when i is divided by n+1.
- Example: with n = 3 there are 4 disks. Disk 0 is redundant for its cylinders numbered 0, 4, 8, 12, and so on, because these are the numbers that leave remainder 0 when divided by 4; disk 1 is redundant for cylinders 1, 5, 9, ...; disk 2 for 2, 6, 10, ...; and disk 3 for 3, 7, 11, ....

COPING WITH MULTIPLE DISK CRASHES
- Error-correcting-code theory (Hamming codes) leads to RAID level 6, under which two simultaneous crashes are correctable; with enough redundant disks, RAID 6 can deal with any number of disk crashes.
- Example: a system of seven disks, with four data disks numbered 1-4 and three redundant disks numbered 5-7. The bits of disk 5 are the modulo-2 sum of the corresponding bits of disks 1, 2, and 3; disk 6 of disks 1, 2, and 4; disk 7 of disks 1, 3, and 4.
- This assignment corresponds to a 3x7 matrix of 0's and 1's (three rows because there are three redundant disks), set up so that: 1) every column is a distinct mix of three 1's and 0's, excluding the all-zero column; 2) the column of each redundant disk has a single 1; 3) the column of each data disk has at least two 1's.

COPING WITH MULTIPLE DISK CRASHES: READING/WRITING
- We may read data from any data disk normally.
- To write a block of some data disk, we compute the modulo-2 sum of the new and old versions of that block. These bits are then added, as a modulo-2 sum, to the corresponding blocks of all those redundant disks that have a 1 in the row in which the written disk also has a 1.
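The recovery rule and the RAID 5 placement rule can be sketched in a few lines of Python (bit-string blocks as above; purely illustrative):

def recover(surviving_blocks):
    """A crashed disk's block is the modulo-2 sum (XOR) of the corresponding
    blocks of all surviving disks, data and redundant alike."""
    n = len(surviving_blocks[0])
    return "".join(str(sum(int(b[j]) for b in surviving_blocks) % 2)
                   for j in range(n))

def raid5_redundant_disk(cylinder, num_disks):
    """Under RAID 5 with disks 0..num_disks-1, cylinder i's parity lives on
    disk i mod num_disks, spreading parity writes over all disks."""
    return cylinder % num_disks

# Disk 2 crashed; recover it from disks 1, 3 and the redundant disk 4:
print(recover(["11110000", "00111000", "01100010"]))   # 10101010
print(raid5_redundant_disk(6, 4))                      # cylinder 6 -> disk 2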
FIXED-LENGTH RECORDS
Example:
CREATE TABLE Moviestar (
    name      CHAR(30) PRIMARY KEY,
    address   VARCHAR(255),
    gender    CHAR(1),
    birthdate DATE
);

RECORDS
- Records consist of fields. Each record has a schema, stored by the database system, which includes the names and data types of the fields and their offsets within the record.
- Fixed-length records: each record starts at a byte within its block that is a multiple of 4, and all fields within the record start at a byte offset from the beginning of the record that is a multiple of 4.

RECORD HEADERS
The following information often must be kept with a record:
1. The record's schema.
2. The length of the record.
3. Timestamps.
Many record layouts therefore include a header of some small number of bytes to provide this additional information.

RECORDS INTO BLOCKS
Records representing tuples of a relation are stored in blocks of the disk and moved into main memory when we need to access or update them. A block header can contain:
- Links to one or more other blocks that are part of a network of blocks used to create indexes on the tuples of a relation.
- Information about the role played by this block in such a network.
- Information about which relation the tuples of this block belong to.
- Timestamps indicating the time of the block's last modification or access.

CLIENT-SERVER SYSTEMS
- A database consists of a server process that provides data from secondary storage to one or more client processes, which are applications using the data. The server and clients may be on one machine, or the server and the various clients can be distributed over many machines.
- The client application uses a "virtual" address space: the operating system or DBMS decides which parts of the address space are currently in main memory, and hardware maps the virtual address space to physical locations in main memory.
- The server's data lives in a database address space, whose addresses refer to blocks, and possibly to offsets within blocks.

CLIENT-SERVER SYSTEMS: PHYSICAL ADDRESSES
These are byte strings that let us determine the place within the secondary storage system where a block or record can be found. The bytes of a physical address can indicate:
- The host to which the storage is attached.
- An identifier for the disk or other device on which the block is located.
- The number of the cylinder of the disk.
- The number of the track within the cylinder.
- The number of the block within the track.
- The offset of the beginning of the record within the block.

LOGICAL AND STRUCTURED ADDRESSES
- All the information needed to turn a logical address into a physical address is kept in a map table.
- Many combinations of logical and physical addresses yield structured address schemes. A very useful combination is to keep in each block an offset table that holds the offsets of the records within the block; the address of a record is then the physical address of its block plus the offset of the record's entry in the block's offset table.

ADVANTAGES
- We can move a record around within its block, changing only its offset-table entry.
- We can even allow a record to move to another block.
- Finally, should the record be deleted, we have the option of leaving a tombstone in its offset-table entry: a special value that indicates the record has been deleted.
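A minimal sketch of a block with an offset table, assuming records are byte strings; the class layout is invented for illustration, not a real DBMS format:

class Block:
    """A block whose header holds an offset table. A record's address is
    (block id, slot in the offset table), so the record can move inside
    the block without its address changing."""
    def __init__(self):
        self.body = b""
        self.offsets = []                 # slot -> (offset, length) of record

    def insert(self, record):
        self.offsets.append((len(self.body), len(record)))
        self.body += record
        return len(self.offsets) - 1      # the slot number is the address

    def get(self, slot):
        off, length = self.offsets[slot]
        return self.body[off:off + length]

blk = Block()
slot = blk.insert(b"King Kong 1933")      # address = (this block, slot 0)
print(blk.get(slot))                      # record is found via the offset table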
12.3.3 POINTER SWIZZLING
- Relational systems need the ability to represent pointers in tuples, and index structures are composed of blocks that usually have pointers within them. Thus we need to study the management of pointers as blocks are moved between main and secondary memory.
- Every data item (block, record, etc.) has two addresses: its database address (the address on disk) and its memory address, if the item is currently in virtual memory.

AUTOMATIC SWIZZLING
- As soon as a block is brought into memory, we locate all its pointers and addresses and enter them into the translation table if they are not already there.
- We need some mechanism to locate the pointers:
  1. If the block holds records with a known schema, the schema tells us where in the records the pointers are found.
  2. If the block is used for one of the index structures, the block holds pointers at known locations.
  3. We may keep within the block header a list of where the pointers are.

SWIZZLING ON DEMAND
- Only swizzle a pointer if and when it is actually followed.

NO SWIZZLING
- Pointers are never swizzled; items are always accessed using their database addresses.

UNSWIZZLING
- When a block is moved from memory back to disk, all pointers in it must be turned back into database (disk) addresses, using the translation table again. It is important to have an efficient data structure for the translation table.

RECORDS WITH VARIABLE-LENGTH FIELDS
A simple but effective scheme is to put all fixed-length fields ahead of the variable-length fields. We then place in the record header:
1. The length of the record.
2. Pointers to (i.e., offsets of) the beginnings of all the variable-length fields. However, if the variable-length fields always appear in the same order, the first of them needs no pointer; we know it immediately follows the fixed-length fields.

RECORDS WITH REPEATING FIELDS
- A similar situation occurs if a record contains a variable number of occurrences of a field F, where the field itself is of fixed length. It is sufficient to group all occurrences of F together and put in the record header a pointer to the first.
- We can then locate all occurrences of F: let L be the number of bytes devoted to one instance of F. We add to the offset of F all integer multiples of L, starting at 0, then L, 2L, 3L, and so on, until we reach the offset of the field following F, whereupon we stop.

STORING VARIABLE-LENGTH FIELDS SEPARATELY FROM THE RECORD
- Very large values can be kept on blocks of their own, separate from the record, with the record holding only a pointer to them (see also BLOBS below).

VARIABLE-FORMAT RECORDS
The simplest representation of variable-format records is a sequence of tagged fields, each of which consists of:
1. Information about the role of this field, such as: (a) the attribute or field name; (b) the type of the field, if it is not apparent from the field name and some readily available schema information; and (c) the length of the field, if it is not apparent from the type.
2. The value of the field.
There are at least two reasons why tagged fields make sense:
1. Information-integration applications. Sometimes a relation has been constructed from several earlier sources, and these sources have different kinds of information; for instance, our movie-star information may have come from several sources, one of which records birthdates, some of which give addresses, others not, and so on. If there are not too many fields, we are probably best off leaving NULL those values we do not know.
2. Records with a very flexible schema. If many fields of a record can repeat and/or not appear at all, then even if we know the schema, tagged fields may be useful. For instance, medical records may contain information about many tests, but there are thousands of possible tests, and each patient has results for relatively few of them.
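A sketch of a tagged-field record in Python; the one-byte length-prefixed layout is an assumption made for illustration:

def encode(fields):
    """fields: list of (name, value) string pairs. Each tagged field stores
    the field name, the value's length, and the value itself."""
    rec = b""
    for name, value in fields:
        n, v = name.encode(), value.encode()
        rec += bytes([len(n)]) + n + bytes([len(v)]) + v
    return rec

def decode(rec):
    fields, i = [], 0
    while i < len(rec):
        nlen = rec[i]; name = rec[i+1:i+1+nlen].decode(); i += 1 + nlen
        vlen = rec[i]; value = rec[i+1:i+1+vlen].decode(); i += 1 + vlen
        fields.append((name, value))
    return fields

# A movie-star record where only some fields are present:
rec = encode([("name", "Clint Eastwood"), ("birthdate", "5/31/30")])
print(decode(rec))    # [('name', 'Clint Eastwood'), ('birthdate', '5/31/30')]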
RECORDS THAT DO NOT FIT IN A BLOCK
- Large values usually have variable length, but even if the length is fixed for all values of the type, we need special techniques to represent them.
- The technique of "spanned records" manages records that are larger than blocks. Spanned records are also useful when records are smaller than blocks but packing whole records into blocks would waste significant space.
- For both these reasons, it is sometimes desirable to allow records to be split across two or more blocks. The portion of a record that appears in one block is called a record fragment.

BLOBS
- Binary Large OBjects (BLOBs) can be images, movies, audio files, and other very large values.
- Storing BLOBs: a BLOB is stored in several blocks, preferably consecutively on a cylinder or across multiple disks, for efficient retrieval.
- Retrieving BLOBs: a client retrieving a 2-hour movie may not want it all at once, and retrieving a specific part of the large value requires an index structure to make it efficient (example: an index by seconds on a movie BLOB).

COLUMN STORES
- An alternative to storing tuples as records is to store each column as a record. Since an entire column of a relation may occupy far more than a single block, these records may span many blocks, much as long files do.
- If we keep the values in each column in the same order, we can reconstruct the relation from the column records.

INSERTION
- Insertion of records without order: records can be placed in any block with empty space, or in a new block.
- Insertion of records in a fixed order:
  - If space is available within the right block, slide the records within the block, using the offset table in the block header to track the location of each record.
  - If no space is available in the block, the record may be stored outside the block; because addresses are structured (block plus offset-table slot), a pointer to the record from outside the block remains valid.

DELETION
- Recovering space after deletion: when an offset table is used, the records can be slid around the block so there is a single unused region in the center that can be recovered.
- If we cannot slide records, an available-space list can be maintained in the block header: the list head goes in the block header, and the available regions themselves hold the links in the list.
- Tombstones: a tombstone is placed where a deleted record was, so that pointers to the deleted record do not end up pointing at some new record. The tombstone is permanent until the entire database is reconstructed. If pointers go to fixed locations from which the location of the record is found, we put the tombstone in that fixed location; where a tombstone is placed depends on the nature of the record pointers. A map table can be used to translate a logical record address to a physical address.

UPDATING RECORDS
- For fixed-length records, an update has no effect on the storage system.
- For variable-length records: if the length increases, slide the records as for an insertion; if the length decreases, update the available-space list and recover the space, eliminating overflow blocks where possible.
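A sketch extending the offset-table idea above with tombstones and record sliding; the TOMBSTONE marker value is invented for illustration:

TOMBSTONE = (-1, 0)    # special offset-table entry for a deleted record

class SlottedBlock:
    def __init__(self):
        self.body = b""
        self.offsets = []                      # slot -> (offset, length)

    def insert(self, record):
        self.offsets.append((len(self.body), len(record)))
        self.body += record
        return len(self.offsets) - 1

    def delete(self, slot):
        # Leave a tombstone so old pointers to this slot cannot end up
        # dangling onto some new record.
        self.offsets[slot] = TOMBSTONE

    def compact(self):
        # Slide surviving records together; only the offset table changes,
        # so record addresses (block, slot) stay valid.
        new_body = b""
        for slot, entry in enumerate(self.offsets):
            if entry != TOMBSTONE:
                off, length = entry
                self.offsets[slot] = (len(new_body), length)
                new_body += self.body[off:off + length]
        self.body = new_body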
CHAPTER 18

18.1 SERIAL AND SERIALIZABLE SCHEDULES
- The process of assuring that transactions preserve consistency when executing simultaneously is called concurrency control. This is the job of the scheduler.
- Concurrency control in a database management system (DBMS) ensures that transactions can be performed concurrently without violating the data integrity of the database.
- Executed transactions should follow the ACID rules. The DBMS must guarantee that only serializable (unless serializability is intentionally relaxed) and recoverable schedules are generated. It also guarantees that no effect of committed transactions is lost, and no effect of aborted (rolled-back) transactions remains in the database.

ACID RULES
- Atomicity: either the effects of all or none of a transaction's operations remain when the transaction completes; to the outside world, the transaction appears indivisible, atomic.
- Consistency: every transaction must leave the database in a consistent state.
- Isolation: transactions cannot interfere with each other; providing isolation is the main goal of concurrency control.
- Durability: the effects of successful transactions must persist through crashes.

SCHEDULES
- A schedule is a list of actions (reading, writing, aborting, committing) from a set of transactions, describing the actions as seen by the DBMS.
- Example: schedule D over transactions T1, T2, T3: T1 reads and writes object X, then T2 reads and writes object Y, and finally T3 reads and writes object Z. This is a serial schedule, because the actions of the three transactions are not interleaved.
- A schedule that is equivalent to a serial schedule has the serializability property. In a schedule E, the order in which the actions are executed may differ from D, but if in the end E gives the same result as D, then E is serializable.

SERIAL SCHEDULE (T1 PRECEDES T2)

  T1                T2                A      B
  READ(A,t)                           50     50
  t := t+100
  WRITE(A,t)                          150
  READ(B,t)
  t := t+100
  WRITE(B,t)                                 150
                    READ(A,s)
                    s := s*2
                    WRITE(A,s)        300
                    READ(B,s)
                    s := s*2
                    WRITE(B,s)               300

NON-SERIALIZABLE SCHEDULE

  T1                T2                A      B
  READ(A,t)                           50     50
  t := t+100
  WRITE(A,t)                          150
                    READ(A,s)
                    s := s*2
                    WRITE(A,s)        300
                    READ(B,s)
                    s := s*2
                    WRITE(B,s)               100
  READ(B,t)
  t := t+100
  WRITE(B,t)                                 200

The effect on A is as if T1 ran first, while the effect on B is as if T2 ran first; no serial order produces A = 300 and B = 200, so this schedule is not serializable.

A SERIALIZABLE SCHEDULE, WITH DETAILS
The same interleaving, but T2 multiplies by 1 instead of 2:

  T1                T2                A      B
  READ(A,t)                           50     50
  t := t+100
  WRITE(A,t)                          150
                    READ(A,s)
                    s := s*1
                    WRITE(A,s)        150
                    READ(B,s)
                    s := s*1
                    WRITE(B,s)               50
  READ(B,t)
  t := t+100
  WRITE(B,t)                                 150

The final state A = B = 150 is the same as that of the serial schedule (T1, T2), so this schedule is serializable even though it is not serial.

CONFLICTS
- Definition: a conflict is a pair of consecutive actions in a schedule such that, if their order were interchanged, the behavior of at least one of the transactions involved could change.
- Non-conflicting actions: let Ti and Tj be two different transactions (i != j); then:
  - ri(X); rj(Y) is never a conflict, even if X = Y.
  - ri(X); wj(Y) is not a conflict provided X != Y.
  - wi(X); rj(Y) is not a conflict provided X != Y.
  - Similarly, wi(X); wj(Y) is not a conflict provided X != Y.
- Three situations of conflicting actions (where we may not swap their order):
  1. Two actions of the same transaction, e.g., ri(X); wi(Y).
  2. Two writes of the same database element by different transactions, e.g., wi(X); wj(X).
  3. A read and a write of the same database element by different transactions, e.g., ri(X); wj(X).
- To summarize, any two actions of different transactions may be swapped unless they involve the same database element and at least one of them is a write, as the sketch below shows.
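The rule can be expressed directly in Python, representing each action as a (transaction, operation, element) triple, e.g. ("T1", "w", "X") for w1(X):

def conflicts(a1, a2):
    """Two actions conflict unless they come from different transactions,
    touch different elements, or are both reads."""
    t1, op1, x1 = a1
    t2, op2, x2 = a2
    if t1 == t2:
        return True                  # order within one transaction is fixed
    return x1 == x2 and ("w" in (op1, op2))

print(conflicts(("T1", "r", "X"), ("T2", "r", "X")))   # False: two reads
print(conflicts(("T1", "r", "X"), ("T2", "w", "X")))   # True: read/write of X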
PRECEDENCE GRAPHS
- Conflicting pairs of actions of a schedule S put constraints on the order of transactions in the hypothetical, conflict-equivalent serial schedule.
- For a schedule S involving transactions T1 and T2 (among others), we say that T1 takes precedence over T2 (written T1 <S T2) if there are actions A1 of T1 and A2 of T2 such that: A1 is ahead of A2 in S; both A1 and A2 involve the same database element; and at least one of them is a write.
- These precedences can be depicted in a precedence graph, whose nodes are the transactions of S.
- Example: consider a schedule S involving three transactions T1, T2, and T3: S: r2(A); r1(B); w2(A); r3(A); w1(B); w3(A); r2(B); w2(B). The actions on A give T2 <S T3 (w2(A) precedes r3(A) and w3(A)), and the actions on B give T1 <S T2 (w1(B) precedes r2(B) and w2(B)); the graph T1 -> T2 -> T3 has no cycle.

TEST FOR CONFLICT-SERIALIZABILITY
- Construct the precedence graph for S and check whether it has any cycle. If it does, S is not conflict-serializable; otherwise, it is conflict-serializable.
- Example of a cyclic precedence graph: consider the schedule S1: r2(A); r1(B); w2(A); r2(B); r3(A); w1(B); w3(A); w2(B).
- Observing the actions on A, we find T2 <S1 T3; but observing B, we get both T1 <S1 T2 (w1(B) precedes w2(B)) and T2 <S1 T1 (r2(B) precedes w1(B)). Thus the graph has a cycle between nodes 1 and 2, and we conclude that S1 is not conflict-serializable.

[Precedence graph for S1: T2 -> T3, plus a two-node cycle between T1 and T2.]

WHY THE PRECEDENCE-GRAPH TEST WORKS
- A cycle in the graph puts contradictory constraints on the order of transactions in a hypothetical conflict-equivalent serial schedule: if there is a cycle involving n transactions T1, T2, ..., Tn, then in the serial order the actions of T1 must precede those of T2, which must precede those of T3, and so on, up to Tn. But the actions of Tn are also required to precede those of T1.
- So, if there is a cycle in the graph, we can conclude that the schedule is not conflict-serializable.
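A sketch of the whole test, reusing the conflict rule above to build the precedence graph and then looking for a cycle; S1 is the schedule from the example:

def precedence_edges(schedule):
    edges = set()
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i+1:]:
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges.add((t1, t2))            # t1 takes precedence over t2
    return edges

def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    def reachable(start, goal, seen=()):
        return any(n == goal or (n not in seen and reachable(n, goal, seen + (n,)))
                   for n in graph.get(start, ()))
    return any(reachable(u, u) for u in graph)

# S1: r2(A); r1(B); w2(A); r2(B); r3(A); w1(B); w3(A); w2(B);
S1 = [("T2","r","A"), ("T1","r","B"), ("T2","w","A"), ("T2","r","B"),
      ("T3","r","A"), ("T1","w","B"), ("T3","w","A"), ("T2","w","B")]
print(has_cycle(precedence_edges(S1)))   # True: S1 is not conflict-serializable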
18.3 LOCKS
- The scheduler works as follows: it receives lock requests from transactions, checks them against the lock table, and thereby generates a serializable schedule of actions.

CONSISTENCY OF TRANSACTIONS
- Actions and locks must correspond: a transaction can read or write an element only if it holds the appropriate lock on it and has not yet released that lock. Unlocking every locked element is compulsory.

LEGALITY OF SCHEDULES
- Two transactions may not hold locks on the same element at the same time; a new lock can be granted only after the prior holder has released its lock.

LOCKING SCHEDULER
- Grants lock requests only if the resulting schedule is legal; a lock table stores the information about current locks on the elements.
- Note that a schedule can be legal, with consistent transactions, and unfortunately still not be serializable; two-phase locking (below) closes this gap.
- The locking scheduler delays requests that would result in an illegal schedule.

TWO-PHASE LOCKING (2PL)
- Guarantees that a legal schedule of consistent transactions is conflict-serializable.
- In each transaction, all lock requests precede all unlock requests: the growing phase obtains locks and allows no unlocks; the shrinking phase releases locks and acquires none.

WORKING OF TWO-PHASE LOCKING
- 2PL assures serializability. Two stronger protocols:
  - Strict two-phase locking: a transaction holds all its exclusive locks until commit/abort.
  - Rigorous two-phase locking: a transaction holds all its locks until commit/abort.
- For any transaction Ti that does not obey 2PL, it is possible to find a transaction Tj that does obey 2PL and a schedule S of Ti and Tj that is not conflict-serializable.

FAILURE OF 2PL
- 2PL fails to provide any protection against deadlocks.

LOCKING SYSTEMS WITH SEVERAL LOCK MODES
The locking scheme above is too simple to be practical. Refinements:
- Shared/read locks (for reading) and exclusive/write locks (for writing)
- Compatibility matrices
- Upgrading locks
- Update locks
- Increment locks

SHARED AND EXCLUSIVE LOCKS
A transaction cannot write without an exclusive lock and cannot read without holding some lock. This works on these principles:
1. Consistency of transactions: a read action must be preceded by taking a shared or an exclusive lock on the element, and a write action by taking an exclusive lock; all locks must be released by the end of the transaction.
2. Two-phase locking of transactions: locking must precede unlocking.
3. Legality of schedules: an element may be locked exclusively by one transaction, or by several in shared mode, but not both.

COMPATIBILITY MATRICES
A compatibility matrix has a row and a column for each lock mode. Rows correspond to a lock already held on an element by another transaction; columns correspond to the mode of the lock requested. For shared (S) and exclusive (X) locks:

               Lock requested
  Lock held      S      X
  S              YES    NO
  X              NO     NO

Example: the column for S says that we can grant a shared lock on an element if the only locks currently held on that element are shared locks.

UPGRADING LOCKS
- Suppose a transaction wants to read as well as write an element: it acquires a shared lock on the element, performs its calculations, and, when it is ready to write, is granted an exclusive lock (an upgrade). Transactions whose read/write needs are not known in advance can use upgrading locks.
- Limitation: indiscriminate use of upgrading produces deadlocks. Example: two transactions each hold a shared lock on the same element and both want to upgrade; each waits forever for the other to release its shared lock.

UPDATE LOCKS
- Update locks solve the deadlock that occurs with the upgrading method. A transaction holding an update lock on an element can read it but not write it; the update lock can later be converted to an exclusive lock.
- An update lock can be granted only while the element holds shared locks. An update lock is like a shared lock when you request it and like an exclusive lock once you hold it: in the compatibility matrix, the columns for U and S are the same, and the rows for U and X are the same:

               Lock requested
  Lock held      S      X      U
  S              YES    NO     YES
  X              NO     NO     NO
  U              NO     NO     NO
INCREMENT LOCKS
- Used for incrementing and decrementing stored values, e.g., transferring money from one bank account to another, or ticket-selling transactions in which the number of seats is decremented after each sale.
- An increment lock enables neither reading nor writing of the element.
- Any number of transactions can hold an increment lock on an element at the same time, but shared and exclusive locks cannot be granted while an increment lock is held on the element.

18.5 THE LOCKING SCHEDULER
- The order in which the individual steps of different transactions occur is regulated by the scheduler. The general process of assuring that transactions preserve consistency when executing simultaneously is called concurrency control.

ARCHITECTURE OF A LOCKING SCHEDULER
- The transactions themselves do not request locks, or cannot be relied upon to do so; it is the job of the scheduler to insert lock actions into the stream of reads, writes, and other actions that access data.
- Transactions also do not release locks; rather, the scheduler releases them when the transaction manager tells it that the transaction will commit or abort.

THE LOCK TABLE
- The lock table is a relation that associates database elements with locking information about each element. The table is implemented as a hash table, using database elements as the hash key.
- Size: the size of the table is proportional to the number of locked elements only, not to the size of the entire database, since any element that is not locked does not appear in the table.
- Group mode: the group mode is a summary of the most stringent condition that a transaction requesting a new lock on the element faces. Rather than comparing the lock request with every lock held by another transaction on the same element, we can simplify the grant/deny decision by comparing the request with only the group mode.

STRUCTURE OF LOCK-TABLE ENTRIES
- Handling lock requests: suppose transaction T requests a lock on A. If there is no lock-table entry for A, then surely there are no locks on A, so the entry is created and the request is granted. If the entry exists, it guides the decision about the request.
- Handling unlocks: if the entry's waiting flag is "yes", we need to grant one or more locks from the list of waiting requests. The possible policies are: first-come-first-served, priority to shared locks, or priority to upgrading. A sketch of such an entry follows.
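A sketch of a lock-table entry with a group mode, using the S/X/U compatibility matrix from the previous section; the dictionary layout is illustrative, not the book's exact structure:

COMPATIBLE = {  # (group mode held, mode requested) -> grantable?
    ("S", "S"): True,  ("S", "X"): False, ("S", "U"): True,
    ("X", "S"): False, ("X", "X"): False, ("X", "U"): False,
    ("U", "S"): False, ("U", "X"): False, ("U", "U"): False,
}

lock_table = {}   # element -> {"group": mode, "holders": [...], "waiting": [...]}

def request_lock(element, txn, mode):
    entry = lock_table.get(element)
    if entry is None:                     # no locks on the element: grant
        lock_table[element] = {"group": mode,
                               "holders": [(txn, mode)], "waiting": []}
        return True
    # Compare the request against the group mode only, not every holder.
    if COMPATIBLE[(entry["group"], mode)]:
        entry["holders"].append((txn, mode))
        if mode == "U":                   # U is more stringent than S
            entry["group"] = "U"
        return True
    entry["waiting"].append((txn, mode))  # delayed, e.g. first-come-first-served
    return False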
18.6 MANAGING HIERARCHIES OF DATABASE ELEMENTS
Two problems come up when there is a tree structure to our data:
1. A hierarchy of lockable elements: how to allow locks both on large elements, like relations, and on the elements within them, such as blocks and individual tuples.
2. Data that is itself organized in a tree; a major example is a B-tree index.

LOCKS WITH MULTIPLE GRANULARITY
- "Database element" is deliberately vague: various units can serve as lockable elements, e.g., tuples, pages or blocks, or whole relations.
- Example (bank database): small-granularity locks allow greater concurrency; large-granularity locks sometimes save us from unserializable behavior.

WARNING LOCKS
- The solution to the problem of managing locks at different granularities involves a new kind of lock called a warning, which is helpful in hierarchical or nested structures.
- The scheme involves both "ordinary" locks, shared (S) and exclusive (X), and "warning" locks, intention to share (IS) and intention to take exclusive (IX).

WARNING PROTOCOL
1. To place an ordinary S or X lock on any element, we must begin at the root of the hierarchy.
2. If we are at the element we want to lock, we need look no further; we request the S or X lock there.
3. If the desired element is farther down the hierarchy, we place a warning lock on this node (IS if we will request S below, IX if we will request X), move to the appropriate child, and repeat steps 2 and 3 until we reach the desired element and request the ordinary shared or exclusive lock.

COMPATIBILITY MATRIX

               Lock requested
  Lock held      IS     IX     S      X
  IS             YES    YES    YES    NO
  IX             YES    YES    NO     NO
  S              YES    NO     YES    NO
  X              NO     NO     NO     NO

- IS column: conflicts only with a held X lock.
- IX column: conflicts with held S and X locks.
- S column: conflicts with held X and IX locks.
- X column: conflicts with every held lock.

WARNING PROTOCOL EXAMPLE
Consider the relation Movie(title, year, length, studioName) and:
Transaction T1:
SELECT * FROM Movie WHERE title = 'King Kong';
Transaction T2:
UPDATE Movie SET year = 1939 WHERE title = 'Gone With the Wind';
T1 places an IS lock on the Movie relation and S locks on the tuples it reads; T2 places an IX lock on the relation and an X lock on the tuple it updates. Since IS and IX are compatible at the relation level, the two transactions can proceed concurrently.

TREE-BASED LOCKING
- Consider a B-tree index in a system that treats individual nodes (i.e., blocks) as lockable database elements; the node is the right level of granularity.
- One approach is to use the standard set of lock modes (shared, exclusive, and update locks) together with two-phase locking.

RULES FOR ACCESSING TREE-STRUCTURED DATA
- The tree protocol drops some of the restrictions above. We assume there is only one kind of lock, that transactions are consistent, and that schedules must be legal (locks are granted only when they do not conflict with locks already at a node), but there is no two-phase locking requirement on transactions.

THE TREE PROTOCOL
- A transaction's first lock may be at any node of the tree.
- Subsequent locks may be acquired only if the transaction currently holds a lock on the parent node.
- Nodes may be unlocked at any time.
- A transaction may not relock a node on which it has released a lock, even if it still holds a lock on the node's parent.

[Figures: a tree of lockable elements; three transactions following the tree protocol.]

WHY THE TREE PROTOCOL WORKS
- The tree protocol forces a serial order on the transactions involved in a schedule: Ti <S Tj if, in schedule S, transactions Ti and Tj lock a node in common and Ti locks it first.
- If the precedence graph drawn from these precedence relations has no cycles, then any topological order of the transactions is an equivalent serial schedule. For example, either (T1, T2, T3) or (T3, T1, T2) is an equivalent serial schedule; the reason is that all the nodes are touched in the same order as they were originally scheduled.
- If two transactions lock several elements in common, then those elements are all locked in the same order, as the following example shows.

[Figures: precedence graph derived from the schedule; a path of elements locked by two transactions.]

- Now consider an arbitrary set of transactions T1, T2, ..., Tn that obey the tree protocol and lock some of the nodes of a tree according to a schedule S. Among those that lock the root, they do so in some order, and that order carries down the tree: if Ti locks the root before Tj, then Ti locks every node it has in common with Tj before Tj does. That is, Ti <S Tj, but not Tj <S Ti. (A checker for the protocol's rules is sketched below.)
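A sketch of a checker for the four tree-protocol rules, assuming a transaction's history is a list of ("lock"/"unlock", node) events and that parent() gives a node's parent (illustrative names, single lock kind as the text assumes):

def obeys_tree_protocol(events, parent):
    held, ever_locked, first = set(), set(), True
    for action, node in events:
        if action == "lock":
            if node in ever_locked:
                return False          # may not relock a released node
            if not first and parent(node) not in held:
                return False          # later locks need a lock on the parent
            held.add(node); ever_locked.add(node); first = False
        else:
            held.discard(node)        # nodes may be unlocked at any time
    return True

parent = {"B": "A", "C": "A", "D": "B"}.get
print(obeys_tree_protocol([("lock", "B"), ("lock", "D"), ("unlock", "B"),
                           ("lock", "C")], parent))   # False: parent A not held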
TIMESTAMPS
- What is timestamping? The scheduler assigns each transaction T a unique number, its timestamp TS(T). Timestamps are issued in ascending order, at the time a transaction first notifies the scheduler that it is beginning.
- Two methods of generating timestamps: use the value of the system clock as the timestamp, or use a logical counter that is incremented each time a new timestamp is assigned.
- Whichever method is used, the scheduler maintains a table of currently active transactions and their timestamps.

TIMESTAMPS FOR DATABASE ELEMENTS AND THE COMMIT BIT
For each database element X the scheduler maintains:
- RT(X): the read time of X, the highest timestamp of any transaction that has read X.
- WT(X): the write time of X, the highest timestamp of any transaction that has written X.
- C(X): the commit bit for X, true if and only if the most recent transaction to write X has already committed.

PHYSICALLY UNREALIZABLE BEHAVIOR
- Read too late: a transaction U that started after transaction T wrote a value for X before T reads X. [Timeline: T starts, U starts, U writes X, then T tries to read X.]
- Write too late: a transaction U that started after T read X before T got a chance to write X. [Timeline: T starts, U starts, U reads X, then T tries to write X.]
- Dirty read: it is possible that, after T reads the value of X written by U, transaction U aborts. [Timeline: U starts, T starts, U writes X, T reads X, U aborts; T has performed a dirty read.]

RULES FOR TIMESTAMP-BASED SCHEDULING
1. The scheduler receives a request rT(X):
   (a) If TS(T) >= WT(X), the read is physically realizable.
       i. If C(X) is true, grant the request; if TS(T) > RT(X), set RT(X) := TS(T), otherwise do not change RT(X).
       ii. If C(X) is false, delay T until C(X) becomes true or the transaction that wrote X aborts.
   (b) If TS(T) < WT(X), the read is physically unrealizable. Roll back T.
2. The scheduler receives a request wT(X):
   (a) If TS(T) >= RT(X) and TS(T) >= WT(X), the write is physically realizable and must be performed: i. write the new value for X; ii. set WT(X) := TS(T); and iii. set C(X) := false.
   (b) If TS(T) >= RT(X) but TS(T) < WT(X), the write is physically realizable, but a later value is already in X. If C(X) is true, the previous writer of X has committed, so simply ignore the write by T; if C(X) is false, we must delay T.
   (c) If TS(T) < RT(X), the write is physically unrealizable, and T must be rolled back.
3. The scheduler receives a request to commit T: it finds all database elements X written by T and sets C(X) := true; any transactions waiting for X to be committed are allowed to proceed.
4. The scheduler receives a request to abort T, or decides to roll T back: any transaction that was waiting on an element X that T wrote must repeat its attempt to read or write.
A sketch of the read and write rules follows.

MULTIVERSION TIMESTAMPS
- Multiversion schemes keep old versions of data items to increase concurrency: each successful write results in the creation of a new version of the data item written, and versions are labeled with timestamps.
- When a read(X) operation is issued, the scheduler selects an appropriate version of X based on the timestamp of the reading transaction and returns the value of the selected version.

TIMESTAMPS AND LOCKING
- Generally, timestamping performs better than locking in situations where most transactions are read-only, or where it is rare that concurrent transactions try to read and write the same element. In high-conflict situations, locking performs better than timestamps.
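The read and write rules can be sketched in Python; X is represented as a dict holding RT, WT, and the commit bit C, and "delay"/"rollback" are returned as strings purely for illustration:

def read(ts, X):
    if ts >= X["WT"]:                       # physically realizable
        if not X["C"]:
            return "delay"                  # writer of X not yet committed
        X["RT"] = max(X["RT"], ts)
        return "grant"
    return "rollback"                       # read too late

def write(ts, X):
    if ts >= X["RT"] and ts >= X["WT"]:     # realizable: perform the write
        X["WT"], X["C"] = ts, False
        return "write"
    if ts >= X["RT"]:                       # a later value is already in X
        return "ignore" if X["C"] else "delay"
    return "rollback"                       # write too late

X = {"RT": 0, "WT": 0, "C": True}
print(write(5, X), read(7, X))   # the write clears C, so the read is delayed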
VALIDATION-BASED SCHEDULING
- The scheduler keeps a record of what the active transactions are doing. Each transaction executes in three phases:
  1. Read: the transaction reads the elements of its read set RS(T) and computes in its local workspace.
  2. Validate: the scheduler compares the transaction's read and write sets with those of other transactions.
  3. Write: the transaction writes the elements of its write set WS(T).
- The scheduler maintains an assumed serial order of the transactions, along with three sets:
  - START: the set of transactions that have started but not yet completed validation.
  - VAL: the set of transactions that have validated but not yet finished the writing phase.
  - FIN: the set of transactions that have finished.

EXPECTED EXCEPTIONS
1. Suppose there is a transaction U such that: U is in VAL or FIN (U has validated); FIN(U) > START(T) (U did not finish before T started); and RS(T) and WS(U) intersect, say on database element X. Then U may have written X after T read it.
2. Suppose there is a transaction U such that: U is in VAL (U has successfully validated); FIN(U) > VAL(T) (U did not finish before T entered its validation phase); and WS(T) and WS(U) intersect, say on element X. Then U's write of X could follow T's, leaving the wrong value.

VALIDATION RULES
- Check that RS(T) and WS(U) do not intersect, for any previously validated U that did not finish before T started, i.e., with FIN(U) > START(T).
- Check that WS(T) and WS(U) do not intersect, for any previously validated U that did not finish before T validated, i.e., with FIN(U) > VAL(T).

SOLUTION
Suppose four transactions validate in the order U, T, V, W, with WS(U) = {D}, RS(T) = {A,B}, WS(T) = {A,C}, RS(V) = {B}, WS(V) = {D,E}, RS(W) = {A,D}, and WS(W) = {A,C} (these sets can be read off the checks below):
- Validation of U: nothing to check.
- Validation of T: WS(U) ∩ RS(T) = {D} ∩ {A,B} = {}; WS(U) ∩ WS(T) = {D} ∩ {A,C} = {}. T validates.
- Validation of V: RS(V) ∩ WS(T) = {B} ∩ {A,C} = {}; WS(V) ∩ WS(T) = {D,E} ∩ {A,C} = {}; RS(V) ∩ WS(U) = {B} ∩ {D} = {}. V validates.
- Validation of W: RS(W) ∩ WS(T) = {A,D} ∩ {A,C} = {A}; RS(W) ∩ WS(V) = {A,D} ∩ {D,E} = {D}; WS(W) ∩ WS(V) = {A,C} ∩ {D,E} = {}. Since the first two intersections are nonempty, W is not validated and must be rolled back.

COMPARISON OF CONCURRENCY-CONTROL MECHANISMS

  Mechanism    Storage utilization                              Delays
  Locks        Space in the lock table is proportional to       Delays transactions but
               the number of database elements locked.          avoids rollbacks.
  Timestamps   Space is needed for read and write times for     Does not delay transactions,
               every database element, whether or not it is     but causes them to roll back
               currently accessed.                              unless interference is low.
  Validation   Space is used for timestamps and read/write      Does not delay transactions,
               sets for each currently active transaction,      but causes them to roll back
               plus a few more transactions that finished       unless interference is low.
               after some currently active transaction began.
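A sketch of the two validation rules in Python; the transaction records and the treatment of unfinished transactions (FIN = None) are illustrative assumptions:

def validate(T, validated):
    for U in validated:
        # Rule 1: U may have written after T read.
        if U["FIN"] is None or U["FIN"] > T["START"]:
            if T["RS"] & U["WS"]:
                return False
        # Rule 2: U may still be writing while T writes.
        if U["FIN"] is None or U["FIN"] > T["VAL"]:
            if T["WS"] & U["WS"]:
                return False
    return True

U = {"RS": {"B"}, "WS": {"D"}, "START": 1, "VAL": 2, "FIN": None}
T = {"RS": {"A", "B"}, "WS": {"A", "C"}, "START": 3, "VAL": 4, "FIN": None}
print(validate(T, [U]))   # True: T's sets do not intersect WS(U) = {D}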
WHY INFORMATION INTEGRATION?
- Databases are created independently, even if they later need to work together.
- The use of databases evolves, so we cannot design a database to support every possible future use.
- We illustrate information integration with the example of a university database.

UNIVERSITY DATABASE
- Earlier, different databases served different functions: a registrar database keeping data about courses and student grades, for generating transcripts; a bursar database keeping data about students' tuition payments; and a human-resources database recording employees, including students with teaching-assistantship jobs.
- Applications were built using these databases, such as generating payroll checks and calculating taxes and social-security payments to the government.
- Kept independent, these databases are of limited use: a change in one database is not reflected in the others except by manual work. E.g., we want to make sure the registrar does not record grades for a student who did not pay fees at the bursar's office.
- Building a whole new database for the system is a very expensive and time-consuming process: in addition to paying for very expensive software, the university would have to run both the old and new databases together for a long time to see whether the new system works properly.
- A solution is to build a layer of abstraction, called middleware, on top of all the legacy databases, without disturbing them. We can query this middleware layer to retrieve or update data; often the layer is defined by a collection of classes and queried in an object-oriented language. New applications can be written to access this layer for data, while the legacy applications continue to run against the legacy databases.

THE HETEROGENEITY PROBLEM
When we try to connect information sources that were developed independently, we invariably find that the sources differ in many ways. Such sources are called heterogeneous, and the problem of integrating them is referred to as the heterogeneity problem. There are several levels of heterogeneity:
1. Communication heterogeneity
2. Query-language heterogeneity
3. Schema heterogeneity
4. Data type differences
5. Value heterogeneity
6. Semantic heterogeneity

COMMUNICATION HETEROGENEITY
Today it is common to allow access to information using HTTP, but some dealers may not make their databases available on the web, instead accepting remote access via anonymous FTP. Suppose that of the 1000 dealers of the Aardvark Automobile Co., 900 use HTTP and 100 use FTP; there can then be problems of communication between the dealers' databases.

QUERY-LANGUAGE HETEROGENEITY
The manner in which we query or modify a dealer's database may vary: some dealers may run different kinds of database systems, some relational and some not; some may use SQL, others Excel spreadsheets or some other system.

SCHEMA HETEROGENEITY
Even if all dealers use a relational DBMS supporting SQL as the query language, there may still be heterogeneity at the highest level: schemas can differ. E.g., one dealer might store cars in a single relation, while another uses a schema in which options are separated out into a second relation.

DATA TYPE DIFFERENCES
Serial numbers might be represented by character strings of varying length at one source and of fixed length at another; the fixed lengths could differ, and some sources might use integers rather than character strings.

VALUE HETEROGENEITY
The same concept might be represented by different constants at different sources. The color black might be represented by an integer code at one source, the string BLACK at another, and the code BL at a third.

SEMANTIC HETEROGENEITY
Terms might be given different interpretations at different sources. One dealer might include trucks in its Cars relation, while another puts only automobile data there; one might distinguish station wagons from minivans, while another does not.

21.2 FEDERATIONS
- The simplest architecture for integrating several databases: one-to-one connections between all pairs of databases.
- For n databases to talk to each other, n(n-1) wrappers are needed.
- Good when the communication patterns between the databases are limited.

WRAPPERS
A wrapper is software that translates incoming queries and outgoing answers; as a result, it allows an information source to appear to conform to some shared schema.

[Figure: a federated collection of four databases (DB1-DB4); each database needs a wrapper for each of the other three, 12 translation components in all.]

EXAMPLE
Car dealers want to share their inventory; each dealer queries the others' databases to find the needed cars.
Dealer 1's database has the relation NeededCars(model, color, autoTrans); Dealer 2's database has Autos(serial, model, color) and Options(serial, option).

[Figure: Dealer 1's DB and Dealer 2's DB connected through a pair of wrappers.]

Dealer 1 queries Dealer 2 for needed cars:

for (each tuple (:m, :c, :a) in NeededCars) {
    if (:a = TRUE) { /* automatic transmission wanted */
        SELECT serial
        FROM Autos, Options
        WHERE Autos.serial = Options.serial
          AND Options.option = 'autoTrans'
          AND Autos.model = :m AND Autos.color = :c;
    }
    else { /* automatic transmission not wanted */
        SELECT serial
        FROM Autos
        WHERE Autos.model = :m AND Autos.color = :c
          AND NOT EXISTS (SELECT *
                          FROM Options
                          WHERE serial = Autos.serial
                            AND option = 'autoTrans');
    }
}

DATA WAREHOUSES
- Sources are translated from their local schemas to a global schema and copied into a central database, the warehouse.
- The warehouse is user-transparent: the user queries it just like an ordinary database, but is not allowed to update it.

[Figure: user queries go to the warehouse; a combiner loads the data produced by one extractor per source.]

EXAMPLE
Construct a data warehouse from the source databases of the two car dealers:
Dealer 1's schema: Cars(serialNo, model, color, autoTrans, cdPlayer, ...)
Dealer 2's schema: Autos(serial, model, color); Options(serial, option)
Warehouse schema: AutosWhse(serialNo, model, color, autoTrans, dealer)

Extractor query for Dealer 1's data:

INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
SELECT serialNo, model, color, autoTrans, 'dealer1'
FROM Cars;

Extractor queries for Dealer 2's data:

INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
SELECT serial, model, color, 'yes', 'dealer2'
FROM Autos, Options
WHERE Autos.serial = Options.serial AND option = 'autoTrans';

INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
SELECT serial, model, color, 'no', 'dealer2'
FROM Autos
WHERE NOT EXISTS (SELECT *
                  FROM Options
                  WHERE serial = Autos.serial
                    AND option = 'autoTrans');

CONSTRUCTING THE DATA WAREHOUSE
There are mainly three ways to construct the data in the warehouse:
1. Periodically reconstruct the warehouse from the current data in the sources, once a night or at even longer intervals. Advantage: simple algorithms. Disadvantages: the warehouse must be shut down during reconstruction, and data can become out of date.
2. Update the warehouse periodically based on the changes at the sources (e.g., each night). Advantage: involves smaller amounts of data, which is important when the warehouse is large and must be modified within a short period. Disadvantages: the process of calculating the changes to the warehouse is complex, and data can still become out of date.
3. Change the warehouse immediately, in response to each change or a small set of changes at one or more of the sources. Advantage: data does not become out of date. Disadvantage: requires too much communication and is therefore generally too expensive (practical only for warehouses whose underlying sources change slowly).

MEDIATORS
- A mediator is a virtual warehouse: it supports a virtual view, or a collection of views, that integrates several sources, but it does not store any data itself.
- The mediator's tasks: 1) receive the user's query; 2) send queries to the wrappers; 3) combine the results from the wrappers; 4) send the final result to the user. (A sketch follows.)
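A minimal Python sketch of these four tasks over the two dealer sources; the wrapper functions here fake the sources with in-memory tuples (an assumption for illustration):

def wrapper1(color):
    # Source schema Cars(serialNo, model, color, autoTrans, ...)
    cars = [("S1", "Gobi", "red", "yes")]
    return [(s, m, c, a, "dealer1") for (s, m, c, a) in cars if c == color]

def wrapper2(color):
    # Source schema Autos(serial, model, color): rename serial -> serialNo
    autos = [("S2", "Gobi", "red", "no")]
    return [(s, m, c, a, "dealer2") for (s, m, c, a) in autos if c == color]

def mediator_query(color):
    # 1) send the query to each wrapper, 2) combine the answers,
    # 3) return the union to the user; no data is stored at the mediator.
    return wrapper1(color) + wrapper2(color)

print(mediator_query("red"))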
[Figure: the user's query goes to the mediator, which sends a query to the wrapper of each source and combines the returned results.]

EXAMPLE
With the same data sources as in the warehouse example, the mediator integrates the two dealers' sources into a view with schema AutosMed(serialNo, model, color, autoTrans, dealer). Suppose the user issues the query:

SELECT serialNo, model
FROM AutosMed
WHERE color = 'red';

In this simple case, the mediator forwards the same query to each of the two wrappers.
Wrapper 1, over Cars(serialNo, model, color, autoTrans, cdPlayer, ...):

SELECT serialNo, model
FROM Cars
WHERE color = 'red';

Wrapper 2, over Autos(serial, model, color) and Options(serial, option):

SELECT serial, model
FROM Autos
WHERE color = 'red';

The mediator translates serial into serialNo and returns the union of these sets of tuples to the user.

21.3 INTRODUCTION
[Figure: a mediator over the wrappers of Source 1 and Source 2.]
The wrapper (extractor) consists of:
- One or more predefined queries, based on the source (SQL, a web-page form, etc.).
- A suitable communication mechanism for sending and receiving information to/from the source and the mediator.

TEMPLATES FOR QUERY PATTERNS
- To design a wrapper, we build templates for all possible queries that the mediator can ask.
- Mediator schema: AutosMed(serialNo, model, color, autoTrans, dealer). Source schema: Cars(serialNo, model, color, autoTrans, navi, ...).
- Wrapper template describing queries for cars of a given color $c:

SELECT * FROM AutosMed WHERE color = '$c';
=>
SELECT serialNo, model, color, autoTrans, 'dealer1'
FROM Cars
WHERE color = '$c';

- Templates needed: 2^n for n attributes, to cover all possible query patterns from the mediator.

WRAPPER GENERATORS
- The software that creates the wrapper is a wrapper generator.
[Figure: templates feed the wrapper generator, which builds a table used by the driver; the driver sends queries to the source and returns results to the mediator.]
- The wrapper generator creates a table that holds the various query patterns contained in the templates, with the source query associated with each of them.
- The driver: accepts a query from the mediator, searches the table for a template that matches the query, sends the corresponding query to the source, and returns the response to the mediator. (A sketch of the table and driver follows the FILTERS discussion below.)

FILTERS
- Consider the car dealer's database again. The wrapper template to get cars of a given model and color is:

SELECT * FROM AutosMed WHERE model = '$m' AND color = '$c';
=>
SELECT serialNo, model, color, autoTrans, 'dealer1'
FROM Cars
WHERE model = '$m' AND color = '$c';

- Another approach is to use a wrapper filter: the wrapper has a template that returns a superset of what the query wants, and the returned tuples are filtered so only the desired ones are passed on. The filter component can be positioned at the wrapper or at the mediator.
- Example: to find blue cars of model Ford, use the color template to extract the blue cars, return those tuples to the mediator, and filter for the Ford model there. Store the returned tuples in a temporary relation TempAutos(serialNo, model, color, autoTrans, dealer) and filter by executing a local query:

SELECT *
FROM TempAutos
WHERE model = 'Ford';
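A sketch of the generated table and driver in Python; the dictionary-of-patterns representation and the $-style substitution via format() are assumptions for illustration:

TEMPLATES = {
    ("color",): "SELECT serialNo, model, color, autoTrans, 'dealer1' "
                "FROM Cars WHERE color = '{color}'",
    ("color", "model"): "SELECT serialNo, model, color, autoTrans, 'dealer1' "
                        "FROM Cars WHERE model = '{model}' "
                        "AND color = '{color}'",
}

def driver(bound):
    """bound: dict of the attributes the mediator query specifies."""
    key = tuple(sorted(bound))
    pattern = TEMPLATES.get(key)          # find a template matching the query
    if pattern is None:
        raise ValueError("no template matches this query pattern")
    return pattern.format(**bound)        # source query sent to the dealer

print(driver({"color": "blue"}))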
OTHER OPERATIONS AT THE WRAPPER
It is possible to perform joins at the wrapper and transmit the result to the mediator. Suppose the mediator is asked to find dealers and models such that the dealer has two red cars of the same model, one with and one without automatic transmission:
SELECT A1.model, A1.dealer
FROM AutosMed A1, AutosMed A2
WHERE A1.model = A2.model AND
      A1.color = 'red' AND A2.color = 'red' AND
      A1.autoTrans = 'no' AND A2.autoTrans = 'yes';
The wrapper can first obtain all the red cars:
SELECT *
FROM AutosMed
WHERE color = 'red';
storing them in RedAutos(serialNo, model, color, autoTrans, dealer). The wrapper then performs the join and the necessary selection:
SELECT DISTINCT A1.model, A1.dealer
FROM RedAutos A1, RedAutos A2
WHERE A1.model = A2.model AND
      A1.autoTrans = 'no' AND A2.autoTrans = 'yes';

21.4 CAPABILITY-BASED OPTIMIZATION
A typical DBMS estimates the cost of each query plan and picks the one it believes to be the best. A mediator, by contrast, has little knowledge of how long its sources will take to answer, so optimization of mediator queries cannot rely on a cost measure alone to select a query plan. Optimization by a mediator therefore follows capability-based optimization.

21.4.1 THE PROBLEM OF LIMITED SOURCE CAPABILITIES
Many sources have only Web-based interfaces, and Web sources usually allow querying only through a query form. E.g., the Amazon.com interface allows us to ask about books in many different ways, but we cannot ask questions that are too general, such as SELECT * FROM books;
Reasons why a source may limit the forms of queries:
The earliest databases did not use relational DBMSs that support SQL queries.
Indexes on a large database may make certain queries feasible while others are too expensive to execute.
Security: e.g., a medical database may answer queries about averages, but will not disclose the details of a particular patient's information.

21.4.2 A NOTATION FOR DESCRIBING SOURCE CAPABILITIES
For relational data, the legal forms of queries are described by adornments: sequences of codes that represent the requirements for the attributes of the relation, in their standard order.
f (free) -- a value may be specified or not.
b (bound) -- a value must be specified, but any value is allowed.
u (unspecified) -- no value may be specified for the attribute.
c[S] (choice from set S) -- a value must be specified, and it must come from the finite set S.
o[S] (optional from set S) -- either no value is specified, or a value from the finite set S is specified.
A prime (e.g., f') indicates that the attribute is not part of the output of the query.
A capabilities specification is a set of adornments; a query must match one of the adornments in its capabilities specification.
E.g., Dealer 1 is a source of data in the form Cars(serialNo, model, color, autoTrans, navi). The adornment b'uuuu describes a query form in which the serial number must be specified (and is not returned in the output), and no other attribute may be specified.

21.4.3 CAPABILITY-BASED QUERY-PLAN SELECTION
Given a query at the mediator, a capability-based query optimizer first considers what queries it can ask at the sources to help answer the query. The process is repeated until either:
Enough queries have been asked at the sources to resolve all the conditions of the mediator query, so the query can be answered. Such a plan is called feasible.
No more valid forms of source queries can be constructed, yet the mediator query still cannot be answered. The query is then impossible.
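To make the adornment notation concrete, here is a small sketch of the matching rule (the representation of source adornments as code/set pairs is an assumption made for illustration; output-restricted primes are ignored for simplicity):

def matches(subgoal_adornment, source_adornment):
    """True if a subgoal adorned with b/f codes can be sent to the source."""
    for x, (y, _s) in zip(subgoal_adornment, source_adornment):
        if y in ('b', 'c') and x != 'b':   # the source demands a binding here
            return False
        # y in ('f', 'u', 'o') accepts either b or f
    return True

# Autos(serial, model, color) with sole adornment ubf:
autos = [('u', None), ('b', None), ('f', None)]
print(matches('fbf', autos))   # True: model is bound, serial left free
print(matches('bff', autos))   # False: model must be bound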
The simplest form of mediator query for which this strategy is needed is a join of relations. E.g., suppose we have sources for Dealer 2:
Autos(serial, model, color)
Options(serial, option)
Suppose ubf is the sole adornment for Autos, and Options has two adornments, bu and uc[autoTrans, navi]. The query: find the serial numbers and colors of Gobi models with a navigation system.

21.4.4 ADDING COST-BASED OPTIMIZATION
The mediator's query optimizer is not done when the capabilities of the sources have been examined: having found the feasible plans, it must choose among them. Making an intelligent, cost-based choice requires that the mediator know a great deal about the costs of the queries involved. Since the sources are independent of the mediator, those costs are difficult to estimate.

21.5 OPTIMIZING MEDIATOR QUERIES
The Chain algorithm is a greedy algorithm that answers the query by sending a sequence of requests to its sources. It always finds a solution, assuming at least one solution exists, but the solution may not be optimal.

21.5.1 SIMPLIFIED ADORNMENT NOTATION
A query at the mediator is limited to b (bound) and f (free) adornments. We describe adornments as name^adornments(attributes), where name is the name of the relation and the number of adornment codes equals the number of attributes.

21.5.2 OBTAINING ANSWERS FOR SUBGOALS
Rules for matching subgoals and sources. Suppose we have the subgoal R^{x1 x2 ... xn}(a1, a2, ..., an), and the source adornment for R is y1 y2 ... yn. The adornment on the subgoal matches the adornment at the source if:
whenever yi is b or c[S], then xi is b; and
whenever xi is f, then yi is f, u, or o[S], and yi is not output-restricted.

21.5.3 THE CHAIN ALGORITHM
The algorithm maintains two kinds of information:
An adornment for each subgoal. Initially, the adornment for a subgoal is b in each position where the mediator query provides a constant binding for the corresponding argument, and f elsewhere.
A relation X that is the join of the relations for all the subgoals resolved so far. Initially, X is a relation over no attributes, containing just the empty tuple.
After initialization, repeatedly select a subgoal that can be resolved. Let R^α(a1, a2, ..., an) be that subgoal:
1. Wherever α has a b, the corresponding argument of R is either a constant or a variable in the schema of X. Project X onto those of its variables that appear in R.
2. For each tuple t in this projection of X, issue a query to the source as follows (β is the source adornment):
If a component of β is b, then the corresponding component of α is b, and we use the corresponding component of t in the source query.
If a component of β is c[S] and the corresponding component of t is in S, then the corresponding component of α is b, and we use the corresponding component of t in the source query.
If a component of β is f and the corresponding component of α is b, we provide the constant value in the source query.
If a component of β is u, we provide no binding for this component in the source query.
If a component of β is o[S] and the corresponding component of α is f, treat it as if it were f.
If a component of β is o[S] and the corresponding component of α is b, treat it as if it were c[S].
3. Every variable among a1, a2, ..., an is now bound. For each remaining unresolved subgoal, change its adornment so that any position holding one of these variables is b.
4. Replace X with X ⋈ π_S(R), where S is the set of all variables among a1, a2, ..., an. Then project out of X all components that correspond to variables appearing neither in the head nor in any unresolved subgoal.
5. If every subgoal is resolved, X is the answer; otherwise the algorithm fails.

21.5.3 THE CHAIN ALGORITHM: EXAMPLE
Mediator query:
Q: Answer(c) <- R^bf(1, a) AND S^ff(a, b) AND T^ff(b, c)
Source data and adornments:

R(w, x): (1,2), (1,3), (1,4)    adornment bf
S(x, y): (2,4), (3,5)           adornment c'[2,3,5]f
T(y, z): (4,6), (5,7), (5,8)    adornment bu

Initially, the adornments on the subgoals are the same as in Q, and X contains just the empty tuple. S and T cannot be resolved, since they have ff adornments while their sources each require a binding. R(1, a) can be resolved, because its adornment is matched by the source's adornment. Send R(w, x) with w = 1 to obtain the tuples listed above.
Project the subgoal's relation onto its second component, since only the second component of R(1, a) is a variable; the result is {(2), (3), (4)} over attribute a. This is joined with X, so X now equals this relation. Change the adornment on S from ff to bf.
Now resolve S^bf(a, b): project X onto a, and search S for tuples whose x-value equals a value of a in X. Values sent to the source must come from the set {2, 3, 5}, so a = 4 yields nothing; the matching tuples are (2,4) and (3,5). Join this relation with X and remove a, since it appears neither in the head nor in any unresolved subgoal, leaving X = {(4), (5)} over attribute b. Change the adornment on T to bf.
Now resolve T^bf(b, c): the matching tuples are (4,6), (5,7), and (5,8). Join this relation with X and project onto the c attribute to get the relation for the head. The solution is {(6), (7), (8)}.
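The run above can be replayed with a few lines of code. This is a hard-coded sketch of the three resolution steps, not a general Chain-algorithm implementation; the in-memory tables model the sources, and the adornment restrictions are enforced by hand.

R = {(1, 2), (1, 3), (1, 4)}      # R(w, x), adornment bf: w must be bound
S = {(2, 4), (3, 5)}              # S(x, y), adornment c'[2,3,5]f
T = {(4, 6), (5, 7), (5, 8)}      # T(y, z), adornment bu: y bound, z unspecified

# Resolve R(1, a): w is bound to the constant 1.
X = {x for (w, x) in R if w == 1}                   # X = {2, 3, 4} over a

# Resolve S(a, b): a must be drawn from the finite set {2, 3, 5}.
allowed = {2, 3, 5}
X = {y for a in X if a in allowed
       for (x, y) in S if x == a}                   # X = {4, 5} over b

# Resolve T(b, c): b is now bound; project onto the head variable c.
answer = {z for b in X for (y, z) in T if y == b}
print(sorted(answer))                               # [6, 7, 8]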
21.5.4 INCORPORATING UNION VIEWS AT THE MEDIATOR
The implementation of the Chain algorithm above does not consider that several sources may contribute tuples to the same relation. If specific sources have tuples to contribute that other sources lack, this adds complexity. To handle it, we can either consult all sources, or make a best effort to return all the answers.
Consulting all sources: a subgoal can be resolved only when every source for its relation has an adornment matched by the current adornment of the subgoal. This is the less practical option, because it makes queries harder to answer, and impossible if any source is down.
Best effort: only one source with a matching adornment is needed to resolve a subgoal. The Chain algorithm must then be modified to revisit each subgoal when that subgoal acquires new bound requirements.

21.6 LOCAL-AS-VIEW MEDIATORS
In a LAV mediator, the global predicates are not defined as views of the source data. Instead, for each source, expressions over the global predicates describe the tuples that the source is able to produce, and the mediator answers queries by combining the views provided by the sources.

MOTIVATION FOR LAV MEDIATORS
The relationship between the data provided by the mediator and the sources can be subtle. For example, consider the predicate Par(c, p), meaning that p is a parent of c, which represents the set of all child-parent facts that could ever exist. The sources provide information about whatever child-parent facts they know.
There can be a source that provides child-grandparent facts but no child-parent facts at all. Such a source could never be used to answer a child-parent query under a GAV mediator. A LAV mediator, however, lets us say that the source provides grandparent facts, and that description can be used to discover how and when to use the source in a given query.

TERMINOLOGY FOR LAV MEDIATION
The queries at the mediator, and the queries describing the sources, are single Datalog rules; a single Datalog rule is called a conjunctive query.
The global predicates of the LAV mediator are used as the subgoals of mediator queries. Conjunctive queries also define views: their heads each have a unique view predicate, which is the name of the view; each view definition has a body consisting of global predicates and is associated with a particular source. Each view is taken to have an all-free adornment.

EXAMPLE
Consider the global predicate Par(c, p), meaning that p is a parent of c.
One source produces parent facts. Its view is defined by the conjunctive query:
V1(c, p) <- Par(c, p)
Another source produces some grandparent facts. Its conjunctive query is:
V2(c, g) <- Par(c, p) AND Par(p, g)
The query at the mediator asks for great-grandparent facts to be obtained from the sources:
Q(w, z) <- Par(w, x) AND Par(x, y) AND Par(y, z)
One solution uses the parent view V1 three times:
Q(w, z) <- V1(w, x) AND V1(x, y) AND V1(y, z)
Another solution uses V1 (parent facts) and V2 (grandparent facts):
Q(w, z) <- V1(w, x) AND V2(x, z)
or
Q(w, z) <- V2(w, y) AND V1(y, z)

EXPANDING SOLUTIONS
Consider a query Q and a solution S whose body's subgoals are views, where each view V is defined by a conjunctive query with V as the head. The body of V's conjunctive query can be substituted for any subgoal of S that uses the predicate V, producing a body that consists only of global predicates.

EXPANSION ALGORITHM
Let S have a subgoal V(a1, a2, ..., an), where the ai's may be variables or constants, and let the view V be of the form V(b1, b2, ..., bn) <- B, where B represents the entire body. V(a1, a2, ..., an) can be replaced in solution S by a version of the body B whose variables may be altered. The rules for altering the variables of B are:
1. First identify the local variables of B: the variables that appear in the body but not in the head.
2. If any local variable of B also appears in S or in another substituted body, replace it by a distinct new variable that appears nowhere in the rule for V or in S.
3. In the body B, replace each bi by ai, for i = 1, 2, ..., n.

EXAMPLE
Consider the view definitions:
V1(c, p) <- Par(c, p)
V2(c, g) <- Par(c, p) AND Par(p, g)
One of the proposed solutions S is:
Q(w, z) <- V1(w, x) AND V2(x, z)
The first subgoal, with predicate V1, expands to Par(w, x), since V1 has no local variables. The V2 subgoal has a local variable p, which appears neither in S nor as a local variable of another substitution, so p can be left as it is; only x and z are substituted for the variables c and g. The solution S expands to:
Q(w, z) <- Par(w, x) AND Par(x, p) AND Par(p, z)

CONTAINMENT OF CONJUNCTIVE QUERIES
A containment mapping from Q to E is a function τ from the variables of Q to the variables and constants of E such that:
1. If x is the ith argument of the head of Q, then τ(x) is the ith argument of the head of E. We extend τ by the rule τ(c) = c for any constant c.
2. If P(x1, x2, ..., xn) is a subgoal of Q, then P(τ(x1), τ(x2), ..., τ(xn)) is a subgoal of E.

EXAMPLE
Consider two conjunctive queries:
Q1: H(x, y) <- A(x, z) AND B(z, y)
Q2: H(a, b) <- A(a, c) AND B(d, b) AND A(a, d)
Applying the substitution τ(x) = a, τ(y) = b, τ(z) = d, the head of Q1 becomes H(a, b), which is the head of Q2; so there is a containment mapping from Q1 to Q2. The first subgoal of Q1 becomes A(a, d), which is the third subgoal of Q2, and the second subgoal of Q1 becomes the second subgoal of Q2. There is also a containment mapping from Q2 to Q1, so the two conjunctive queries are equivalent.
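The containment mapping in the example can be checked mechanically. Below is a small sketch in which atoms are (predicate, arguments) pairs and τ is a dictionary; this data representation is an assumption made for illustration.

# Q1: H(x, y) <- A(x, z) AND B(z, y)
# Q2: H(a, b) <- A(a, c) AND B(d, b) AND A(a, d)
tau = {'x': 'a', 'y': 'b', 'z': 'd'}

q1_head = ('H', ('x', 'y'))
q1_body = [('A', ('x', 'z')), ('B', ('z', 'y'))]
q2_head = ('H', ('a', 'b'))
q2_body = [('A', ('a', 'c')), ('B', ('d', 'b')), ('A', ('a', 'd'))]

def apply(tau, atom):
    pred, args = atom
    # constants (anything not in tau) map to themselves
    return (pred, tuple(tau.get(v, v) for v in args))

assert apply(tau, q1_head) == q2_head                    # heads agree
assert all(apply(tau, g) in q2_body for g in q1_body)    # every subgoal maps
print('tau is a containment mapping from Q1 to Q2, so Q2 is contained in Q1')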
WHY THE CONTAINMENT-MAPPING TEST WORKS
Suppose there is a containment mapping τ from Q1 to Q2. When Q2 is applied to the database, we look for substitutions σ for all the variables of Q2; the substitution applied to the head yields a tuple t returned by Q2. If we compose τ and then σ, we obtain a mapping from the variables of Q1 to tuples of the database that produces the same tuple t for the head of Q1.

FINDING SOLUTIONS TO A MEDIATOR QUERY
There can be an infinite number of solutions built from the views, using any number of subgoals and variables. The LMSS theorem limits the search: if a query Q has n subgoals, then any answer produced by any solution is also produced by a solution that has at most n subgoals.
Moreover, if the conjunctive query that defines a view V has in its body a predicate P that does not appear in the body of the mediator query, we need not consider any solution that uses V.

EXAMPLE
Recall the query
Q(w, z) <- Par(w, x) AND Par(x, y) AND Par(y, z)
This query has three subgoals, so we need not look at solutions with more than three subgoals.

WHY THE LMSS THEOREM HOLDS
Suppose we have a query Q with n subgoals, and a solution S with more than n subgoals. The expansion E of S must be contained in Q, which means there is a containment mapping from Q to E. Remove from S every subgoal whose expansion was not the target of one of Q's subgoals under the containment mapping. This leaves a new conjunctive query S' with at most n subgoals. If E' is the expansion of S', then E' is contained in Q; and S is contained in S', since there is an identity mapping from S' to S. Thus S need not be among the solutions to Q.

21.7 ENTITY RESOLUTION
Entity resolution is a problem that arises in many information-integration scenarios: determining whether two records or tuples do or do not represent the same person, organization, place, or other entity.

DECIDING WHETHER RECORDS REPRESENT A COMMON ENTITY
Two records represent the same individual if they have similar values for each of the fields associated with those records. It is not sufficient to require that corresponding fields be identical, for the following reasons:
1. Misspellings
2. Variant names
3. Misunderstanding of names
4. Evolution of values
5. Abbreviations
Thus, when deciding whether two records represent the same entity, we need to look carefully at the kinds of discrepancies that occur and use a test that measures the similarity of records.

EDIT DISTANCE
A first approach to measuring the similarity of records is edit distance. Values that are strings can be compared by counting the number of insertions and deletions of characters needed to turn one string into the other. The records are taken to represent the same entity if their distance is below a given threshold.

NORMALIZATION
We can normalize records by replacing certain substrings with others; for instance, using a table of abbreviations, we can replace each abbreviation with what it normally stands for. Once records are normalized, we can use edit distance to measure the difference between the normalized values in the fields.
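The insert/delete edit distance described above has a standard dynamic-programming implementation; a minimal sketch (substitutions count as a deletion plus an insertion):

def edit_distance(s, t):
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                        # delete all of s[:i]
    for j in range(n + 1):
        d[0][j] = j                        # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                d[i][j] = d[i - 1][j - 1]  # characters match: no edit
            else:
                d[i][j] = 1 + min(d[i - 1][j], d[i][j - 1])
    return d[m][n]

print(edit_distance('Smythe', 'Smith'))    # 3

Two (possibly normalized) field values would then be judged similar when this distance is below the chosen threshold.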
MERGING SIMILAR RECORDS
Merging means replacing two records that are similar enough by one single record that contains the information of both. There are many possible merge rules, e.g.:
1. Set any field on which the records disagree to the empty string.
2. (i) Merge by taking the union of the values in each field, and (ii) declare two records similar if at least two of the three fields have a nonempty intersection.
Example:

Name      Address        Phone
1. Susan  123 Oak St.    818-555-1234
2. Susan  456 Maple St.  818-555-1234
3. Susan  456 Maple St.  213-555-5678

After merging (1-2-3):
Name: Susan
Address: {123 Oak St., 456 Maple St.}
Phone: {818-555-1234, 213-555-5678}

USEFUL PROPERTIES OF SIMILARITY AND MERGE FUNCTIONS
The following properties say that the merge operation is a semilattice:
1. Idempotence: the merge of a record with itself is that record.
2. Commutativity: if we merge two records, the order in which we list them does not matter.
3. Associativity: the order in which we group records for a merger does not matter.
There are further properties that we expect the similarity relationship to have:
Idempotence for similarity: a record is always similar to itself.
Commutativity of similarity: in deciding whether two records are similar, it does not matter in which order we list them.
Representability: if r is similar to some other record s, but s is instead merged with some other record t, then r remains similar to the merger of s and t and can be merged with that record.
Together, these are the ICAR properties.

R-SWOOSH ALGORITHM FOR ICAR RECORDS
Input: a set of records I, a similarity function, and a merge function.
Output: a set of merged records O.
Method:
O := emptyset;
WHILE I is not empty DO BEGIN
    let r be any record in I;
    find, if possible, some record s in O that is similar to r;
    IF no such record s exists THEN
        move r from I to O
    ELSE BEGIN
        delete r from I;
        delete s from O;
        add the merger of r and s to I;
    END;
END;
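The R-Swoosh pseudocode above translates almost directly into runnable form. In this sketch the similarity and merge functions are the illustrative "two of three fields intersect" and "union each field" rules from the merging discussion, applied to records whose fields are sets of strings; they are assumptions, not the only valid ICAR functions.

def similar(r, s):
    # similar if at least two of the three fields have a nonempty intersection
    return sum(bool(r[f] & s[f]) for f in r) >= 2

def merge(r, s):
    return {f: r[f] | s[f] for f in r}     # union the values in each field

def r_swoosh(I):
    O = []
    while I:
        r = I.pop()
        s = next((x for x in O if similar(r, x)), None)
        if s is None:
            O.append(r)                    # nothing in O is similar to r
        else:
            O.remove(s)
            I.append(merge(r, s))          # the merged record goes back to I
    return O

records = [
    {'name': {'Susan'}, 'addr': {'123 Oak St.'},   'phone': {'818-555-1234'}},
    {'name': {'Susan'}, 'addr': {'456 Maple St.'}, 'phone': {'818-555-1234'}},
    {'name': {'Susan'}, 'addr': {'456 Maple St.'}, 'phone': {'213-555-5678'}},
]
print(r_swoosh(records))    # the three Susan records collapse into one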
OTHER APPROACHES TO ENTITY RESOLUTION
Non-ICAR datasets: we can define a dominance relation r <= s, meaning that record s contains all the information contained in record r. If so, we can eliminate record r from further consideration.
Clustering: group the records into clusters such that members of a cluster are in some sense similar to each other and members of different clusters are not similar.
Partitioning: group the records, perhaps several times, into groups that are likely to contain similar records, and look only within each group for pairs of similar records.

CHAPTER 16
16.1 THE QUERY COMPILER
The query compiler is the set of components that carry out query compilation: it shows how a SQL query is parsed, translated into relational algebra, and optimized. The query compiler performs the following operations:
Parse the query, representing it as a parse tree.
Convert the parse tree into an expression tree of relational algebra.
Turn the relational algebra into a physical query plan.
[Figure: the query passes through the Parser and Preprocessor, then the Logical Plan Generator and Query Rewriter, producing the logical query plan.]

SYNTAX ANALYSIS AND PARSE TREES
The parser takes text written in SQL and converts it into a parse tree whose nodes correspond to either:
Atoms -- keywords, constants, operators, names, and parentheses.
Syntactic categories -- names for families of query subparts.

RULES
<SFW> ::= SELECT <SelList> FROM <FromList> WHERE <Condition>
Select-list:
<SelList> ::= <Attribute>, <SelList>
<SelList> ::= <Attribute>
From-list:
<FromList> ::= <Relation>, <FromList>
<FromList> ::= <Relation>
Conditions:
<Condition> ::= <Condition> AND <Condition>
<Condition> ::= <Tuple> IN <Query>
<Condition> ::= <Attribute> = <Attribute>
<Condition> ::= <Attribute> LIKE <Pattern>
Tuple:
<Tuple> ::= <Attribute>

TABLES
StarsIn(movieTitle, movieYear, starName)
MovieStar(name, address, gender, birthdate)
We want to find the titles of movies that have at least one star born in 1960. The query is:
SELECT movieTitle
FROM StarsIn
WHERE starName IN (SELECT name
                   FROM MovieStar
                   WHERE birthdate LIKE '%1960');

PARSE TREE
[Figure: the parse tree for the query above. The root <Query> expands to an <SFW> with select-list movieTitle, from-list StarsIn, and the condition <Tuple> IN <Query>; the nested <Query> is itself an <SFW> with select-list name, from-list MovieStar, and condition birthdate LIKE '%1960'.]

THE PREPROCESSOR
The preprocessor performs semantic checking. Its functions are:
1. Check relation uses.
2. Check and resolve attribute uses.
3. Check types.

PUSHING SELECTIONS
Pushing a selection means replacing the left side of one of the laws by its right side. In pushing selections we may first move a selection as far up the tree as it will go, and then push copies of it down all possible branches. Example:
StarsIn(title, year, starName)
Movie(title, year, length, inColor, studioName, producerC#)
Define the view MoviesOf1996 by:
CREATE VIEW MoviesOf1996 AS
SELECT *
FROM Movie
WHERE year = 1996;
The query "which stars worked for which studios in 1996?" can then be given by:
SELECT starName, studioName
FROM MoviesOf1996 NATURAL JOIN StarsIn;
The logical query plan constructed from the query and the view definition is:
π_{starName, studioName}(σ_{year=1996}(Movie) ⋈ StarsIn)
We improve the plan by moving the selection up the tree and then pushing it down both branches:
π_{starName, studioName}(σ_{year=1996}(Movie) ⋈ σ_{year=1996}(StarsIn))

LAWS INVOLVING PROJECTION
"Pushing" a projection really involves introducing a new projection somewhere below an existing projection. A projection keeps the number of tuples the same and only reduces the length of tuples.
To describe the transformations of extended projection, consider a term E -> x on the list for a projection, where E is an attribute or an expression involving attributes and constants, and x is an output attribute.
E.g., let R(a, b, c) and S(c, d, e) be two relations, and consider the expression π_{a+e->x, b->y}(R ⋈ S). The input attributes of the projection are a, b, and e, and c is the only join attribute. We may apply the law for pushing projections below joins to get the equivalent expression:
π_{a+e->x, b->y}(π_{a,b,c}(R) ⋈ π_{c,e}(S))
Eliminating the projection onto a, b, c (it keeps all the attributes of R that matter) gives a third equivalent expression:
π_{a+e->x, b->y}(R ⋈ π_{c,e}(S))
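The selection-pushing law used in the MoviesOf1996 example, σ_C(R ⋈ S) = σ_C(R) ⋈ S when C mentions only attributes of R, can be spot-checked on toy data. The relations and join implementation below are illustrative assumptions:

movie = [('M1', 1996, 100), ('M2', 1997, 90)]          # Movie(title, year, len)
stars_in = [('M1', 1996, 'Star A'), ('M2', 1997, 'Star B')]

def join(r, s):
    # natural join on (title, year)
    return [(t1, y1, l, sn) for (t1, y1, l) in r
                            for (t2, y2, sn) in s if t1 == t2 and y1 == y2]

plan1 = [t for t in join(movie, stars_in) if t[1] == 1996]   # select late
plan2 = join([t for t in movie if t[1] == 1996], stars_in)   # push selection down
assert plan1 == plan2
print(plan2)    # [('M1', 1996, 100, 'Star A')]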
LAWS ABOUT JOINS AND PRODUCTS
Laws that follow directly from the definition of the join:
R ⋈_C S = σ_C(R × S)
R ⋈ S = π_L(σ_C(R × S)), where C is the condition that equates each pair of attributes from R and S with the same name, and L is a list that includes one attribute from each equated pair and all the other attributes of R and S.
In practice, we identify a product followed by a selection as a join of some kind.

LAWS INVOLVING DUPLICATE ELIMINATION
The operator δ, which eliminates duplicates from a bag, can be pushed through many, but not all, operators. In general, moving a δ down the tree reduces the size of intermediate relations and may therefore be beneficial. Moreover, we can sometimes move the δ to a position where it can be eliminated altogether, because it is applied to a relation known not to possess duplicates:
δ(R) = R if R has no duplicates. Important cases of such a relation R include:
a) a stored relation with a declared primary key, and
b) a relation that is the result of a γ operation, since grouping creates a relation with no duplicates.
Several laws "push" δ through other operators:
δ(R × S) = δ(R) × δ(S)
δ(R ⋈ S) = δ(R) ⋈ δ(S)
δ(R ⋈_C S) = δ(R) ⋈_C δ(S)
δ(σ_C(R)) = σ_C(δ(R))
We can also move the δ to either or both arguments of an intersection:
δ(R ∩_B S) = δ(R) ∩_B S = R ∩_B δ(S) = δ(R) ∩_B δ(S)

LAWS INVOLVING GROUPING AND AGGREGATION
When we consider the operator γ, the applicability of many transformations depends on the details of the aggregate operators used, so we cannot state laws in the generality used for the other operators. One exception: a γ absorbs a δ. Precisely:
δ(γ_L(R)) = γ_L(R)
Call an operator γ_L duplicate-impervious if the only aggregations in L are MIN and/or MAX. Then:
γ_L(R) = γ_L(δ(R)), provided γ_L is duplicate-impervious.
E.g., suppose we have the relations
MovieStar(name, addr, gender, birthdate)
StarsIn(movieTitle, movieYear, starName)
and we want to know, for each year, the birthdate of the youngest star to appear in a movie that year. We can express this query as:
SELECT movieYear, MAX(birthdate)
FROM MovieStar, StarsIn
WHERE name = starName
GROUP BY movieYear;
The initial logical query plan is:
γ_{movieYear, MAX(birthdate)}(σ_{name=starName}(MovieStar × StarsIn))
Some transformations we can apply:
1. Combine the selection and product into an equijoin.
2. Generate a δ below the γ, since the γ is duplicate-impervious.
3. Generate a π between the γ and the introduced δ to project onto movieYear and birthdate, the only attributes relevant to the γ.
The resulting plan is:
γ_{movieYear, MAX(birthdate)}(π_{movieYear, birthdate}(δ(MovieStar ⋈_{name=starName} StarsIn)))
A third plan additionally pushes the δ and projections below the join:
γ_{movieYear, MAX(birthdate)}(π_{movieYear, birthdate}(δ(π_{birthdate, name}(MovieStar)) ⋈_{name=starName} δ(π_{movieYear, starName}(StarsIn))))

PARSING (OVERVIEW)
The goal is to convert a text string containing a query into a parse tree data structure:
the leaves form the text string (broken into lexical elements);
the internal nodes are syntactic categories.
Parsing uses standard algorithmic techniques from compilers: given a grammar for the language (e.g., SQL), process the string and build the tree.

EXAMPLE: SQL QUERY
SELECT title
FROM StarsIn
WHERE starName IN (SELECT name
                   FROM MovieStar
                   WHERE birthdate LIKE '%1960');
(Find the movies with stars born in 1960.) Assume we have a simplified grammar for SQL.
EXAMPLE: PARSE TREE
[Figure: the parse tree for the query above, with <Query> => <SFW>; select-list title; from-list StarsIn; and the condition <Tuple> IN <Query>, where the nested <Query> has select-list name, from-list MovieStar, and condition birthdate LIKE '%1960'.]

THE PREPROCESSOR
It replaces each reference to a view with a parse (sub)tree that describes the view (i.e., a query).
It does semantic checking:
are the relations and views mentioned in the schema?
are the attributes mentioned in the current scope?
are the attribute types correct?

CONVERTING THE PARSE TREE TO RELATIONAL ALGEBRA
The complete algorithm depends on the specific grammar, which determines the forms of the parse trees. Here is a flavor of the approach.
Suppose there are no subqueries. Then
SELECT att-list FROM rel-list WHERE cond
is converted into
π_{att-list}(σ_{cond}(×(rel-list)))
E.g.:
SELECT movieTitle
FROM StarsIn, MovieStar
WHERE starName = name AND birthdate LIKE '%1960';
[Figure: the parse tree for this query.] The equivalent algebraic expression tree is:
π_{movieTitle}(σ_{starName = name AND birthdate LIKE '%1960'}(StarsIn × MovieStar))

HANDLING SUBQUERIES
Recall the (equivalent) query:
SELECT title
FROM StarsIn
WHERE starName IN (SELECT name
                   FROM MovieStar
                   WHERE birthdate LIKE '%1960');
We use an intermediate format called two-argument selection: a selection node whose arguments are the relation StarsIn and the condition "starName IN (the translated subquery)."

CONVERTING TWO-ARGUMENT SELECTION
To continue the conversion, we need rules for replacing a two-argument selection with a relational-algebra expression. The rules differ according to the nature of the subquery. Here is the rule for the IN operator and an uncorrelated subquery (one that computes a relation independent of the tuple being tested).
Rule for IN: a two-argument selection on R with condition "t IN S" becomes σ_C(R × S), where C is the condition that equates the attributes of t with the corresponding attributes in S.

EXAMPLE: LOGICAL QUERY PLAN
π_{title}(σ_{starName = name}(StarsIn × π_{name}(σ_{birthdate LIKE '%1960'}(MovieStar))))
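The IN rule can be checked on toy data: evaluating the subquery and then applying the product-plus-selection plan gives the same answer as the original membership test. The data below is made up, and for full bag semantics the subquery result would also need duplicate elimination:

stars_in = [('Movie1', 1995, 'Carrie Fisher'), ('Movie2', 1996, 'Mark Hamill')]
movie_star = [('Carrie Fisher', 'LA', 'F', '9/9/1960'),
              ('Mark Hamill',   'NY', 'M', '8/8/1961')]

# subquery: SELECT name FROM MovieStar WHERE birthdate LIKE '%1960'
names = {name for (name, addr, gender, bd) in movie_star if bd.endswith('1960')}

# rewritten plan: pi_title(sigma_{starName = name}(StarsIn x subquery-result))
answer = [title for (title, year, star) in stars_in
                for name in names if star == name]
print(answer)    # ['Movie1']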
WHAT IF THE SUBQUERY IS CORRELATED?
A correlated subquery refers to the current tuple of the outer scope being tested. This case is more complicated to deal with, since the subquery cannot be translated in isolation: the translation must incorporate the external attributes. Some details are in the textbook.

IMPROVING THE LOGICAL QUERY PLAN
There are numerous algebraic laws concerning relational-algebra operations. By applying them to a logical query plan judiciously, we can get an equivalent query plan that can be executed more efficiently. Next we survey some of these laws.
Example: an improved logical query plan for the query above replaces the product and selection with a join:
π_{title}(StarsIn ⋈_{starName=name} π_{name}(σ_{birthdate LIKE '%1960'}(MovieStar)))

ASSOCIATIVE AND COMMUTATIVE OPERATIONS
Product, natural join, set and bag union, and set and bag intersection are all:
associative: (A op B) op C = A op (B op C), and
commutative: A op B = B op A.

LAWS INVOLVING SELECTION
Selections usually reduce the size of the relation, so it is usually good to do selections early, i.e., "push them down the tree." It can also be helpful to break a complex selection into parts.
Selection splitting:
σ_{C1 AND C2}(R) = σ_{C1}(σ_{C2}(R))
σ_{C1 OR C2}(R) = σ_{C1}(R) ∪_set σ_{C2}(R), if R is a set
σ_{C1}(σ_{C2}(R)) = σ_{C2}(σ_{C1}(R))
Selection and binary operators:
For union, we must push the selection to both arguments: σ_C(R ∪ S) = σ_C(R) ∪ σ_C(S).
For difference, we must push it to the first argument, and may optionally push it to the second: σ_C(R − S) = σ_C(R) − S = σ_C(R) − σ_C(S).
For product, natural join, theta-join, and intersection, we push the selection to at least one argument having all the attributes mentioned in C; e.g., σ_C(R × S) = σ_C(R) × S if R has all the attributes in C.

PUSHING SELECTION UP THE TREE
Suppose we have the relations
StarsIn(title, year, starName)
Movie(title, year, len, inColor, studioName)
a view
CREATE VIEW MoviesOf1996 AS
SELECT *
FROM Movie
WHERE year = 1996;
and the query
SELECT starName, studioName
FROM MoviesOf1996 NATURAL JOIN StarsIn;
The straightforward tree is π_{starName, studioName}(σ_{year=1996}(Movie) ⋈ StarsIn). Remember the rule σ_C(R ⋈ S) = σ_C(R) ⋈ S? Applying it right-to-left moves the selection up the tree; pushing it back down both branches then yields the improved logical query plan:
π_{starName, studioName}(σ_{year=1996}(Movie) ⋈ σ_{year=1996}(StarsIn))

GROUPING ASSOCIATIVE/COMMUTATIVE OPERATORS
Group adjacent joins, adjacent unions, and adjacent intersections together as siblings in the tree. This sets up the logical query plan for a future optimization performed when the physical plan is constructed: determining the best order for a sequence of joins (or unions, or intersections).

WHAT EACH PHYSICAL PLAN SELECTS
An order and grouping for associative-and-commutative operations like joins and unions.
An algorithm for each operator in the logical plan, e.g., whether a nested-loop join or a hash join is to be used.
Additional operators that are needed for the physical plan but were not explicit in the logical plan, e.g., scanning and sorting.
The way in which arguments are passed from one operator to the next.

ESTIMATING SIZES OF INTERMEDIATE RELATIONS
Rules for estimating the number of tuples in an intermediate relation should:
1. give accurate estimates,
2. be easy to compute, and
3. be logically consistent.
The objective of estimation is to select the best physical plan with the least cost.

ESTIMATING THE SIZE OF A PROJECTION
Projection is different from the other operators in that the size of its result is computable. Since a projection produces one result tuple for every argument tuple, the only change in the output size is the change in the lengths of the tuples.
ESTIMATING THE SIZE OF A SELECTION (1)
Let S = σ_{A=c}(R), where A is an attribute of R and c is a constant. Then we recommend the estimate:
T(S) = T(R)/V(R, A)
This rule surely holds if all values of attribute A occur equally often in the database.

ESTIMATING THE SIZE OF A SELECTION (2)
If S = σ_{a<c}(R), our estimate for T(S) is:
T(S) = T(R)/3
For a "not equal" comparison, we may use T(S) = T(R)(V(R, a) − 1)/V(R, a) as an estimate.
When the selection condition C is the AND of several equalities and inequalities, we can treat the selection σ_C(R) as a cascade of simple selections, each of which checks one of the conditions.

ESTIMATING THE SIZE OF A SELECTION (3)
A less simple, but possibly more accurate, estimate of the size of S = σ_{C1 OR C2}(R) is to assume that C1 and C2 are independent. If R has n tuples, of which m1 satisfy C1 and m2 satisfy C2, we estimate the number of tuples in S as:
n(1 − (1 − m1/n)(1 − m2/n))
In explanation, 1 − m1/n is the fraction of tuples that do not satisfy C1, and 1 − m2/n is the fraction that do not satisfy C2. The product of these numbers is the fraction of R's tuples that are not in S, and 1 minus this product is the fraction that are in S.

ESTIMATING THE SIZE OF A JOIN
Two simplifying assumptions:
1. Containment of value sets: if R and S are two relations with attribute Y and V(R, Y) <= V(S, Y), then every Y-value of R is a Y-value of S.
2. Preservation of value sets: if we join R with S, and attribute A of R is not a join attribute, then all distinct values of A are preserved and none are lost: V(S ⋈ R, A) = V(R, A).
Under these assumptions, we estimate:
T(R ⋈ S) = T(R)T(S)/max(V(R, Y), V(S, Y))

NATURAL JOINS WITH MULTIPLE JOIN ATTRIBUTES
Of the T(R)T(S) pairs of tuples from R and S, the expected number of pairs that match in both y1 and y2 is:
T(R)T(S)/(max(V(R, y1), V(S, y1)) max(V(R, y2), V(S, y2)))
In general, the estimate of the size of R ⋈ S is computed by multiplying T(R) by T(S) and dividing, for each attribute y common to R and S, by the larger of V(R, y) and V(S, y).

JOINS OF MANY RELATIONS (1)
Rule for estimating the size of any join: start with the product of the number of tuples in each relation; then, for each attribute A appearing at least twice, divide by all but the least of the V(R, A)'s.
We can also estimate the number of values that will remain for attribute A after the join: by the preservation-of-value-sets assumption, it is the least of those V(R, A)'s.

JOINS OF MANY RELATIONS (2)
Based on the two assumptions (containment and preservation of value sets): no matter how we group and order the terms in a natural join of n relations, applying the estimation rules to each join individually yields the same estimate for the size of the result. Moreover, this estimate is the same as the one we get if we apply the rule to the join of all n relations as a whole.

ESTIMATING SIZES FOR OTHER OPERATIONS
Union: the average of the sum and the larger of the two sizes.
Intersection: either take the average of the extremes, which is half the smaller size; or view intersection as an extreme case of the natural join and use the formula T(R ⋈ S) = T(R)T(S)/max(V(R, Y), V(S, Y)).
Difference: estimate T(R − S) as T(R) − (1/2)T(S).
Duplicate elimination: take the smaller of (1/2)T(R) and the product of all the V(R, a_i)'s.
Grouping and aggregation: upper-bound the number of groups by the product of V(R, A) over the grouping attributes A of L. An estimate is the smaller of (1/2)T(R) and this product.
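The estimation rules above are simple arithmetic, as this sketch shows (T_R, V_R_A, etc. stand for the statistics T(R) and V(R, a); the numbers in the calls are made up):

def est_eq_select(T_R, V_R_A):
    return T_R / V_R_A                   # sigma_{A=c}(R)

def est_ineq_select(T_R):
    return T_R / 3                       # sigma_{a<c}(R)

def est_or_select(n, m1, m2):
    # sigma_{C1 OR C2}(R), assuming C1 and C2 are independent
    return n * (1 - (1 - m1 / n) * (1 - m2 / n))

def est_join(T_R, T_S, V_R_Y, V_S_Y):
    return T_R * T_S / max(V_R_Y, V_S_Y) # R |><| S on attribute Y

print(est_eq_select(5000, 50))           # 100.0
print(est_or_select(1000, 200, 300))     # 440.0
print(est_join(1000, 2000, 20, 50))      # 40000.0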
16.5 COST-BASED PLAN SELECTION
Whether selecting a logical query plan or constructing a physical query plan from a logical plan, the query optimizer needs to estimate the cost of evaluating certain expressions. We shall assume that the "cost" of evaluating an expression is approximated well by the number of disk I/O's performed. The number of disk I/O's is, in turn, influenced by:
1. The particular logical operators chosen to implement the query, a matter decided when we choose the logical query plan.
2. The sizes of intermediate results (whose estimation we discussed in Section 16.4).
3. The physical operators used to implement logical operators, e.g., the choice of a one-pass or two-pass join, or the choice to sort or not sort a given relation.
4. The ordering of similar operations, especially joins.
5. The method of passing arguments from one physical operator to the next.

OBTAINING ESTIMATES FOR SIZE PARAMETERS
The formulas of Section 16.4 were predicated on knowing certain important parameters, especially T(R), the number of tuples in a relation R, and V(R, a), the number of different values in the column of relation R for attribute a.
A modern DBMS generally allows the user or administrator to explicitly request the gathering of statistics such as T(R) and V(R, a); these statistics are then used in subsequent query optimizations to estimate the cost of operations.
By scanning an entire relation R, it is straightforward to count the number of tuples T(R) and also to discover the number of different values V(R, a) for each attribute a. The number of blocks in which R can fit, B(R), can be estimated either by counting the actual number of blocks used (if R is clustered) or by dividing T(R) by the number of tuples per block.

COMPUTATION OF STATISTICS
Periodic recomputation of statistics is the norm in most DBMS's, for several reasons. First, statistics tend not to change radically in a short time. Second, even somewhat inaccurate statistics are useful as long as they are applied consistently to all the plans. Third, the alternative of keeping statistics up to date can turn the statistics themselves into a "hot spot" in the database: because statistics are read frequently, we prefer not to update them frequently too.
The recomputation of statistics might be triggered automatically after some period of time or after some number of updates. However, a database administrator who notices that poor-performing query plans are being selected by the query optimizer on a regular basis might request the recomputation of statistics in an attempt to rectify the problem.
Computing statistics for an entire relation R can be very expensive, particularly if we compute V(R, a) for each attribute a in the relation. One common approach is to compute approximate statistics by sampling only a fraction of the data; for example, we may sample a small fraction of the tuples to obtain an estimate for V(R, a).

HEURISTICS FOR REDUCING THE COST OF LOGICAL QUERY PLANS
One important use of cost estimates for queries or subqueries is in the application of heuristic transformations of the query. We have already observed how certain heuristics applied independently of cost estimates, such as pushing selections down, can be expected almost certainly to improve the cost of a logical query plan.
However, there are other points in the query optimization process where estimating the cost both before and after a transformation lets us apply the transformation where it appears to reduce cost and avoid it otherwise. In particular, when the preferred logical query plan is being generated, we may consider a number of optional transformations and compare the costs before and after.
Because we are estimating the cost of a logical query plan, before any decisions about the physical operators that will implement the relational-algebra operators, the cost estimate cannot be based on disk I/O's. Rather, we estimate the sizes of all intermediate results using the techniques of Section 16.4, and their sum is our heuristic estimate for the cost of the entire logical plan.
For example, consider the initial logical query plan
δ(σ_{a=10}(R) ⋈ S)
with the following statistics for the relations R and S:
R(a, b): T(R) = 5000, V(R, a) = 50, V(R, b) = 100
S(b, c): T(S) = 2000, V(S, b) = 200, V(S, c) = 100
To generate a final logical query plan, we insist that the selection be pushed down as far as possible. However, we are not sure whether it makes sense to push the δ below the join or not, so we generate two query plans, which differ in whether duplicates are eliminated before or after the join:
(a) δ below the join: σ_{a=10}(R) is estimated at 100 tuples, the δ reduces this to 50, and the join of those 50 tuples with S forms the root.
(b) δ at the root: σ_{a=10}(R) is estimated at 100 tuples, the join with S is estimated at 1000 tuples, and the δ is applied last.
We know how to estimate the size of the result of the selections: we divide T(R) by V(R, a) = 50. We also know how to estimate the size of the joins: we multiply the sizes of the arguments and divide by max(V(R, b), V(S, b)), which is 200. Summing the estimated sizes of the intermediate results shows that plan (a) is preferable.
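The arithmetic behind the comparison can be spelled out as below. This sketch assumes, as is conventional, that the leaves and the root are excluded from the sum of intermediate-result sizes, and it uses the duplicate-elimination estimate from Section 16.4 (the smaller of half the input and the product of the V's):

T_R, V_R_a, V_R_b = 5000, 50, 100
T_S, V_S_b = 2000, 200

sel = T_R / V_R_a                        # sigma_{a=10}(R): 100 tuples
dedup = min(sel / 2, 1 * V_R_b)          # delta below the join: 50 tuples

# Plan (a): delta below the join; the join is the root
join_a = dedup * T_S / max(V_R_b, V_S_b)          # 500 tuples
cost_a = sel + dedup                              # 150

# Plan (b): delta above the join; the delta is the root
join_b = sel * T_S / max(V_R_b, V_S_b)            # 1000 tuples
cost_b = sel + join_b                             # 1100

print(cost_a, cost_b)    # 150.0 1100.0 -> plan (a) wins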
APPROACHES TO ENUMERATING PHYSICAL PLANS
Now consider the use of cost estimates in the conversion of a logical query plan to a physical query plan. The baseline approach, called exhaustive, is to consider all combinations of choices (for each of the issues like order of joins, physical implementation of operators, and so on); each possible physical plan is assigned an estimated cost, and the one with the smallest cost is selected.
There are two broad approaches to exploring the space of possible physical plans:
Top-down: work down the tree of the logical query plan from the root.
Bottom-up: for each sub-expression of the logical-query-plan tree, compute the costs of all possible ways to compute that sub-expression. The possibilities and costs for a sub-expression E are computed by considering the options for the sub-expressions of E, and combining them in all possible ways with implementations for the root operator of E.

BRANCH-AND-BOUND PLAN ENUMERATION
This approach, often used in practice, begins by using heuristics to find a good physical plan for the entire logical query plan; let the cost of this plan be C. As we consider other plans for subqueries, we can eliminate any plan for a subquery that has a cost greater than C, since that plan could not possibly participate in a plan for the complete query that is better than what we already know. Likewise, if we construct a plan for the complete query with cost less than C, we replace C by the cost of this better plan in subsequent exploration of the space of physical query plans.

HILL CLIMBING
In this approach we search for a "valley" in the space of physical plans and their costs. We start with a heuristically selected physical plan and then make small changes to it, e.g., replacing one method for an operator by another, or reordering joins by using the associative and/or commutative laws, to find "nearby" plans that have lower cost. When we find a plan such that no small modification yields a plan of lower cost, we make that plan our chosen physical query plan.

DYNAMIC PROGRAMMING
In this variation of the general bottom-up strategy, we keep for each sub-expression only the plan of least cost. As we work up the tree, we consider possible implementations of each node, assuming the best plan for each sub-expression is also used.

SELINGER-STYLE OPTIMIZATION
This approach improves on the dynamic-programming approach by keeping, for each sub-expression, not only the plan of least cost but also certain other plans of higher cost that produce a result sorted in an order that may be useful higher up in the expression tree. Examples of such interesting orders are when the result of the sub-expression is sorted on:
the attribute(s) specified in a sort (τ) operator at the root;
the grouping attribute(s) of a later group-by (γ) operator; or
the join attribute(s) of a later join.

16.6 CHOOSING AN ORDER FOR JOINS
This section focuses on a critical problem in cost-based optimization: selecting an order for the natural join of three or more relations. Compared to other binary operations, joins take more time and therefore need effective optimization techniques.

SIGNIFICANCE OF LEFT AND RIGHT JOIN ARGUMENTS
The argument relations in a join determine the cost of the join.
The left argument of the join is called the build relation; it is assumed to be the smaller relation and is stored in main memory.
The right argument is called the probe relation; it is read a block at a time, and its tuples are matched with those of the build relation.
The join algorithms that distinguish between the arguments are the one-pass join, the nested-loop join, and the index join.

JOIN TREES
The order of the arguments is important when joining two relations: the left argument, since it is stored in main memory, should be the smaller. With two relations there are only two choices of join tree; with more than two relations there are n! ways to order the arguments, so the number of possible join trees grows very rapidly with the number of relations n.

JOIN TREE TYPES
Consider four relations; there are several different tree shapes for joining them (see Figure 16.27 of Database Systems: The Complete Book, Chapter 16):
In fig (a), all the right children are leaves: this is a left-deep tree.
In fig (c), all the left children are leaves: this is a right-deep tree.
Fig (b) is a bushy tree.
Considering only left-deep trees is advantageous for deciding join orders.
DYNAMIC PROGRAMMING TO SELECT A JOIN ORDER AND GROUPING
To pick an order for the join of many relations, there are three choices:
consider them all;
consider a subset;
use a heuristic to pick one.
Dynamic programming can be used to enumerate the trees, either considering all of them or a subset: construct a table of costs based on relation sizes, and remember only the minimum-cost entry needed to proceed.

DYNAMIC PROGRAMMING WITH MORE DETAILED COST FUNCTIONS
A disadvantage of the simple dynamic program is that it does not involve the actual costs of the joins in its calculations. It can be improved by using disk I/O's to evaluate cost: when computing the cost of R1 ⋈ R2, since we sum the costs of R1 and R2, we must also compute estimates for their sizes.

A GREEDY ALGORITHM FOR SELECTING A JOIN ORDER
It is expensive to use an exhaustive method like dynamic programming, so a better approach is to use a join-order heuristic for query optimization. A greedy algorithm is an example of this: make one decision at a time and never backtrack on a decision once made, as sketched below.
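Here is one way the greedy heuristic might look in code: start with the pair of relations whose estimated join is smallest, then repeatedly add the relation that keeps the estimated intermediate result smallest. The statistics, the size rule (from Section 16.4), and the relation names are all made up for illustration:

stats = {                           # T(R) and V(R, a) for each attribute
    'R': (1000, {'a': 100, 'b': 50}),
    'S': (2000, {'b': 200, 'c': 100}),
    'U': (5000, {'c': 500, 'd': 100}),
}

def est_join(t1, v1, t2, v2):
    size, vs = t1 * t2, dict(v1)
    for a in set(v1) & set(v2):
        size /= max(v1[a], v2[a])   # divide by the larger V for shared attrs
        vs[a] = min(v1[a], v2[a])   # preservation of value sets
    for a, v in v2.items():
        vs.setdefault(a, v)
    return size, vs

remaining = set(stats)
pairs = [(est_join(*stats[x], *stats[y])[0], x, y)
         for x in remaining for y in remaining if x < y]
_, x, y = min(pairs)                # cheapest starting pair
size, vs = est_join(*stats[x], *stats[y])
remaining -= {x, y}
order = [x, y]
while remaining:                    # greedily extend, never backtracking
    _, nxt = min((est_join(size, vs, *stats[r])[0], r) for r in remaining)
    size, vs = est_join(size, vs, *stats[nxt])
    remaining.remove(nxt)
    order.append(nxt)
print(order, int(size))             # ['R', 'S', 'U'] 100000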
• Parser checks syntax, verifies relations – Evaluation • The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query Scanning Tables One of the basic thing we can do in a Physical query plan is to read the entire contents of a relation R. Variation of this operator involves simple predicate, read only those tuples of the relation R that satisfy the predicate. 320 Scanning Tables Basic approaches to locate the tuples of a relation R Table Scan Relation R is stored in secondary memory with its tuples arranged in blocks It is possible to get the blocks one by one Index-Scan If there is an index on any attribute of Relation R, we can use this index to get all 321 the tuples of Relation R Sorting While Scanning Tables Number of reasons to sort a relation Query could include an ORDER BY clause, requiring that a relation be sorted. Algorithms to implement relational algebra operations requires one or both arguments to be sorted relations. Physical-query-plan operator sort-scan takes a relation R, attributes on which the sort is to be made, and produces R in that sorted order 322 Computation Model for Physical Operator Physical-Plan Operator should be selected wisely which is essential for good Query Processor . For “cost” of each operator is estimated by number of disk I/O’s for an operation. The total cost of operation depends on the size of the answer, and includes the final write back cost to the total cost of the query. 323 Parameters for Measuring Costs Parameters that affect the performance of a query Buffer space availability in the main memory at the time of execution of the query Size of input and the size of the output generated The size of memory block on the disk and the size in the main memory also affects the performance 324 Parameters for Measuring Costs B: The number of blocks are needed to hold all tuples of relation R. Also denoted as B(R) T:The number of tuples in relationR. Also denoted as T(R) V: The number of distinct values that appear in a column of a relation R V(R, a)- is the number of distinct values of column for a in relation R 325 I/O Cost for Scan Operators If relation R is clustered, then the number of disk I/O for the table-scan operator is = ~B disk I/O’s If relation R is not clustered, then the number of required disk I/O generally is much higher A index on a relation R occupies many fewer than B(R) blocks That means a scan of the entire relation R which takes at least B disk I/O’s will require more I/O’s than the entire index 326 Iterators for Implementation of Physical Operators Many physical operators can be implemented as an Iterator. Three methods forming the iterator for an operation are: 1. Open( ) : This method starts the process of getting tuples It initializes any data structures needed to perform the operation 327 Iterators for Implementation of Physical Operators 2. GetNext( ): Returns the next tuple in the result If there are no more tuples to return, GetNext returns a special value NotFound 3. Close( ) : Ends the iteration after all tuples It calls Close on any arguments of the operator 328 QUERY COMPILATION Query compilation is divided into 3 major steps: Parsing, in which a parse tree representing the query and its structure is constructed. Query rewrite, in which the parse tree is converted to an initial query plan, which is an algebraic representation of the query. Physical Plan Generation, where the abstract query plan is converted into physical query plan. 
15.3 NESTED-LOOP JOINS
Nested-loop joins can be used for relations of any size; it is not necessary that either relation fit in main memory. They use a "one-and-a-half" pass method: in each variation, one argument is read just once, while the other argument is read repeatedly. There are two kinds:
tuple-based nested-loop join, and
block-based nested-loop join.
Advantages of nested-loop joins: they fit into the iterator framework, allowing us to avoid storing intermediate relations on disk.
TUPLE-BASED NESTED-LOOP JOIN
The simplest variation of the nested-loop join; the loops range over individual tuples. Algorithm to compute the join R(X, Y) ⋈ S(Y, Z), where R and S are relations with tuples r and s:

FOR each tuple s in S DO
    FOR each tuple r in R DO
        IF r and s join to make tuple t THEN
            output t;

Carelessness in buffering the blocks causes this algorithm to use T(R)T(S) disk I/O's.

IMPROVEMENTS AND MODIFICATIONS
To decrease the cost:
Method 1: use an index-based join. We find the tuples of R that match a given tuple of S via an index, so we need not read the entire relation R.
Method 2: use a block-based join. The tuples of R and S are processed by blocks, and enough memory is used to store blocks so as to reduce the number of disk I/O's.

AN ITERATOR FOR TUPLE-BASED NESTED-LOOP JOIN

Open() {
    R.Open();
    S.Open();
    s := S.GetNext();
}

GetNext() {
    REPEAT {
        r := R.GetNext();
        IF (r = NotFound) { /* R is exhausted for the current s */
            R.Close();
            s := S.GetNext();
            IF (s = NotFound) RETURN NotFound; /* both R and S are exhausted */
            R.Open();
            r := R.GetNext();
        }
    } UNTIL (r and s join);
    RETURN the join of r and s;
}

Close() {
    R.Close();
    S.Close();
}

BLOCK-BASED NESTED-LOOP JOIN ALGORITHM
Access to the arguments is organized by block. This lets us use fewer disk I/O's while reading the tuples of the inner relation, and, by using enough main memory to store the tuples of the outer-loop relation, lets us join each tuple of the inner relation with as many tuples as possible:

FOR each chunk of M-1 blocks of S DO BEGIN
    read these blocks into main-memory buffers;
    organize their tuples into a search structure whose
        search key is the common attributes of R and S;
    FOR each block b of R DO BEGIN
        read b into main memory;
        FOR each tuple t of b DO BEGIN
            find the tuples of S in main memory that join with t;
            output the join of t with each of these tuples;
        END;
    END;
END;

Assumptions: B(S) <= B(R) and B(S) > M; that is, neither relation fits entirely in main memory.

ANALYSIS OF NESTED-LOOP JOIN
The number of disk I/O's is:
(B(S)/(M-1)) (M-1 + B(R))
which equals
B(S) + B(S)B(R)/(M-1)
or approximately B(S)B(R)/M.
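The block-based algorithm above can be sketched as follows for R(X, Y) ⋈ S(Y, Z); M and the block size are made-up parameters, and the in-memory hash table plays the role of the search structure on the common attribute Y:

from collections import defaultdict

def block_nested_loop_join(R, S, M=3, block_size=2):
    chunk = (M - 1) * block_size          # tuples of S held in M-1 buffers
    out = []
    for i in range(0, len(S), chunk):
        index = defaultdict(list)         # search structure on join key Y
        for (y, z) in S[i:i + chunk]:
            index[y].append(z)
        for (x, y) in R:                  # one pass over R per chunk of S
            out.extend((x, y, z) for z in index[y])
    return out

R = [(1, 'a'), (2, 'b'), (3, 'a')]
S = [('a', 10), ('b', 20), ('c', 30)]
print(block_nested_loop_join(R, S))   # [(1,'a',10), (2,'b',20), (3,'a',10)]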
15.8 ALGORITHMS USING MORE THAN TWO PASSES
This section covers algorithms using more than two passes: multipass sort-based algorithms, their performance, and multipass hash-based algorithms. The reason to use more than two passes: two passes are usually enough, but for the very largest relations we use as many passes as necessary.

MULTIPASS SORT-BASED ALGORITHMS
Suppose we have M main-memory buffers available to sort a relation R, which we assume is stored clustered. Then we do the following:
BASIS: If R fits in M blocks (i.e., B(R) <= M):
1. Read R into main memory.
2. Sort it using any main-memory sorting algorithm.
3. Write the sorted relation to disk.
INDUCTION: If R does not fit into main memory:
1. Partition the blocks holding R into M groups, which we call R1, R2, ..., RM.
2. Recursively sort Ri for each i = 1, 2, ..., M.
3. Merge the M sorted sublists.
If we are not merely sorting R, but performing a unary operation such as δ or γ on R, we can modify the above so that at the final merge we perform the operation on the tuples at the front of the sorted sublists:
For a δ, output one copy of each distinct tuple and skip over the copies.
For a γ, sort on the grouping attributes only, and combine the tuples with a given value of these grouping attributes.
Conclusion: the two-pass algorithms based on sorting or hashing have natural recursive analogs that take three or more passes and will work for larger amounts of data.

PERFORMANCE OF MULTIPASS, SORT-BASED ALGORITHMS
Let s(M, k) be the maximum number of blocks we can sort with M buffers and k passes.
BASIS: If k = 1, i.e., one pass is allowed, then we must have B(R) <= M. Put another way, s(M, 1) = M.
INDUCTION: Suppose k > 1. Then we partition R into M pieces, each of which must be sortable in k - 1 passes. If B(R) = s(M, k), then s(M, k)/M, the size of each of the M pieces of R, cannot exceed s(M, k - 1). That is:
s(M, k) = M s(M, k - 1)
and therefore s(M, k) = M^k.

MULTIPASS HASH-BASED ALGORITHMS
BASIS: For a unary operation, if the relation fits in M buffers, read it into memory and perform the operation. For a binary operation, if either relation fits in M - 1 buffers, perform the operation by reading that relation into main memory and then reading the second relation, one block at a time, into the Mth buffer.
INDUCTION: If no relation fits in main memory, hash each relation into M - 1 buckets, as discussed in Section 15.5.1. Recursively perform the operation on each bucket or corresponding pair of buckets, and accumulate the output from each bucket or pair.
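A compact sketch of the recursive multipass sort: if R fits in M "blocks," sort it in memory; otherwise split it into M pieces, sort each recursively, and merge. The block size is a made-up constant, and heapq.merge stands in for the M-way merge of sorted sublists:

import heapq

BLOCK = 2          # tuples per block, a made-up constant

def multipass_sort(R, M=3):
    if len(R) <= M * BLOCK:               # BASIS: B(R) <= M
        return sorted(R)
    piece = -(-len(R) // M)               # ceil(len(R) / M): M pieces
    runs = [multipass_sort(R[i:i + piece], M)
            for i in range(0, len(R), piece)]
    return list(heapq.merge(*runs))       # INDUCTION: merge M sorted sublists

print(multipass_sort([9, 4, 7, 1, 8, 3, 6, 2, 5, 0]))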