Download 124_Summarization

CS 257 Chapter – 15.9 Summary of Query Execution Database Systems: The Complete Book Krishna Vellanki 124 Introduction What is Query Processor? ◦ Group of components of a DBMS that converts a user queries and data-modification commands into a sequence of database operations ◦ It also executes those operations ◦ Must supply detail regarding how the query is to be executed Building Blocks of Query processing Query Execution:  The algorithms that manipulate the data of the database.  Focus on the operations of extended relational algebra. 3 Outline of Query Compilation Query compilation  Parsing: A parse tree for the query is constructed  Query Rewrite: The parse tree is converted to an initial query plan and transformed into logical query plan (less time)  Physical Plan Generation: Logical Q Plan is converted into physical query plan by selecting algorithms and order of execution of these operator. 4 Scanning Tables One of the basic thing we can do in a Physical query plan is to read the entire contents of a relation R.  Variation of this operator involves simple predicate, read only those tuples of the relation R that satisfy the predicate.  Basic approaches to locate the tuples of a relation R   Table Scan  Relation R is stored in secondary memory with its tuples arranged in blocks  It is possible to get the blocks one by one  Index-Scan  If there is an index on any attribute of Relation R, we can use this index to get all the tuples of Relation R 5 Sorting While Scanning Tables Number of reasons to sort a relation Query could include an ORDER BY clause, requiring that a relation be sorted. Algorithms to implement relational algebra operations requires one or both arguments to be sorted relations. Physical-query-plan operator sort-scan takes a relation R, attributes on which the sort is to be made, and produces R in that sorted order 6 Parameters for Measuring Costs  Parameters that affect the performance of a query  Buffer space availability in the main memory at the time of execution of the query  Size of input and the size of the output generated  The size of memory block on the disk and the size in the main memory also affects the performance  B: The number of blocks are needed to hold all tuples of relation R. Also denoted as B(R).  T is the number of tuples in relation R, also denoted as T(R).  V: The number of distinct values that appear in a column of a relation R  V(R, a)- is the number of distinct values of column for a in relation R 7 One-Pass Algorithms for Database Operations The choice of an algorithm for each operator is an essential part of the process of transforming a logical query plan into a physical query plan.  Main classes of Algorithms: ◦ Sorting-based methods ◦ Hash-based methods ◦ Index-based methods  Division based on degree difficulty and cost: ◦ 1-pass algorithms ◦ 2-pass algorithms ◦ 3 or more pass algorithms One-Pass Algorithm Methods 1. One-Pass Algorithms for Tuple-at-a-Time Operations: selection and projection 2. One-Pass Algorithms for Unary, fill-Relation Operations: Duplicate Elimination and Grouping 3. One-Pass Algorithms for Unary, fill-Relation Operations: Binary operations including Union, Intersection, Difference, Product and Join 9 Nested Loop Joins  Used for relations of any side.  Not necessary that relation fits in main memory  Uses “One-and-a-half” pass method in which for each variation:  One argument read just once.  Other argument read repeatedly.  Two kinds:  Tuple-Based Nested Loop Join  Block-Based Nested Loop Join Improvement & Modification To decrease the cost  Method 1: Use algorithm for Index-Based joins  We find tuple of R that matches given tuple of S  We need not to read entire relation R  Method 2: Use algorithm for Block-Based joins  Tuples of R & S are divided into blocks  Uses enough memory to store blocks in order to reduce the number of disk I/O’s. Physically Unrealizable Behaviors Read too Late Transaction T tries to read too late Write too Late Transaction T tries to write too late Problem with dirty data T could perform a dirty read if it is reads X A write is cancelled because of a write with a later timestamp, but the writer then aborts Timestamps Vs Locks Timestamps Locks Superior if Superior in high-conflict • most transactions are situations read-only • rare that concurrent transactions will read or write the same element In high-conflict situations, Frequently delay transactions rollback will be frequent, as they wait for locks introducing more delays than a locking system Two passed Algorithm based on hashing Hashing is done if the data is too big to store in main memory buffers. ◦ Hash all the tuples of the argument(s) using an appropriate hash key. ◦ For all the common operations, there is a way to select the hash key so all the tuples that need to be considered together when we perform the operation have the same hash value. ◦ This reduces the size of the operand(s) by a factor equal to the number of buckets. Steps to be followed for a Two passed Algorithm based on hashing • Duplicate Elimination • Grouping and Aggregation • Union, Intersection, and Difference • Hash-Join Algorithm Sort based Vs Hash based  For binary operations, hash-based only limits size to min of arguments, not sum  Sort-based can produce output in sorted order, which can be helpful  Hash-based depends on buckets being of equal size  Sort-based algorithms can experience reduced rotational latency or seek time 15.6 Index based Algorithms     Clustered Relation: Tuples are packed into roughly as few blocks as can possibly hold those tuples Clustering indexes: Indexes on attributes that all the tuples with a fixed value for the search key of this index appear on roughly as few blocks as can hold them A relation that isn’t clustered cannot have a clustering index A clustered relation can have nonclustering indexes Thank You..!!

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download 124_Summarization