CS 257
Chapter – 15.9 Summary of Query Execution
Database Systems: The Complete Book
Krishna Vellanki
124
Introduction
What is a query processor?
◦ The group of components of a DBMS that converts user queries and data-modification commands into a sequence of database operations
◦ It also executes those operations
◦ It must supply details about how the query is to be executed
Building Blocks of Query Processing
Query Execution:
• The algorithms that manipulate the data of the database.
• Focus on the operations of extended relational algebra.
Outline of Query Compilation
Query compilation
• Parsing: A parse tree for the query is constructed.
• Query Rewrite: The parse tree is converted to an initial query plan, which is then transformed into an equivalent logical query plan that is expected to require less time to execute.
• Physical Plan Generation: The logical query plan is converted into a physical query plan by selecting an algorithm for each operator and an order of execution for these operators.
Scanning Tables
One of the most basic things we can do in a physical query plan is to read the entire contents of a relation R.
• A variation of this operator involves a simple predicate: read only those tuples of R that satisfy the predicate.
• Basic approaches to locating the tuples of a relation R:
◦ Table scan: R is stored in secondary memory with its tuples arranged in blocks, and it is possible to get the blocks one by one.
◦ Index scan: if there is an index on any attribute of R, we can use this index to get all the tuples of R.
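The two access paths can be sketched in a few lines. This is a toy model, not a DBMS implementation: a relation is assumed to be a Python list of blocks, each block a list of tuples (dicts), and the index is a hypothetical in-memory dictionary from attribute value to (block number, slot) positions. The helper names `table_scan`, `build_index`, and `index_scan` are our own.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Row = Dict[str, object]
Block = List[Row]  # toy model: one disk block holds a list of tuples


def table_scan(blocks: List[Block],
               pred: Callable[[Row], bool] = lambda row: True) -> Iterator[Row]:
    """Read R block by block; yield only the tuples that satisfy pred."""
    for block in blocks:          # each iteration stands for one block read
        for row in block:
            if pred(row):
                yield row


def build_index(blocks: List[Block], attr: str) -> Dict[object, List[Tuple[int, int]]]:
    """Hypothetical in-memory index on attr: value -> list of (block number, slot)."""
    index: Dict[object, List[Tuple[int, int]]] = defaultdict(list)
    for b, block in enumerate(blocks):
        for s, row in enumerate(block):
            index[row[attr]].append((b, s))
    return dict(index)


def index_scan(blocks: List[Block], index: Dict[object, List[Tuple[int, int]]],
               value: Optional[object] = None) -> Iterator[Row]:
    """Fetch tuples through the index: all of R in key order, or only those with attr == value."""
    keys = sorted(index) if value is None else [value]
    for key in keys:
        for b, s in index.get(key, []):
            yield blocks[b][s]
```

For example, with `R = [[{"a": 1, "b": "x"}, {"a": 2, "b": "y"}], [{"a": 1, "b": "z"}]]`, both `table_scan(R, lambda r: r["a"] == 1)` and `index_scan(R, build_index(R, "a"), 1)` yield the tuples with `b` equal to `"x"` and `"z"`; the index scan simply skips tuples it knows cannot match.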
Sorting While Scanning Tables
There are a number of reasons to sort a relation:
• The query could include an ORDER BY clause, requiring that a relation be sorted.
• Algorithms that implement relational-algebra operations may require one or both arguments to be sorted relations.
The physical-query-plan operator sort-scan takes a relation R and the attributes on which the sort is to be made, and produces R in that sorted order.
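A minimal sketch of sort-scan, assuming the same list-of-blocks model and a hypothetical parameter `mem_blocks` standing in for the number of available main-memory buffers. If R fits, it is sorted in memory; otherwise memory-sized sorted runs are produced and then merged with `heapq.merge`, which assumes all runs can be merged at once.

```python
import heapq
from operator import itemgetter
from typing import Dict, Iterator, List

Row = Dict[str, object]
Block = List[Row]


def sort_scan(blocks: List[Block], attrs: List[str], mem_blocks: int) -> Iterator[Row]:
    """Yield the tuples of R sorted on attrs.

    mem_blocks is a hypothetical stand-in for the number of main-memory buffers.
    """
    key = itemgetter(*attrs)
    if len(blocks) <= mem_blocks:
        # R fits in memory: read every block, then sort once
        yield from sorted((row for block in blocks for row in block), key=key)
        return
    runs = []
    for start in range(0, len(blocks), mem_blocks):
        # Pass 1: sort each memory-load of blocks into a sorted run
        chunk = [row for block in blocks[start:start + mem_blocks] for row in block]
        runs.append(sorted(chunk, key=key))
    # Pass 2: merge the sorted runs into one sorted stream
    yield from heapq.merge(*runs, key=key)
```

Either branch produces the same order; the second one only matters when R is larger than memory, which is where the two-pass sort-based algorithms come in.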
Parameters for Measuring Costs
Parameters that affect the performance of a query:
• Buffer space available in main memory at the time the query is executed
• The size of the input and the size of the output generated
• The size of a block on disk and of a buffer in main memory
B: The number of blocks needed to hold all tuples of relation R, also denoted B(R).
T: The number of tuples in relation R, also denoted T(R).
V: The number of distinct values that appear in a column of a relation R.
V(R, a) is the number of distinct values of column a in relation R.
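To make the three parameters concrete, here is a small sketch that computes them for a toy relation stored as a list of blocks. The function names mirror the notation above; `blocks_needed` is a hypothetical helper that assumes R is packed tightly, so B(R) = ⌈T(R) / tuples per block⌉.

```python
import math
from typing import Dict, List

Row = Dict[str, object]
Block = List[Row]


def T(blocks: List[Block]) -> int:
    """T(R): the number of tuples in R."""
    return sum(len(block) for block in blocks)


def B(blocks: List[Block]) -> int:
    """B(R): the number of blocks holding R (here, the length of the block list)."""
    return len(blocks)


def V(blocks: List[Block], attr: str) -> int:
    """V(R, a): the number of distinct values of column a in R."""
    return len({row[attr] for block in blocks for row in block})


def blocks_needed(t: int, tuples_per_block: int) -> int:
    """B(R) for a tightly packed R: ceil(T(R) / tuples per block)."""
    return math.ceil(t / tuples_per_block)
```

For instance, 20,000 tuples at 20 tuples per block give `blocks_needed(20000, 20) == 1000` blocks.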
One-Pass Algorithms for Database Operations
The choice of an algorithm for each operator is an essential part of the process of transforming a logical query plan into a physical query plan.
• Main classes of algorithms:
◦ Sorting-based methods
◦ Hash-based methods
◦ Index-based methods
• Division based on degree of difficulty and cost:
◦ One-pass algorithms
◦ Two-pass algorithms
◦ Algorithms with three or more passes
One-Pass Algorithm Methods
1. One-pass algorithms for tuple-at-a-time operations: selection and projection
2. One-pass algorithms for unary, full-relation operations: duplicate elimination and grouping
3. One-pass algorithms for binary operations: union, intersection, difference, product, and join
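The three families differ in how much state they keep in memory. The sketch below shows one example of each, assuming tuples are plain Python tuples and, for the join, that the smaller relation S fits in memory. The function names are illustrative, and the join keeps both copies of the join column (an equijoin rather than a natural join).

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Sequence, Tuple


def select_project(rows: Iterable[Tuple], pred: Callable[[Tuple], bool],
                   cols: Sequence[int]) -> Iterator[Tuple]:
    """Tuple-at-a-time: each tuple is examined once and needs no memory afterward."""
    for row in rows:
        if pred(row):
            yield tuple(row[i] for i in cols)


def distinct(rows: Iterable[Tuple]) -> Iterator[Tuple]:
    """Unary, full-relation: remembers every distinct tuple seen so far (must fit in memory)."""
    seen = set()
    for row in rows:
        if row not in seen:
            seen.add(row)
            yield row


def group_sum(rows: Iterable[Tuple], key_col: int, val_col: int) -> Dict[object, int]:
    """Unary, full-relation: one in-memory entry per group holding its running SUM."""
    groups: Dict[object, int] = {}
    for row in rows:
        groups[row[key_col]] = groups.get(row[key_col], 0) + row[val_col]
    return groups


def one_pass_join(r_rows: Iterable[Tuple], s_rows: Iterable[Tuple],
                  r_col: int, s_col: int) -> Iterator[Tuple]:
    """Binary: load the smaller relation S into an in-memory table, then stream R past it."""
    table: Dict[object, List[Tuple]] = defaultdict(list)
    for s in s_rows:
        table[s[s_col]].append(s)
    for r in r_rows:
        for s in table.get(r[r_col], []):
            yield r + s  # both copies of the join column are kept
```

Selection and projection never need more than one buffer, whereas `distinct`, `group_sum`, and `one_pass_join` only work in one pass if their in-memory structures fit in the available buffers.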
Nested-Loop Joins
• Used for relations of any size.
• Neither relation needs to fit in main memory.
• Uses a “one-and-a-half-pass” method in which, for each variation:
◦ One argument is read just once.
◦ The other argument is read repeatedly.
• Two kinds:
◦ Tuple-based nested-loop join
◦ Block-based nested-loop join
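A minimal sketch of the block-based variant, assuming `mem_blocks` (a hypothetical name for the number of buffers M) is at least 2. S, ideally the smaller relation, is read once in chunks of M − 1 blocks; R is read in full once per chunk through the remaining buffer. The function also counts block reads so the "read once" versus "read repeatedly" behavior is visible.

```python
from typing import List, Tuple

Block = List[Tuple]


def block_nested_loop_join(s_blocks: List[Block], r_blocks: List[Block],
                           s_col: int, r_col: int, mem_blocks: int) -> Tuple[List[Tuple], int]:
    """Join R and S on R[r_col] == S[s_col]; return (result, number of block reads)."""
    if mem_blocks < 2:
        raise ValueError("need at least two buffers")
    chunk_size = mem_blocks - 1
    result: List[Tuple] = []
    reads = 0
    for start in range(0, len(s_blocks), chunk_size):
        chunk = s_blocks[start:start + chunk_size]
        reads += len(chunk)                      # each block of S is read exactly once
        s_tuples = [s for block in chunk for s in block]
        for r_block in r_blocks:                 # R is read again for every chunk of S
            reads += 1
            for r in r_block:
                for s in s_tuples:
                    if r[r_col] == s[s_col]:
                        result.append(r + s)
    return result, reads
```

The read count follows B(S) + ⌈B(S)/(M − 1)⌉ · B(R). With 4 blocks of S, 3 blocks of R, and M = 3, that is 4 + 2 · 3 = 10 block reads.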
Improvements and Modifications
To decrease the cost:
• Method 1: Use an index-based join algorithm
◦ For a given tuple of S, we use an index on R to find the tuples of R that match it.
◦ We need not read the entire relation R.
• Method 2: Use a block-based join algorithm
◦ Tuples of R and S are organized into blocks.
◦ As much memory as possible is used to hold blocks, in order to reduce the number of disk I/Os.
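Method 1 can be sketched as follows, assuming an index on R's join attribute already exists. Here it is modeled by a hypothetical in-memory dictionary built with `build_index`; a real index would live on disk. The function reports how many R tuples were actually fetched, which shows that R is never scanned in full.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_index(rows: Iterable[Tuple], col: int) -> Dict[object, List[Tuple]]:
    """Hypothetical stand-in for an existing index on R's join attribute."""
    index: Dict[object, List[Tuple]] = defaultdict(list)
    for row in rows:
        index[row[col]].append(row)
    return dict(index)


def index_join(s_rows: Iterable[Tuple], r_index: Dict[object, List[Tuple]],
               s_col: int) -> Tuple[List[Tuple], int]:
    """For each tuple of S, look up the matching R tuples through the index.

    Returns (result, number of R tuples fetched).
    """
    result: List[Tuple] = []
    fetched = 0
    for s in s_rows:
        matches = r_index.get(s[s_col], [])
        fetched += len(matches)
        for r in matches:
            result.append(r + s)
    return result, fetched
```

With four tuples in R and an S that contains only one matching join value, a single R tuple is fetched instead of all four.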
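Physically Unrealizable Behaviors
• Read too late: transaction T tries to read a database element X, but X has already been written by a transaction with a later timestamp than T.
• Write too late: transaction T tries to write X, but X has already been read by a transaction with a later timestamp, which should have seen the value written by T.
Problems with dirty data
• T could perform a dirty read if it reads X after X was written by a transaction that has not yet committed (and may still abort).
• A write is cancelled because of a write with a later timestamp, but the writer then aborts.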
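A timestamp-based scheduler detects these cases by keeping, for each element X, a read time RT(X), a write time WT(X), and a commit bit C(X). This is the usual textbook bookkeeping; the class and function names below are our own. The sketch returns the scheduler's decision as a string: rolling back on "too late" requests, delaying requests that could see or overwrite uncommitted data, and ignoring an obsolete write (the Thomas write rule).

```python
from dataclasses import dataclass


@dataclass
class Element:
    rt: int = 0             # RT(X): highest timestamp of a transaction that read X
    wt: int = 0             # WT(X): timestamp of the transaction that last wrote X
    committed: bool = True  # C(X): has the last writer of X committed?


def read_request(ts: int, x: Element) -> str:
    if ts < x.wt:
        return "abort"      # read too late: X was written by a later transaction
    if not x.committed:
        return "delay"      # avoid a dirty read until the writer commits or aborts
    x.rt = max(x.rt, ts)
    return "grant"


def write_request(ts: int, x: Element) -> str:
    if ts < x.rt:
        return "abort"      # write too late: a later transaction already read X
    if ts < x.wt:
        # A later transaction already wrote X: skip this write once that writer
        # has committed (Thomas write rule); otherwise wait, since it may abort.
        return "ignore" if x.committed else "delay"
    x.wt = ts
    x.committed = False
    return "grant"
```

The "delay" branches are the scheduler's answer to both dirty-data problems listed above.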
Timestamps vs. Locks
Timestamps
• Superior if most transactions are read-only, or if it is rare that concurrent transactions will read or write the same element
• In high-conflict situations, rollbacks will be frequent, introducing more delays than a locking system
Locks
• Superior in high-conflict situations
• Frequently delay transactions as they wait for locks
Two-Pass Algorithms Based on Hashing
Hashing is used when the data is too big to fit in the main-memory buffers.
◦ Hash all the tuples of the argument(s) using an appropriate hash key.
◦ For all the common operations, there is a way to select the hash key so that all the tuples that need to be considered together when we perform the operation have the same hash value.
◦ This reduces the size of the operand(s) by a factor equal to the number of buckets.
Operations Handled by Two-Pass Algorithms Based on Hashing
• Duplicate elimination
• Grouping and aggregation
• Union, intersection, and difference
• Hash-join algorithm
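Two of these operations are sketched below, under stated assumptions. `mem_blocks` is a hypothetical name for M. The first pass splits the input into M − 1 buckets, modeled here as in-memory lists standing in for bucket files on disk. The second pass handles each bucket (or pair of buckets) with a one-pass algorithm, which assumes every bucket fits in memory. Using the same hash function on both join inputs guarantees that matching tuples land in buckets with the same number.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def partition(rows: Iterable[Tuple], key: Callable[[Tuple], object],
              n_buckets: int) -> List[List[Tuple]]:
    """Pass 1: hash every tuple on key into one of n_buckets buckets."""
    buckets: List[List[Tuple]] = [[] for _ in range(n_buckets)]
    for row in rows:
        buckets[hash(key(row)) % n_buckets].append(row)
    return buckets


def two_pass_distinct(rows: Iterable[Tuple], mem_blocks: int) -> List[Tuple]:
    """Duplicates hash to the same bucket, so each bucket is deduplicated on its own."""
    out: List[Tuple] = []
    for bucket in partition(rows, lambda t: t, mem_blocks - 1):
        out.extend(dict.fromkeys(bucket))   # pass 2: one-pass distinct per bucket
    return out


def two_pass_hash_join(r_rows: Iterable[Tuple], s_rows: Iterable[Tuple],
                       r_col: int, s_col: int, mem_blocks: int) -> List[Tuple]:
    """Hash both relations on the join attribute, then join matching bucket pairs."""
    n = mem_blocks - 1
    r_buckets = partition(r_rows, lambda t: t[r_col], n)
    s_buckets = partition(s_rows, lambda t: t[s_col], n)
    out: List[Tuple] = []
    for r_bucket, s_bucket in zip(r_buckets, s_buckets):
        table: Dict[object, List[Tuple]] = defaultdict(list)  # pass 2: S bucket in memory
        for s in s_bucket:
            table[s[s_col]].append(s)
        for r in r_bucket:
            for s in table.get(r[r_col], []):
                out.append(r + s)
    return out
```

Grouping, union, intersection, and difference follow the same pattern: choose the hash key so that the tuples that must meet end up in the same bucket, then apply the one-pass version bucket by bucket.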
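Sort-Based vs. Hash-Based
• For binary operations, the size limit of hash-based algorithms depends only on the smaller of the two arguments, not on the sum of their sizes.
• Sort-based algorithms can produce output in sorted order, which can be helpful.
• Hash-based algorithms depend on the buckets being of roughly equal size.
• Sort-based algorithms can experience reduced rotational latency or seek time (for example, when sorted sublists are written to consecutive disk blocks).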
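As a rough illustration of the first point, these are the approximate memory conditions for two-pass binary operations. M is an assumption here: it denotes the number of main-memory buffers. The sort-based condition applies to the variants that merge the sorted sublists of both arguments in a single pass.

```latex
\text{hash-based: } \min\bigl(B(R),\, B(S)\bigr) \le M^2
\qquad\qquad
\text{sort-based: } B(R) + B(S) \le M^2
```

For example, with M = 101 we have M² = 10201. If B(R) = 12000 and B(S) = 500, the hash-based condition holds (500 ≤ 10201), but the sort-based one fails (12500 > 10201).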
15.6 Index-Based Algorithms
• Clustered relation: the tuples are packed into roughly as few blocks as can possibly hold those tuples.
• Clustering index: an index on an attribute such that all the tuples with a fixed value for the search key of this index appear on roughly as few blocks as can hold them.
• A relation that isn't clustered cannot have a clustering index.
• A clustered relation can have nonclustering indexes.
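Clustering is important because it changes the cost of an index-based selection σ_{a=c}(R). A common rough estimate is about B(R)/V(R,a) disk I/Os with a clustering index and about T(R)/V(R,a) with a nonclustering one, since each matching tuple may sit on a different block. The sketch below encodes that estimate. It ignores the cost of reading the index blocks themselves, and the function name is our own.

```python
import math


def index_selection_cost(b: int, t: int, v: int, clustering: bool) -> int:
    """Approximate disk I/Os for sigma_{a=c}(R) through an index on a.

    b, t, v stand for B(R), T(R), V(R, a); reading the index itself is ignored.
    """
    if clustering:
        # matching tuples are packed onto about B(R)/V(R,a) blocks
        return math.ceil(b / v)
    # each matching tuple may be on a different block: about T(R)/V(R,a) reads
    return math.ceil(t / v)
```

With B(R) = 1000, T(R) = 20000, and V(R,a) = 100, the estimate is 10 I/Os through a clustering index and 200 through a nonclustering one. If V(R,a) drops to 10, the nonclustering index needs about 2000 I/Os, which is worse than the 1000 I/Os of a table scan of the clustered relation.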
Thank You!