Cost Model and Estimating Result Sizes
Cost Model
• In the lecture we showed how to compute the cost of each (join) method
• To do so, we need to know the sizes of the relations, some of which are obtained as intermediate results
• Therefore, we need to compute the sizes of intermediate results
• We now explain how to estimate the size of a result
Choosing a Plan for Computing a Join of Three Relations
• We want to compute the join of the three relations Sailors, Reserves and Boats
• The two options (ignoring the order of the relations in the first join operation) are:
(Sailors ⋈ Reserves) ⋈ Boats
Sailors ⋈ (Reserves ⋈ Boats)
• Which plan is cheaper depends, among other things, on which intermediate result is smaller
Analysis of Result Sizes
• We need to estimate the size of the result of the join (Sailors ⋈ Reserves) versus the size of the result of the join (Reserves ⋈ Boats)
• The DBMS maintains statistics about the relations and the indexes
Statistics Maintained by DBMS
• Cardinality: Number of tuples NTuples(R) in each
relation R
• Size: Number of pages NPages(R) in each relation
R
• Index Cardinality: Number of distinct key values
NKeys(I) for each index I
• Index Size: Number of pages INPages(I) in each
index I
• Index Height: Number of non-leaf levels
IHeight(I) in each B+ Tree index I
• Index Range: The minimum value ILow(I) and
maximum value IHigh(I) for each index I
Note
• The statistics are updated periodically
(not every time the underlying relations
are modified).
• Hence we cannot use the stored cardinality to answer
select count(*)
from R
since the stored value may be out of date.
Estimating Result Sizes
• Consider SELECT attribute-list
FROM relation-list
WHERE term1 and ... and termn
• The maximum number of tuples is the product
of the cardinalities of the relations in the
FROM clause
• Each term in the WHERE clause has an associated
reduction factor, which reflects the impact
of the term in reducing the result size.
Result Size
• Estimated result size =
maximum size × the product of the reduction factors
Assumptions
• There is an index I1 on R.Y and index I2
on S.Y
• Containment of value sets: if
NKeys(I1)<NKeys(I2) for attribute Y,
then every Y-value of R will be a Y-value
of S
Estimating Reduction Factors
• column = value: 1/NKeys(I)
– Applies when there is an index I on column.
– This assumes a uniform distribution of values.
– Otherwise (no index), use 1/10.
• column1 = column2:
1/Max(NKeys(I1),NKeys(I2))
– There is an index I1 on column1 and an index I2 on
column2.
– Containment of value sets assumption
– If only one column has an index I, use 1/NKeys(I).
– Otherwise, use 1/10.
Estimating Reduction Factors
• column > value: (High(I)-value)/(High(I)-Low(I)) if there is an index I on column.
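The three rules above can be written as small functions. This is a minimal sketch rather than any DBMS's actual implementation: the function names are hypothetical, passing None stands for "no index on this column", and clamping the range factor to [0, 1] is an added assumption for values outside [Low, High].

```python
from typing import Optional

DEFAULT_RF = 1 / 10  # fallback from the slides when no index statistics exist


def rf_equals_value(nkeys: Optional[int]) -> float:
    """column = value: 1/NKeys(I), assuming uniformly distributed values."""
    return 1 / nkeys if nkeys else DEFAULT_RF


def rf_equals_column(nkeys1: Optional[int], nkeys2: Optional[int]) -> float:
    """column1 = column2: 1/Max(NKeys(I1), NKeys(I2)).

    If only one column has an index, its NKeys is used alone;
    if neither has one, the default 1/10 applies.
    """
    known = [n for n in (nkeys1, nkeys2) if n]
    return 1 / max(known) if known else DEFAULT_RF


def rf_greater_than(value: float, low: float, high: float) -> float:
    """column > value: (High(I) - value) / (High(I) - Low(I)).

    Assumes high > low. Clamping to [0, 1] is an assumption of this sketch.
    """
    return min(1.0, max(0.0, (high - value) / (high - low)))
```

For instance, rf_equals_column(40_000, None) falls back to the single available index and yields 1/40,000.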
Example
Reserves (sid, agent), Sailors(sid, rating)
SELECT *
FROM Reserves R, Sailors S
WHERE R.sid = S.sid and S.rating > 3 and
R.agent = 'Joe'
• Cardinality(R) = 100,000
• Cardinality(S) = 40,000
• NKeys(Index on R.agent) = 100
• High(Index on Rating) = 10, Low = 0
Example (cont.)
• Maximum cardinality: 100,000 * 40,000
• Reduction factor of R.sid = S.sid: 1/40,000
– sid is a primary key of S
• Reduction factor of S.rating > 3: (10-3)/(10-0) = 7/10
• Reduction factor of R.agent = 'Joe': 1/100
• Total Estimated size: 700
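The arithmetic of this example can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are made up here, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction
from math import prod

# Maximum cardinality: Cardinality(R) * Cardinality(S)
max_cardinality = 100_000 * 40_000

reduction_factors = [
    Fraction(1, 40_000),       # R.sid = S.sid (sid is a primary key of S)
    Fraction(10 - 3, 10 - 0),  # S.rating > 3
    Fraction(1, 100),          # R.agent = 'Joe'
]

# Estimated size = maximum size * product of the reduction factors
estimated_size = max_cardinality * prod(reduction_factors)
print(estimated_size)  # prints 700
```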
Database Tuning
• Problem: Make database run efficiently
• 80/20 Rule: 80% of the time, the
database is running 20% of the queries
– find what is taking all the time, and tune
these queries
Solutions
• Indexing
– This can sometimes degrade performance.
Why?
• Tuning queries
• Reorganization of tables; perhaps
"denormalization"
• Changes in physical data storage
Denormalization
• Suppose you have tables:
– emp(eid, ename, salary, did)
– dept(did, budget, address, manager)
• Suppose you often ask queries which require
finding the manager of an employee. You might
consider changing the tables to:
– emp(eid, ename, salary, did, manager)
– dept(did, budget, address, manager)
– In emp, there is now a functional dependency did -> manager, so emp is no longer in 3NF!
Denormalization (cont’d)
• How will you ensure that the redundancy
does not introduce errors into the
database?
Creating Indexes Using Oracle
Index
• Map between
– a key of a row
– the location of the row's data
• Oracle has two kinds of indexes
– B+ tree
– Bitmap
• Sorted
B+ tree
[Figure: B+ tree. Root keys: 13, 17, 24, 30. Leaves, left to right: 2* 3* 5* 7* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*]
Creating an Index
• Syntax:
create [unique | bitmap] index index_name on
table(column [, column] ...)
Unique Indexes
create unique index rating_bit on Sailors(rating);
• Creates an index that guarantees the
uniqueness of the key. Creation fails if a duplicate
already exists.
• When you create a table with a
– primary key constraint or
– unique constraint
a "unique" index is created automatically
Bitmap Indexes
• Appropriate for columns that may have very few
possible values
• For each value c that appears in the column, a
vector v of bits is created, with a 1 in v[i] if the
i-th row has the value c
– Vector length = number of rows
• Oracle can automatically convert bitmap entries
to RowIDs during query processing
Bitmap Indexes: Example
Sid   Sname   age   rating
12    Jim     55    3
13    John    46    7
14    Jane    46    10
15    Sam     37    3
create bitmap index rating_bit on Sailors(rating);
• Corresponding bitmaps:
– 3: <1 0 0 1>
– 7: <0 1 0 0>
– 10: <0 0 1 0>
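To make the construction concrete, here is a small Python sketch that builds the same bit vectors from the rating column. It only illustrates the idea (Oracle's actual bitmap storage is compressed and more involved), and build_bitmap_index is a hypothetical helper name.

```python
def build_bitmap_index(column_values):
    """Return {value: bit vector}, with a 1 in position i if row i has that value."""
    n = len(column_values)
    bitmaps = {}
    for i, value in enumerate(column_values):
        # Create an all-zero vector the first time a value is seen.
        bitmaps.setdefault(value, [0] * n)[i] = 1
    return bitmaps


ratings = [3, 7, 10, 3]  # rating column of the four Sailors rows above
index = build_bitmap_index(ratings)
print(index)  # {3: [1, 0, 0, 1], 7: [0, 1, 0, 0], 10: [0, 0, 1, 0]}

# A predicate such as rating = 3 or rating = 7 becomes a bitwise OR.
either = [a | b for a, b in zip(index[3], index[7])]
print(either)  # [1, 1, 0, 1]
```

Combining predicates with bitwise operations is one reason bitmap indexes suit columns with few distinct values.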
When to Create an Index
• Large tables, on columns that are likely to
appear in where clauses as a simple
equality
• where s.sname = 'John' and s.age = 50
• where s.age = r.age
Function-Based Indexes
• You can't use an index on sname for the
following query:
select *
from Sailors
where UPPER(sname) = 'SAM';
• You can create a function-based index to speed
up the query:
create index upp_sname on Sailors(UPPER(sname));
Index-Organized Tables
• An index organized table keeps its data sorted
by the primary key
• Rows do not have RowIDs
• They store their data as if they were an index
create table Sailors(
sid number primary key,
sname varchar2(30),
age number,
rating number)
organization index;
Index-Organized Tables (2)
• What advantages does this have?
– Enforce uniqueness: primary key
– Improve performance
• What disadvantages?
– expensive to add column, dynamic data
• When to use?
– where clause on the primary key
– static data
Clustering Tables Together
• You can ask Oracle to store several tables close
together on the disk
• This is useful if you usually join these tables
together
• Cluster: area in the disk where the rows of the
tables are stored
• Cluster key: the columns by which the tables
are usually joined in a query
Clustering Tables Together: Syntax
• create cluster sailor_reserves (X number);
– Create a cluster with nothing in it
• create table Sailors(
sid number primary key,
sname varchar2(30),
age number,
rating number)
cluster sailor_reserves(sid);
– Create the table in the cluster
Clustering Tables Together: Syntax (cont.)
• create index sailor_reserves_index on cluster
sailor_reserves
– Create an index on the cluster
• create table Reserves(
sid number,
bid number,
day date,
primary key(sid, bid, day) )
cluster sailor_reserves(sid);
– A second table is added to the cluster
Sailors
sid   sname    rating   age
22    Dustin   7        45.0
31    Lubber   8        55.5
58    Rusty    10       35.0

Reserves
sid   bid   day
22    102   7/7/97
22    101   10/10/96
58    103   11/12/96

Stored
sid   sname    rating   age    bid   day
22    Dustin   7        45.0   102   7/7/97
                               101   10/10/96
31    Lubber   8        55.5
58    Rusty    10       35.0   103   11/12/96
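The stored layout can be imitated in Python by grouping the rows of both tables under their shared cluster key. This is a rough sketch of the idea only, not of Oracle's on-disk block format; cluster_rows is a hypothetical helper.

```python
def cluster_rows(sailors, reserves):
    """Group Sailors and Reserves rows by the cluster key sid."""
    cluster = {}
    for sid, sname, rating, age in sailors:
        entry = cluster.setdefault(sid, {"sailor": None, "reserves": []})
        entry["sailor"] = (sname, rating, age)
    for sid, bid, day in reserves:
        entry = cluster.setdefault(sid, {"sailor": None, "reserves": []})
        entry["reserves"].append((bid, day))
    return cluster


sailors = [(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)]
reserves = [(22, 102, "7/7/97"), (22, 101, "10/10/96"), (58, 103, "11/12/96")]

cluster = cluster_rows(sailors, reserves)
# Joining on sid now needs one lookup per key instead of a search in a second table.
print(cluster[22])
```

Each sid value is stored once, followed by the sailor's data and all of that sailor's reservations, which is why joins on the cluster key are cheap.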
The Oracle Optimizer
Types of Optimizers
• There are different modes for the optimizer
ALTER SESSION SET optimizer_mode =
{choose|rule|first_rows(_n)|all_rows}
• RULE: Rule-based optimizer (RBO)
– deprecated
• CHOOSE: Uses the cost-based optimizer (CBO) when statistics
are available; the CBO picks a plan based on statistics
(e.g. number of rows in a table, number of distinct keys in an index)
– Need to analyze the data in the database using
the analyze command
Types of Optimizers
• ALL_ROWS: execute the query so that all of
the rows are returned as quickly as possible
– Merge join
• FIRST_ROWS(n): execute the query so that all
of the first n rows are returned as quickly as
possible
– Block nested loop join
Analyzing the Data
analyze {table <table_name> | index <index_name>}
{compute statistics |
estimate statistics [sample <integer> {rows | percent}] |
delete statistics};
analyze table Sailors estimate statistics sample 25 percent;
Viewing the Execution Plan (Option 1)
• You need a PLAN_TABLE table. So, the first
time that you want to see execution plans, run
the command:
@$ORACLE_HOME/rdbms/admin/utlxplan.sql
• Set autotrace on to see all plans
– Display the execution path for each query, after
being executed
Viewing the Execution Plan (Option 2)
• Another option:
explain plan
set statement_id='<name>'
for <statement>
explain plan
set statement_id='test'
for
SELECT *
FROM Sailors S
WHERE sname='Joe';
• Then select from PLAN_TABLE to view the plan
Operations that Access Tables
• TABLE ACCESS FULL: sequential table scan
– Oracle optimizes by reading multiple blocks
– Used whenever there is no where clause on a query
select * from Sailors
• TABLE ACCESS BY ROWID: access rows by
their RowID values.
– How do you get the rowid? From an index!
select * from Sailors where sid > 10
Types of Indexes
• Unique: each row of the indexed table
contains a unique value for the indexed
column
• Nonunique: the row’s indexed values can
repeat
Operations that Use Indexes
• INDEX UNIQUE SCAN: Access of an
index that is defined to be unique
• INDEX RANGE SCAN: Access of an index
that is not unique or access of a unique
index for a range of values
When are Indexes Used/Not Used?
• If you set an indexed column equal to a value, e.g.,
sname = 'Jim'
• If you specify a range of values for an indexed
column, e.g., sname like 'J%'
– sname like '%m': will not use an index
– UPPER(sname) like 'J%' : will not use an index
– sname is null: will not use an index, since null values are
not stored in the index
– sname is not null: will not use an index, since every value
in the index would have to be accessed
When are Indexes Used? (cont)
• 2*age = 20: Index on age will not be used. Index
on 2*age will be used.
• sname != 'Jim': Index will not be used.
• MIN and MAX functions: Index will be used
• Equality of a column in a leading column of a
multicolumn index. For example, suppose we have
a multicolumn index on (sid, bid, day)
– sid = 12: Can use the index
– bid = 101: Cannot use the index
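The leading-column rule follows from how a multicolumn index is sorted. The sketch below uses a sorted Python list as a stand-in for the index leaves; the entries and function names are hypothetical, and the dates are written as ISO strings for simplicity.

```python
from bisect import bisect_left

# Hypothetical entries of an index on (sid, bid, day), kept in sorted order.
entries = sorted([
    (12, 101, "1996-10-10"),
    (12, 103, "1997-07-07"),
    (22, 101, "1996-10-10"),
    (58, 103, "1996-11-12"),
])


def lookup_by_sid(sid):
    """Leading column: matches are contiguous, so binary search finds them."""
    lo = bisect_left(entries, (sid,))
    hi = bisect_left(entries, (sid + 1,))
    return entries[lo:hi]


def lookup_by_bid(bid):
    """Non-leading column: matches are scattered, so every entry is examined."""
    return [e for e in entries if e[1] == bid]


print(lookup_by_sid(12))   # two adjacent entries
print(lookup_by_bid(101))  # entries 0 and 2, which are not adjacent
```

Because the entries are ordered by sid first, a condition on bid alone gives the sorted order nothing to work with, which matches the slide's "cannot use the index".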
When are Indexes Used? (cont)
• If the index is selective
– A small number of records are associated
with each distinct column value
Hints
• You can give the optimizer hints about
how to perform query evaluation
• Hints are written in /*+ */ right after
the select
• Note: These are only hints. The Oracle
optimizer can choose to ignore your hints
Hints
• FULL hint: tell the optimizer to perform a
TABLE ACCESS FULL operation on the
specified table
• ROWID hint: tell the optimizer to perform a
TABLE ACCESS BY ROWID operation on the
specified table
• INDEX hint: tell the optimizer to use an index-based scan on the specified table
Examples
Select /*+ FULL (sailors) */ sid
From sailors
Where sname='Joe';
Select /*+ INDEX (sailors) */ sid
From sailors
Where sname='Joe';
Select /*+ INDEX (S s_ind) */ S.sid
From sailors S, reserves R
Where S.sid=R.sid AND sname='Joe';