Download Index Tuning

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Ease of doing business index wikipedia , lookup

Transcript
Index Tuning
AOBD07/08
H. Galhardas
Index
An index is a data structure that supports
efficient access to data
Condition
on
attribute
value
Set of
Records
index
(search key)
AOBD2007/08
H. Galhardas
Matching
records
Performance Issues






Type of Query
Index Data Structure
Organization of data on disk
Index Overhead
Data Distribution
Covering
AOBD2007/08
H. Galhardas
Types of Queries
1.
Point Query
3.
SELECT balance
FROM accounts
WHERE number = 1023;
2.
Multipoint Query
SELECT number
FROM accounts
WHERE balance > 10000;
4.
SELECT balance
FROM accounts
WHERE branchnum = 100;
AOBD2007/08
Range Query
Prefix Match Query
SELECT *
FROM employees
WHERE name = ‘Jensen’
and firstname = ‘Carl’
and age < 30;
H. Galhardas
Types of Queries
5.
Extremal Query
SELECT *
FROM accounts
WHERE balance =
max(select balance from
accounts)
5.
7.
SELECT branchnum,
avg(balance)
FROM accounts
GROUP BY branchnum;
8.
Join Query
SELECT distinct
branch.adresse
FROM accounts, branch
WHERE
accounts.branchnum =
branch.number
and accounts.balance >
10000;
Ordering Query
SELECT *
FROM accounts
ORDER BY balance;
AOBD2007/08
Grouping Query
H. Galhardas
Search Keys

A (search) key is a sequence of attributes.
create index i1 on accounts(branchnum, balance);

Types of keys


Sequential: the value of the key is monotonic with
the insertion order (e.g., counter or timestamp)
Non sequential: the value of the key is unrelated
to the insertion order (e.g., social security number)
AOBD2007/08
H. Galhardas
Data Structures


Most index data structures can be viewed as trees.
In general, the root of this tree will always be in main
memory, while the leaves will be located on disk.


The performance of a data structure depends on the
number of nodes in the average path from the root to the
leaf.
Data structure with high fan-out (maximum number of
children of an internal node) are thus preferred.
AOBD2007/08
H. Galhardas
B+-Tree

A B+-Tree is a balanced tree whose leaves
contain a sequence of key-pointer pairs.
96
75 83
33 48 69
AOBD2007/08
75 80 81
107
83 92 95
H. Galhardas
96 98 103
107 110 120
B+-Tree Performance

Key length influences fanout


Choose small key when creating an index
Key compression


Prefix compression (Oracle 8, MySQL): only store that part of
the key that is needed to distinguish it from its neighbors: Smi,
Smo, Smy for Smith, Smoot, Smythe.
Front compression (Oracle 5): adjacent keys have their front
portion factored out: Smi, (2)o, (2)y. There are problems with
this approach:


AOBD2007/08
Processor overhead for maintenance
Locking Smoot requires locking Smith too.
H. Galhardas
Hash Index

A hash index stores key-value pairs based on
a pseudo-randomizing function called a hash
function.
values
Hashed key
key
2341
Hash
function
0
1
R1 R5
R3 R6 R9
n
AOBD2007/08
H. Galhardas
R14 R17 R21
R25
The length of
these chains impacts
performance
Hash Index Performance


The best for answering point queries, provided there
are no overflow chains
Good for multipoint queries


Must be reorganized (drop/add or use reorganize
function) if there is a significant amount of overflow
chaining


Useless for range, prefix or extremal queries
Avoiding overflow may require underutilize the hash space
Size of hash structure is not related to the size of a
key, because hash functions return keys to locations
or page identifiers

Hash functions take longer to execute on a long key
AOBD2007/08
H. Galhardas
Clustered / Non clustered index

Clustered index
(primary index)


A clustered index on
attribute X co-locates
records whose X
values are near to one
another.
Records
AOBD2007/08
Non-clustered index
(secondary index)


A non clustered index
does not constrain table
organization.
There might be several
non-clustered indexes per
table.
Records
H. Galhardas
Dense / Sparse Index

Sparse index


Pointers are associated to
pages
Dense index


P1
AOBD2007/08
P2
Pointers are
associated to records
Non clustered indexes
are dense
record
Pi
record
H. Galhardas
record
Index Implementations in some major
DBMS

SQL Server





B+-Tree data structure
Clustered indexes are
sparse
Indexes maintained as
updates/insertions/deletes
are performed





B+-Tree data structure,
spatial extender for R-tree
Clustered indexes are
dense
Explicit command for index
reorganization
AOBD2007/08
B+-tree, hash, bitmap,
spatial extender for R-Tree
No clustered index until 10g

DB2

Oracle

H. Galhardas
Index organized table
(unique/clustered)
Clusters used when
creating tables.
MySQL


B+-Tree, R-Tree (geometry
and pairs of integers)
Indexes maintained as
updates/insertions/deletes
are performed
Index Tuning Knobs





Index data structure
Search key
Size of key
Clustered/Non-clustered/No index
Covering
AOBD2007/08
H. Galhardas
Types of Queries
1.
Point Query
3.
SELECT balance
FROM accounts
WHERE number = 1023;
2.
Multipoint Query
SELECT number
FROM accounts
WHERE balance > 10000;
4.
SELECT balance
FROM accounts
WHERE branchnum = 100;
AOBD2007/08
Range Query
Prefix Match Query
SELECT *
FROM employees
WHERE name = ‘Jensen’
and firstname = ‘Carl’
and age < 30;
H. Galhardas
Types of Queries
5.
Extremal Query
SELECT *
FROM accounts
WHERE balance =
max(select balance from
accounts)
5.
7.
SELECT branchnum,
avg(balance)
FROM accounts
GROUP BY branchnum;
8.
Join Query
SELECT distinct
branch.adresse
FROM accounts, branch
WHERE
accounts.branchnum =
branch.number
and accounts.balance >
10000;
Ordering Query
SELECT *
FROM accounts
ORDER BY balance;
AOBD2007/08
Grouping Query
H. Galhardas
Benefits of a clustered index
1.



2.

3.
4.

A sparse clustered index stores fewer pointers than a dense
index.
This might save up to one level in the B-tree index.
Nb pointers dense index = nb pointers sparse index * nb
records per page
If records small compared to pages, there will be many
records per page, so sparse has one level less than dense
A clustered index is good for multipoint queries
White pages in a paper telephone book
A clustered index based on a B-Tree supports range, prefix,
extremal and ordering queries well
A clustered index (on attribute X) can reduce lock contention:
Retrieval of records or update operations using an equality, a
prefix match or a range condition based on X will access and
lock only a few consecutive pages of data
AOBD2007/08
H. Galhardas
Evaluation of Clustered Index

Throughput ratio
clustered
nonclustered
no index
1

0.8

0.6
0.4
0.2
0
SQLServer
AOBD2007/08
Oracle
DB2
H. Galhardas
Multipoint query that
returns 100 records out
of 1000000.
Cold buffer
Clustered index is twice
as fast as nonclustered index and
orders of magnitude
faster than a scan.
19
Inconvenient of a clustered index
Benefits can diminish if there is a large number of
overflow data pages

Accessing those pages will usually entail a disk seek

Overflow pages can result from two kinds of
updates:



AOBD2007/08
Inserts may cause data pages to overflow
Record replacements that increase the size of a record
(e.g., the replacement of a NULL values by a long string)
H. Galhardas
Evaluation of clustered indexes with
insertions (1)


Throughput
(queries/sec)
SQLServer

No maintenance
Maintenance
0
20
40
60
80

100
% Increase in Table Size
AOBD2007/08
H. Galhardas
Index is created with
fillfactor = 100.
Insertions cause page splits
and extra I/O for each query
Maintenance consists in
dropping and recreating the
index
With maintenance
performance is constant
while performance degrades
significantly if no
maintenance is performed.
21
Evaluation of clustered indexes with
insertions (2)


DB2
Throughput
(queries/sec)
50

40
30
20
No maintenance
Maintenance
10
0
0
20
40
60
80

100
% Increase in Table Size
AOBD2007/08
H. Galhardas
Index is created with pctfree
=0
Insertions cause records to
be appended at the end of
the table
Each query thus traverses
the index structure and
scans the tail of the table.
Performances degrade
slowly when no
maintenance is performed.
Evaluation of clustered indexes with
insertions (3)

Oracle
Throughput
(queries/sec)


No
maintenance
0
AOBD2007/08
20
40
60
80
% Increase in Table Size

100
H. Galhardas
In Oracle, clustered index are
approximated by an index
defined on a clustered table
No automatic physical
reorganization
Index defined with pctfree = 0
Overflow pages cause
performance degradation
23
Redundant tables

Because there is only one clustered index per
table, it might be a good idea to replicate a
table in order to use a clustered index on two
different attributes
•
•
Yellow and white pages in a paper telephone book
Works well if low insertion/update rate
AOBD2007/08
H. Galhardas
Covering Index - definition



A nonclustering index can eliminate the need to access
the underlying table through covering (composite index)
For the query:
Select name
from employee
where department = “marketing”
Good covering index would be on (department, name)


Index on (name, department) less useful.
Index on department alone moderately useful.
Inconvenients (of composite indexes):


Tend to have a large key size
Update to one of the attributes causes index to be modified
AOBD2007/08
H. Galhardas
25
Covering Index - impact
Throughput (queries/sec)

70
60
covering
50
covering - not
ordered
non clustering
40
30
20

clustering
10
Covering index performs
better than clustering index
when first attributes of
index are in the where
clause and last attributes in
the select.
When attributes are not in
order then performance is
much worse.
0
SQLServer
AOBD2007/08
H. Galhardas
26
Benefits of non-clustered indexes
A dense index can eliminate the need to access
the underlying table through covering.
1.
It might be worth creating several indexes to
increase the likelihood that the optimizer can find a
covering index
•
A non-clustered index is good if each query retrieves
significantly fewer records than there are pages in
the table.
2.
Point queries always useful
•
Multipoint queries useful if:
number of distinct key values > c * number of records per
page
Where c is the number of pages retrieved in each prefetch
•
AOBD2007/08
H. Galhardas
Table Scan Can Sometimes Win

Throughput (queries/sec)


scan
non clustering
0
5
AOBD2007/08
10
15
% of selected records
20
IBM DB2 v7.1 on Windows
2000
Range Query
If a query retrieves 10% of
the records or more,
scanning is often better
than using a non-clustering
non-covering index.
25
H. Galhardas
28
Joins, Foreign Keys and Indexes
R |X| R.A = S.B S can be executed through:
(NR = nb tuples R, NS = nb tuples S, BR = nb pages
R, BS = nb pages S)
 Nested Loop (NL)


Cost I/O: NR *BS + BR, can be reduced to BR + BS if the
smaller relation fits entirely in memory
Indexed Nested Loop (INL)

Cost: BR + c * NR, c: cost of traversing index and fetching all
matching s tuples for one tuple or r

Hash Join (HJ)

Cost I/O: 3(BR + BS), can be reduced to BR + BS if the
entire build input can be kept in main memory
AOBD2007/08
H. Galhardas
Choice of join algorithms (1)


If there is no index present, the system will choose to use a
hash join
If there is an index present:

An INL based on an index on S.B works better than a HJ if the nb
of distinct values in S.B is almost equal to the nb of rows of S


The same, regardless the nb of distinct values of S.B, if the
index covers the join


The only accesses to S data occur within the index
The same, regardless the nb of distinct values of S.B, if S is
clustered based on B


Common case because most joins are FK joins
All S rows having equal B values will be colocated
Otherwise, the hash join may be better
AOBD2007/08
H. Galhardas
Choice of join algorithms (2)
An index may be particularly useful in two other
situations:
 In a non-equi-join (R.A > S.B), an index (using a
Btree) on the join attribute avoids a full table scan in
the NL.
 To support the FK constraint when R.A is a subset of
R.B, so A is a FK in R; B is a PK in S


An index in S speeds up insertions on R: for every record
inserted in R, check the foreign constraint on S
Similarly, an index on R.A speeds up deletions in S
AOBD2007/08
H. Galhardas
Index on Small Tables


Tuning manuals suggest to avoid indexes on small
tables (containing fewer than 200 records)
This number depends on the size of records
compared with the size of the index key

If all data from a relation fits in one page (or in a single disk
track and can be read into memory through a single
physical read by prefetching)
an index page adds at least an I/O


If each record fits in a page, then 200 records may require
200 disk accesses or more.
an index helps performance


If many inserts execute on a table with a small index, the
index itself may become a concurrency control bottleneck

Lock conflicts near the root
AOBD2007/08
H. Galhardas
Index on Small Tables and Updates
If transactions update a single record, without an index, each
transaction scans through many records before it locks the relevant
record, thus reducing update concurrency

Throughput (updates/sec)

18
16
14
12
10
8
6
4
2
0

no index
AOBD2007/08
index

H. Galhardas
Small table: 100 records
Two concurrent processes
perform updates (each
process works for 10ms
before it commits)
No index: the table is
scanned for each update.
No concurrent updates.
A clustered index allow to
take advantage of row
locking.
Table organization and index selection:
basic rules (1)
Use a hash index for point queries only. Use a Btree if multipoint queries or range queries are used
Use clustering
1.
2.
if your queries need all or most of the fields of each
records returned, but the records are too large for a
composite index on all fields
if multipoint or range queries are asked
•
•
3.
4.
Use a dense index to cover critical queries
Don’t use an index if the time lost when inserting
and updating overwhelms the time saved when
querying
AOBD2007/08
H. Galhardas
Table organization and index
selection: basic rules (2)

Use key compression




If you are using a B-tree
Compressing the key will reduce the number of
levels in the tree
The system is disk-bound but not CPU-bound
Updates are relatively rare
AOBD2007/08
H. Galhardas