Relational Database Systems 2
3. Indexing and Access Paths
Christoph Lofi
Benjamin Köhncke
Institut für Informationssysteme
Technische Universität Braunschweig
http://www.ifis.cs.tu-bs.de
2 Storage
• Each DBMS uses several types of storage
– Storage hierarchy
– Predominant storage type is the hard disk
• Hard disks
– Slow random access
– Fast sequential transfer
– Unreliable & failure prone
Relational Database Systems 2 – Christoph Lofi - Benjamin Köhncke – Institut für Informationssysteme
3 Indexing and Access Paths
3.1 Introduction to access paths
3.2 Files and blocks
3.3 Indexing
– Single-level indexes
– Multi-level indexes
– Hash indexes
3.4 Physical Tuning
Datenbanksysteme 2 – Christoph Lofi – Institut für Informationssysteme – TU Braunschweig
3.1 Introduction to Access Paths
• Databases persistently store data
• Major problem: efficiency of data access
– Obviously depends on the hardware used
• But also depends to a large degree
– On the allocation of data on the disks
– On intelligent buffer management
– On creating access paths and indexing data
• Physical tuning of the database is a main task of
database administrators
3.1 Introduction to Access Paths
• For persistent storage databases are mapped into a
number of files
– Located in specially protected parts of the file system
(tablespaces, etc.)
– Actually maintained by the operating system
• Each file is partitioned into fixed-length blocks (pages)
– Smallest unit transferred from/to storage
– Block size is specified on DB creation
– Is a multiple of the OS block size
– A block usually contains several data records
SKS 10.5
3.1 Sectors and Blocks
• Hard disk sectors are abstracted by the file system
to blocks
• DBMS abstracts FS blocks to DBMS blocks
[Figure: DISK, divided into FS Blocks 1–6, on top of which the DBMS defines DBMS Blocks 1–6]
3.1 Buffer Management
• For data access, complete blocks (pages) are always
transferred from disk into main memory
– The part of main memory holding copies of blocks is
called the DB buffer and is managed by the buffer
manager
– Optimized (pre-)fetching of blocks greatly improves
performance
• If a data record is requested by the DB
– If the block is buffered, return main memory address
– Else allocate space in the buffer, fetch the block
from disk, and return main memory address
SKS 10.5
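The lookup logic above can be sketched as a small buffer manager. This is a minimal sketch under simplifying assumptions: the "disk" is a Python dict, the "main memory address" is simply the returned block content, eviction uses LRU (one of the replacement strategies discussed in this section), and recovery constraints are reduced to an explicit set of pinned blocks. The class and method names are hypothetical.

```python
from collections import OrderedDict


class BufferManager:
    """Toy DB buffer: fixed number of frames, LRU eviction, write-back of dirty blocks."""

    def __init__(self, disk, capacity):
        self.disk = disk              # dict block_id -> content (stands in for the disk)
        self.capacity = capacity      # number of buffer frames
        self.frames = OrderedDict()   # block_id -> [content, dirty]; order: LRU ... MRU
        self.pinned = set()           # blocks that must not be evicted yet
        self.reads = 0                # blocks fetched from disk
        self.writes = 0               # blocks written back to disk

    def get(self, block_id):
        """Return the buffered block, fetching it from disk if it is not buffered."""
        if block_id in self.frames:           # hit: block is already in the buffer
            self.frames.move_to_end(block_id)  # mark as most recently used
            return self.frames[block_id][0]
        if len(self.frames) >= self.capacity:  # miss: allocate space first
            self._evict()
        self.frames[block_id] = [self.disk[block_id], False]
        self.reads += 1
        return self.frames[block_id][0]

    def write(self, block_id, content):
        """Modify a block in the buffer only; it is written back when evicted."""
        self.get(block_id)
        self.frames[block_id] = [content, True]

    def pin(self, block_id):
        self.pinned.add(block_id)

    def unpin(self, block_id):
        self.pinned.discard(block_id)

    def _evict(self):
        # Least recently used unpinned block is the victim
        victim = next((b for b in self.frames if b not in self.pinned), None)
        if victim is None:
            raise RuntimeError("all buffered blocks are pinned")
        content, dirty = self.frames.pop(victim)
        if dirty:                              # modified blocks go back to disk first
            self.disk[victim] = content
            self.writes += 1
```

With two frames, `get(0)`, `get(1)`, `get(0)`, `get(2)` evicts block 1, because block 0 was used more recently.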
3.1 Buffer Management
• When new blocks are fetched, currently buffered
blocks generally have to be evicted
– If blocks have been modified, they must be written
back to disk (which is not always immediately possible)
– Writing of blocks depends on the recovery
strategy:
• Blocks that cannot yet be written are called pinned blocks
• Before checkpoints there might be a forced output of
blocks
• Several buffer replacement
strategies can be applied
SKS 10.5
3.1 Replacement Strategies
• Least recently used (LRU): the block that has not been
used for the longest time (the 'oldest' one) is replaced
– Usually used in OS buffer management
– Simple, yet effective strategy
– Does not use semantics of the data
• Toss immediate: as soon as the DB has finished processing
a block, the block is replaced
– Example: block-nested loop join between tables R, S
• Take one block from R and compare it against all blocks in S
• Afterwards the block from R can be thrown out, but no block
from S, since the S blocks are needed again for the next R block
SKS 10.5
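The join example can be sketched as follows to show why toss immediate fits the R blocks. This is a minimal sketch assuming tables are given as lists of blocks (each block a list of tuples) and that only one buffer frame each is available for R and S, so every S block is fetched again for every R block; the function name is hypothetical.

```python
def block_nested_loop_join(r_blocks, s_blocks, predicate):
    """Join two tables given as lists of blocks; also count block reads.

    Assumes one buffer frame for R and one for S (no S block stays buffered).
    """
    result, block_reads = [], 0
    for r_block in r_blocks:
        block_reads += 1                  # fetch the next block of R
        for s_block in s_blocks:
            block_reads += 1              # fetch a block of S (again)
            for r in r_block:
                for s in s_block:
                    if predicate(r, s):
                        result.append((r, s))
        # toss immediate: r_block is never used again, so its frame can be
        # released right away, while the S blocks are needed for the next R block
    return result, block_reads
```

Under these assumptions the join needs $b_R + b_R \cdot b_S$ block reads, where $b_R$ and $b_S$ are the numbers of blocks of R and S.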
3.1 Replacement Strategies
• Expected Reuse: statistics about query frequencies
assess how useful a buffered block is
– The block with least expected usefulness is evicted
– Example: index blocks are addressed more often
than data blocks
• In any case, whatever the strategy:
Once a block has been modified and has not yet been
written to disk, it cannot be evicted!
– Note: recovery subsystem has to agree before writing a block
to disk
SKS 10.5
3.2 Blocks (Pages)
• Overhead & payload (e.g., ORACLE data blocks)
– Header contains general block information like
block address, type of segment (data, index,…)
– Table directory contains the tables that have rows
in the block
– Row directory contains information about the actual
rows in the block (IDs, addresses,…)
– Row data is the actual data
3.2 Extents and Segments
• An extent is a logical collection of blocks (usually
adjacent)
– Fixed size; once more data space is needed for rows,
a new extent is allocated
– Remember: reading adjacent blocks improves access
time
• Segments are collections of logically connected
extents
– For instance tables, index segments, or rollback
segments
3.2 Tablespaces
• A tablespace is the logical storage space needed
for the data in a table
– A grouping of multiple files allocated on disk
• Example: Data1.ora, system.ora, test.dbf
– Good practice to have one tablespace for tables, and a
different one for indexes
CREATE TABLESPACE user_data
DATAFILE 'udata.ora' SIZE 10M
EXTENT MANAGEMENT LOCAL
SEGMENT SPACE MANAGEMENT AUTO;
3.2 File Organization
• Records of different relations should be stored in
individual files
– Storing related records in the same block minimizes
disk accesses
• Multitable clustering file organization may
store records of different tables in the same block
– Good, e.g., for frequently occurring joins
3.2 File Organization
• Files reserve space in the file system
– File size can be specified or change dynamically
– Default values strongly differ (e.g., DB2 allocates 100 MB)
[Figure: files related to table “content_type_lecture”: schema file, data file with data blocks, index file with index blocks]
3.2 File Organization
• Organization of records in the file
– Heap – a record can be placed anywhere in the file
where there is space
– Sequential – store records in sequential order, based
on the value of the search key of each record
– Hashing – a hash function is computed on some
attribute of each record. The result specifies in which
block of the file the record should be placed
3.2 File Organization
• Data records have to be written in a file such that the
entire record can be accessed with minimal disk accesses
– Fixed length records (easy to implement)
– Variable length records (storage space efficient)
TYPE deposit = RECORD
name : CHAR(22);
accNo : CHAR(10);
balance : REAL;
END
– If each char is 1 byte and a real 8 bytes, a record
takes 22 + 10 + 8 = 40 bytes
SKS 10.6
record  Name   AccNo   Balance
0       Burg   864442  654.55
1       Myers  967531  56.45
2       Smith  145288  457.75
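The 40-byte fixed-length layout of the deposit record can be checked with Python's `struct` module. This sketch assumes CHAR fields are ASCII and REAL is an 8-byte IEEE double; the helper names `pack_deposit` and `unpack_deposit` are hypothetical.

```python
import struct

# name CHAR(22), accNo CHAR(10), balance REAL (8-byte double);
# "<" = little-endian with no alignment padding, so the layout is exactly 40 bytes
DEPOSIT = struct.Struct("<22s10sd")


def pack_deposit(name, acc_no, balance):
    # struct pads shorter byte strings with NUL bytes up to the field width
    return DEPOSIT.pack(name.encode("ascii"), acc_no.encode("ascii"), balance)


def unpack_deposit(raw):
    name, acc_no, balance = DEPOSIT.unpack(raw)
    return name.rstrip(b"\0").decode("ascii"), acc_no.rstrip(b"\0").decode("ascii"), balance


rows = [("Burg", "864442", 654.55), ("Myers", "967531", 56.45), ("Smith", "145288", 457.75)]
file_bytes = b"".join(pack_deposit(*row) for row in rows)

# Fixed length: record i starts at byte offset i * 40
i = 2
print(unpack_deposit(file_bytes[i * DEPOSIT.size:(i + 1) * DEPOSIT.size]))
# ('Smith', '145288', 457.75)
```

Because every record has the same size, locating record i needs only a multiplication, which is what makes fixed-length records easy to implement.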
3.2 File Organization
• Fixed length records
– 40 bytes for the first record, 40 bytes for the second,…
– Problem: if the block size is not a multiple of 40, records may
cross block boundaries
– Problem: deleting a record can be done either by marking it
as deleted or by replacing it with some other record of the file
• Keep deleted records in place? Reading becomes slow!
• Move up all following records? Inefficient!
• Fill the gap with the next inserted record?
Changes the sequence!
• Pointer lists? Waste space in each
record!
SKS 10.6
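One of the deletion options, filling gaps with later insertions, is commonly implemented with a free list. This is a minimal sketch assuming a hypothetical file header that stores the first free slot, with each deleted slot storing the number of the next free slot, so no other record has to move (at the price of changing the record sequence, as noted above). Class names are hypothetical.

```python
class _Free:
    """Marker stored in a deleted slot, pointing to the next free slot."""

    def __init__(self, next_free):
        self.next_free = next_free


class FixedLengthFile:
    """Equal-size slots; deleted slots are chained into a free list."""

    def __init__(self):
        self.slots = []          # slot i holds a record or a _Free marker
        self.free_head = None    # "file header": first free slot (None = no free slot)

    def insert(self, record):
        if self.free_head is None:              # no gap: append at the end of the file
            self.slots.append(record)
            return len(self.slots) - 1
        slot = self.free_head                   # reuse the most recently freed slot
        self.free_head = self.slots[slot].next_free
        self.slots[slot] = record
        return slot

    def delete(self, slot):
        self.slots[slot] = _Free(self.free_head)
        self.free_head = slot

    def scan(self):
        """Sequential read that skips deleted slots."""
        return [r for r in self.slots if not isinstance(r, _Free)]
```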
3.2 File Organization
• Variable length records
– Necessary for multi-table files or records that allow variable
length attributes
– Problem: How to know when a record ends?
• Introduce ‘end-of-record’ symbols or store record length at
beginning of the record. But what if a record is updated?
• Slotted page structure: the header contains the number of entries, the end of
free space, and pointers to the location/size of each entry. Records can be
moved within the page to keep the free space contiguous: no
fragmentation!
SKS 10.6
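The slotted page structure can be sketched as follows. This is an illustrative sketch, not a real DBMS page format: the header sizes (8 fixed bytes plus 8 bytes per slot entry) are assumptions, and the class name is hypothetical. Records grow from the end of the page towards the header; on deletion the remaining records are shifted so the free space stays contiguous, while slot numbers (the record IDs seen from outside) stay stable.

```python
class SlottedPage:
    """Slot directory in the header, records packed at the end of the page."""

    HEADER_FIXED = 8   # assumed: entry count + end of free space
    SLOT_ENTRY = 8     # assumed: offset + size per slot entry

    def __init__(self, size=4096):
        self.data = bytearray(size)
        self.slots = []            # slot number -> (offset, length), None if deleted
        self.free_end = size       # records occupy data[free_end:]

    def free_space(self):
        header = self.HEADER_FIXED + self.SLOT_ENTRY * len(self.slots)
        return self.free_end - header

    def insert(self, record: bytes) -> int:
        if len(record) + self.SLOT_ENTRY > self.free_space():
            raise ValueError("page full")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1   # record ID = slot number

    def get(self, slot: int) -> bytes:
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])

    def delete(self, slot: int) -> None:
        offset, length = self.slots[slot]
        # Shift all records stored below the deleted one up by its length,
        # so that the free space remains one contiguous area
        self.data[self.free_end + length:offset + length] = self.data[self.free_end:offset]
        for i, entry in enumerate(self.slots):
            if entry is not None and entry[0] < offset:
                self.slots[i] = (entry[0] + length, entry[1])
        self.slots[slot] = None
        self.free_end += length
```

Since outside references point to slot entries rather than byte offsets, moving records inside the page does not invalidate them.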
3.2 File Organization
• Typical database records like in our banking example
easily fit into one block
– Name, account number, balance,…
– Usually multiple records per block
• Unspanned record organization fills blocks only with
complete records
• Spanned organization uses pointers to divide records
across blocks
EN 4.4
[Figure: unspanned, block i holds records 1–3 and block i+1 holds records 4–5; spanned, block i holds records 1–3 and the first part of record 4 with a pointer p to block i+1, which holds the rest of record 4 and record 5]
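The space trade-off between the two organizations can be quantified with the blocking factor (records per block). This is a minimal sketch; the function names are hypothetical, and for the spanned case the per-block space for the continuation pointer p is an assumed parameter.

```python
import math


def unspanned(num_records, record_size, block_size):
    """Only complete records per block; the rest of each block stays unused."""
    bfr = block_size // record_size        # blocking factor: records per block
    if bfr == 0:
        raise ValueError("record larger than a block: spanned organization needed")
    blocks = math.ceil(num_records / bfr)
    unused_per_block = block_size - bfr * record_size
    return bfr, blocks, unused_per_block


def spanned(num_records, record_size, block_size, pointer_size=0):
    """Records may continue in the next block; pointer_size is an assumed overhead."""
    usable = block_size - pointer_size
    return math.ceil(num_records * record_size / usable)
```

For example, 100 records of 300 bytes in 1,024-byte blocks need 34 blocks unspanned (3 records per block, 124 bytes unused each) but only 30 blocks spanned, ignoring the pointer.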
3.2 Non-Standard Blocks
• Necessary for large objects: binary large objects
(blobs) and character large objects (clobs)
• Large objects are not interpreted by the database
– Text documents, images, audio and video data
– Need to be stored in a contiguous sequence of bytes
– If an object is bigger than a block, contiguous pages
of the buffer pool have to be allocated for storage
– Sometimes preferable to disallow direct access and
only allow access through a file-system-like API,
which allows for fragmentation
3.3 Indexing
• Indexes help to locate records in a DB file
– Creation of indexes is part of the physical tuning task of
database administrators
– Indexes often influence the actual
location of storage for a record
• Example: sequential storage,
storage via a hash function
• If the storage location is determined by the index
– Not all attributes can be indexed this way, since each
record has only one physical location (but secondary
access paths may be used)
3.3 Basic Concepts
• When items have to be found quickly, indexing
mechanisms are used to
speed up access
– Alphabetical author
catalog in libraries
– Lexicographic ordering
–…
3.3 Basic Concepts
• An index is ordered by an indexing field (search key)
– Attribute(s) used to look up records in a file
• An index file consists of index entries
– Records of the form (search key, record location(s))
• Two basic kinds of queries
– Exact matches
• Locating the record(s) with a certain value
– Range queries
• Locating all records in a certain value range
3.3 Basic Concepts
• Two basic kinds of indexes
– Ordered indexes: search keys are stored in sorted
order
• Single-level ordered indexes
• Multi-level ordered indexes
– Hash indexes: search keys are distributed uniformly
across blocks using some hash function
• Hash function distributes records uniformly over blocks
• Hashed value of search key decides for storage block
3.3 Single-Level Ordered Indexes
• Index links indexing fields to logical file/block
locations
– Dense indexes index all items, sparse indexes only
selected ones
– The index file is much smaller than the data file
• Hence, a binary search on the index file requires fewer
block accesses than a binary search on the data file
– Works exactly like a book's index
• Only a few pages at the beginning/end, alphabetically ordered,
with references to the respective page(s)
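The claim that a binary search on the index needs fewer block accesses can be made concrete with a small calculation. All file parameters below (30,000 records of 100 bytes, 1,024-byte blocks, 15-byte index entries with one entry per data block) are hypothetical, and $\lceil \log_2 b \rceil$ is the usual textbook approximation for a binary search over b blocks.

```python
import math


def binary_search_accesses(num_blocks):
    """Approximate block accesses of a binary search over num_blocks blocks."""
    return max(1, math.ceil(math.log2(num_blocks)))


# Hypothetical file: 30,000 records of 100 bytes, 1,024-byte blocks
records, record_size, block_size = 30_000, 100, 1024
data_blocks = math.ceil(records / (block_size // record_size))     # 3,000 blocks

# Sparse index: one 15-byte entry (key + block pointer) per data block
entries_per_index_block = block_size // 15                         # 68
index_blocks = math.ceil(data_blocks / entries_per_index_block)    # 45 blocks

on_data = binary_search_accesses(data_blocks)                      # 12
on_index = binary_search_accesses(index_blocks) + 1                # 6 + 1 data block
print(on_data, on_index)                                           # 12 7
```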
3.3 Single-Level Ordered Indexes
• Primary indexes
– Data is ordered by some, usually unique, attribute as indexing field
(primary key), and database records are stored in this order
– An index record contains a pointer to the respective storage place
(block address)
• To save entries, there is usually only a single index entry
for each block (block anchor)
– But sparse indexes do not directly show whether some record
is in the DB
[Figure: sparse primary index with entries AccNo 1, AccNo 4, AccNo 7, each pointing to one data block:
block 1: 1, Adams, $887.00 | 2, Bertram, $19.99 | 3, Behaim, $167.00
block 2: 4, Cesar, $1866.00 | 5, Miller, $179.99 | 6, Naders, $682.56
block 3: 7, Ruth, $8642.78 | 8, Smith, $675.99 | 9, Tarrens, $99.00]
EN 14.1
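A lookup through a sparse primary index with block anchors can be sketched with Python's `bisect` module. This sketch assumes the data file of the primary-index example (sorted by AccNo, three records per block) held in memory as lists; `BLOCKS`, `ANCHORS` and `lookup` are hypothetical names.

```python
from bisect import bisect_right

# Data file sorted by AccNo, three records per block
BLOCKS = [
    [(1, "Adams", 887.00), (2, "Bertram", 19.99), (3, "Behaim", 167.00)],
    [(4, "Cesar", 1866.00), (5, "Miller", 179.99), (6, "Naders", 682.56)],
    [(7, "Ruth", 8642.78), (8, "Smith", 675.99), (9, "Tarrens", 99.00)],
]

# Sparse primary index: one entry (block anchor) per block = first key in the block
ANCHORS = [block[0][0] for block in BLOCKS]      # [1, 4, 7]


def lookup(acc_no):
    """Binary-search the index, then read exactly one data block and scan it."""
    i = bisect_right(ANCHORS, acc_no) - 1          # last anchor <= acc_no
    if i < 0:
        return None                                # smaller than every key
    for record in BLOCKS[i]:                       # block must be read to be sure
        if record[0] == acc_no:
            return record
    return None
```

`lookup(10)` has to read block 3 before it can report that AccNo 10 does not exist, which illustrates why a sparse index alone cannot answer existence questions.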
3.3 Single-Level Ordered Indexes
– Advantages
• Number of blocks needed for storing the index is small
compared to data
– Not all records are indexed (non-dense)
– Index entries are smaller than data records
• Can often be kept in buffer
– Disadvantages
• Insertions and Deletions need to move data
in storage and to update index entries
affected by the shifts
3.3 Single-Level Ordered Indexes
• Clustering indexes store data records in the
order of a non-unique indexing field
– One entry in the index per distinct value of the
indexing field
– Search keys are linked to the address of the first block
containing the search key
– Is a sparse index
• But existence of records with a certain key can be assessed
by index look-up
3.3 Single-Level Ordered Indexes
• Still problems with insertion/deletion
– Often one or more complete blocks are reserved for each
single search key value, so that every key value can act
as a block anchor
– Blocks do not have to be adjacent, but can use block
pointers, if more space is needed
• Structurally very similar to a hash index
[Figure: clustering index with entries Adams, Miller, Tarrens, each pointing to the block(s) reserved for that value:
Adams: 1, Adams, $887.00 | 2, Adams, $19.99
Miller: 4, Miller, $1866.00 | 5, Miller, $179.99 | 6, Miller, $682.56 | 7, Miller, $8642.78
Tarrens: 8, Tarrens, $856.78]
3.3 Single-Level Ordered Indexes
• Secondary indexes point to locations of
records regarding a non-ordering attribute
– Indexing does not affect the storage order
– There can be multiple secondary indexes for the same
DB file
• Secondary indexes are usually dense
– Objects with same or adjacent values are usually not
adjacent on disk
– If the indexing field has unique values (secondary
key) definitely all records have to be indexed
3.3 Single-Level Ordered Indexes
• If the indexing field is not unique there are
several possibilities to create a secondary index
– Create a dense index by including duplicate search
keys (one for each record)
– Use variable-length index entries, where each
search key is assigned a list of pointers
– Keep fixed-length index entries, but point to a block
containing (multiple) pointers to the actual records
• Introduces a level of indirection, but allows for a sparse
index
• Usually used in practice
3.3 Single-Level Ordered Indexes
• Dense secondary index or sparse with indirection
[Figure: secondary indexes on Name over an unordered data file with records 1, Cesar, $887.00; 2, Tarrens, $19.99; 3, Miller, $19.99; 4, Orwell, $1866.00; 5, Miller, $179.99; 6, Tarrens, $682.56; 7, Post, $8642.78; 8, Adams, $856.78; 9, Snyder, $856.78. Left: dense index (duplicate search keys included). Right: one entry per distinct name (Adams, Cesar, Miller, Orwell, Post, Snyder, Tarrens), each pointing (p) to a block of pointers to the matching records.]
3.3 Single-Level Ordered Indexes
• Characteristics of secondary indexes
– Speeds up retrieval: without a secondary index, the
entire file would have to be scanned linearly
• In contrast, on its ordering attribute a file can be searched
in a binary search fashion even without a primary index
– Needs more search time and space than a primary index,
because it is dense
– Secondary indexes provide a logical ordering
• Accessing records in that order might not be the most
efficient way regarding block accesses
• Each record access may fetch a new block into the buffer
3.3 Single-Level Ordered Indexes
• Improving secondary indexes for range queries
– Same block may be (unnecessarily) accessed multiple times
• Simple example: Cesar-Orwell
(buffer holds only 1 block)
– Cesar: fetch first block
– Miller: evict first block, fetch second block
– Miller: evict second block, fetch first block
– Orwell: evict first block, fetch second block
[Figure: secondary index entries Adams, Cesar, Miller, Miller, Orwell, Post, Snyder, Tarrens, Tarrens pointing into the data blocks (1, Cesar, $887.00 | 2, Tarrens, $19.99 | 3, Miller, $19.99), (4, Orwell, $1866.00 | 5, Miller, $179.99 | 6, Tarrens, $682.56) and (7, Post, $8642.78 | 8, Adams, $856.78 | 9, Snyder, $856.78)]
3.3 Single-Level Ordered Indexes
• Improving secondary indexes for range queries
– Remember: working on blocks in main memory is
cheap, fetching blocks from disk is expensive!
• Improved example: Cesar-Orwell
• Fetch ‘Cesar’ and scan the whole block
• Output ‘1’ & ‘3’
• Mark block as completed and evict
• Fetch ‘Miller’ and scan the whole block
• Output ‘4’ & ‘5’
• Mark block as completed and evict
• Skip second ‘Miller’
• Skip ‘Orwell’
• Done
[Figure: same secondary index (Adams, Cesar, Miller, Miller, Orwell, Post, Snyder, Tarrens, Tarrens) and data blocks as in the simple example]
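Both access strategies for the range query Cesar-Orwell can be compared in a short sketch. It assumes the data of the example held in memory (three records per block) and a dense secondary index that lists the Miller entries in the order used on the slides (record 5 before record 3); all names are hypothetical. The naive variant follows every pointer with a one-block buffer, the improved one fetches each block once, outputs all qualifying records in it and marks it as completed.

```python
# Unordered data file: record id -> (name, balance); three records per block
RECORDS = {1: ("Cesar", 887.00), 2: ("Tarrens", 19.99), 3: ("Miller", 19.99),
           4: ("Orwell", 1866.00), 5: ("Miller", 179.99), 6: ("Tarrens", 682.56),
           7: ("Post", 8642.78), 8: ("Adams", 856.78), 9: ("Snyder", 856.78)}
RECORDS_PER_BLOCK = 3

# Dense secondary index on name: (search key, record id), sorted by key
INDEX = [("Adams", 8), ("Cesar", 1), ("Miller", 5), ("Miller", 3), ("Orwell", 4),
         ("Post", 7), ("Snyder", 9), ("Tarrens", 2), ("Tarrens", 6)]


def block_of(rid):
    return (rid - 1) // RECORDS_PER_BLOCK


def read_block(block_no):
    first = block_no * RECORDS_PER_BLOCK + 1
    return [(rid, RECORDS[rid]) for rid in range(first, first + RECORDS_PER_BLOCK) if rid in RECORDS]


def range_naive(lo, hi):
    """Follow index pointers one by one; the buffer holds a single block."""
    buffered, fetches, out = None, 0, []
    for key, rid in INDEX:
        if lo <= key <= hi:
            if block_of(rid) != buffered:
                buffered = block_of(rid)
                fetches += 1
            out.append(rid)
    return out, fetches


def range_improved(lo, hi):
    """Fetch each block once, output all qualifying records in it, mark it completed."""
    completed, fetches, out = set(), 0, []
    for key, rid in INDEX:
        block_no = block_of(rid)
        if lo <= key <= hi and block_no not in completed:
            fetches += 1
            for r, (name, _balance) in read_block(block_no):
                if lo <= name <= hi:
                    out.append(r)
            completed.add(block_no)
    return out, fetches


print(range_naive("Cesar", "Orwell"))      # ([1, 5, 3, 4], 4)
print(range_improved("Cesar", "Orwell"))   # ([1, 3, 4, 5], 2)
```

The improved variant returns the same records in physical rather than key order, which is the price for halving the block fetches here.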
3.3 Single-Level Ordered Indexes
• Short Summary
– Primary and clustering indexes affect storage order,
secondary indexes don't
– Primary and clustering indexes may be sparse,
secondary indexes are usually dense
• Primary and clustering indexes can use block anchors
– Index sizes (number of index entries)
• Primary index: number of blocks
• Clustering index: number of distinct search keys
• Secondary index: up to number of records in table
3.3 Multi-Level Ordered Indexes
• Example: What if a primary index does not fit in
main memory?
– Bad look-up efficiency!
• Simple multi-level solution
– Treat primary index on disk as a sequential file and
construct a sparse index on it
• outer index – sparse index of primary index
• inner index – primary index file
– If even outer index is too large to fit in main memory,
yet another level of index can be created, and so on
3.3 Multi-Level Ordered Indexes
• Multi-level indexes can further improve search
speed
– In single level indexes a binary search can be applied
to locate pointers to blocks
• Efficiency of $\log_2 N$
– Multi-level indexes allow for higher search efficiency
• Fan-out of the index should be higher than 2
• Efficiency of $\log_{\text{fan-out}} N$
EN 14.2
3.3 Multi-Level Ordered Indexes
• The concept of multi-level indexes can be applied to primary,
clustering and secondary indexes, as long as the first-level index
has distinct values
• Example: 2-level primary index
– Level 1 is the primary index
– Level 2 is a primary index of level 1
– Efficiency of $\log_{\text{fan-out}} N$
• fan-out: number of 1st-level blocks per 2nd-level block
EN 14.2
[Figure: level 2 block (1, 7) points to level 1 blocks (1, 3, 5) and (7, 9, 11), which point to the data blocks (1, Adams, $887.00 | 2, Bertram, $19.99), (3, Behaim, $167.00 | 4, Cesar, $1866.00), (5, Miller, $179.99 | 6, Naders, $682.56), (7, Ruth, $8642.78 | 8, Smith, $675.99), (9, Tarrens, $99.00 | 10, Arnold, $3442.76), (11, Marks, $435.19 | 12, Black, $319.44)]
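How many levels a multi-level index needs follows from repeatedly dividing by the fan-out until a single top block remains, which is where the $\log_{\text{fan-out}} N$ efficiency comes from. A minimal sketch; the function name is hypothetical.

```python
import math


def index_levels(first_level_entries, fan_out):
    """Blocks per index level, bottom-up, until a single top-level block remains."""
    if first_level_entries < 1 or fan_out < 2:
        raise ValueError("need at least one entry and a fan-out of at least 2")
    levels = []
    entries = first_level_entries
    while True:
        blocks = math.ceil(entries / fan_out)   # blocks needed on this level
        levels.append(blocks)
        if blocks == 1:
            return levels
        entries = blocks                        # next level: one entry per block below


print(index_levels(6, 3))   # [2, 1]
```

For the 2-level example (six level-1 entries 1, 3, 5, 7, 9, 11 with fan-out 3) this yields two level-1 blocks and one level-2 block; a lookup reads one block per level plus one data block.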
3.3 Multi-Level Ordered Indexes
• There is a huge number of different tree
structures for multi-level indexing
– Classical structures B-tree, B*-tree, etc.
– Multidimensional structures R-trees, Quad-trees, etc.
– Lots of literature
3.3 Hash Indexes
• Index Structure based on Hashing
– Idealized Access Speed: O(1)
• Basic Idea:
– Fixed-Size directly addressable Index Space [0..M] containing
M+1 buckets
• Buckets contain links to data
• Single link in internal hashing (i.e. in memory)
• Multiple links for external hashing (i.e. on hard disk)
– Hash Function h: ℤ → [0..M]
• Maps any number to [0..M]
• Should be uniform (i.e.: same probability for any x ∈ [0..M])
– Especially, should also be surjective (i.e. use the full range of [0..M])
• Needs to be deterministic (i.e.: same input → same output)
• Should be simple and fast
EN 13.8.1
3.3 Hash Indexes
• Basic Idea
– Convert value which is to be indexed to numeric representation
– Hash the value
– Store block anchor of value in the bucket with computed hash index
• Example: M = 8
– h(Miller) = 4, h(Cesar) = 7, h(Snyder) = 1
[Figure: buckets 0–8; bucket 1 → 9, Snyder, $856.78; bucket 4 → 5, Miller, $179.99; bucket 7 → 1, Cesar, $887.00]
EN 13.8.1
3.3 Hash Indexes
• Problem: Collision
– Hash functions are not injective, so collisions may happen
• Full blocks are linked to further blocks via block pointers (overflow list)
– Probability of collision increases with the load factor
• Deteriorates to linear behavior in the worst case
• Example: h(Rogers) = 1, h(Miller) = 4, h(Cesar) = 1, h(Snyder) = 1, h(Smith) = 1
[Figure: buckets 0–8; bucket 1 holds the colliding records 3, Rogers, $1866.58 | 1, Cesar, $887.00 | 9, Snyder, $856.78 | 7, Smith, $0.29; bucket 4 holds 5, Miller, $179.99]
EN 13.8.1
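A static hash index with buckets $[0..M]$ and overflow lists can be sketched as follows. The bucket capacity of 3 and the default hash function (CRC-32 of the key modulo M+1, chosen because Python's built-in string hash is randomized per process and therefore not deterministic across runs) are assumptions; class names are hypothetical. The test plugs in the hash values from the collision example.

```python
import zlib


class Block:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []        # (search key, record) pairs
        self.overflow = None     # block pointer to the next block of this bucket


class StaticHashIndex:
    """Buckets 0..M; full blocks are chained to overflow blocks."""

    def __init__(self, m=8, capacity=3, h=None):
        self.m = m
        # Default: deterministic CRC-32 of the key, mapped to [0..M]
        self.h = h or (lambda key: zlib.crc32(key.encode("utf-8")) % (m + 1))
        self.buckets = [Block(capacity) for _ in range(m + 1)]

    def insert(self, key, record):
        block = self.buckets[self.h(key)]
        while len(block.entries) >= block.capacity:   # walk the overflow list
            if block.overflow is None:
                block.overflow = Block(block.capacity)
            block = block.overflow
        block.entries.append((key, record))

    def lookup(self, key):
        """Return matching records and the number of blocks that had to be read."""
        block, reads, found = self.buckets[self.h(key)], 0, []
        while block is not None:
            reads += 1
            found += [rec for k, rec in block.entries if k == key]
            block = block.overflow
        return found, reads
```

With the example's hash values, the four names hashing to bucket 1 no longer fit into one block of capacity 3, so looking up any of them reads two blocks, while Miller in bucket 4 needs one.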
3.3 Hash Indexes
• Problem: Static Address Range
– Addressable range is fixed to [0..M]
– What happens if address range is exhausted?
• E.g., load factor too high, too many collisions occur
– Possible solutions
• Rehashing :
– Create a new larger hashmap and rehash all values
– For example, Java's HashMap follows this policy
• Extendible Hashing:
– Uses a dynamic directory to double and halve the number of buckets
• Linear Hashing:
– Buckets are split into two one after another and are individually
rehashed with an additional hash-function
– Not in this lecture
EN 13.8.3
3.3 Hash Indexes
• Extendible Hashing
– Hash values are non-negative integers in $[0..2^n-1]$
• Numbers can be represented in binary (n bits)
• Uses a so-called trie structure
• Choose large n
– Create a directory of depth d with $2^d$ entries for the
first d bits of values
• d is global depth of directory
• Directory cells link to buckets containing links to data with
hash value starting with given bit pattern
– Adjacent cells may link to the same bucket depending on local
depth of bucket
EN 13.8.3
3.3 Hash Indexes
• Extendible Hashing (Bucket size 3)
[Figure: directory with global depth 3 (cells 000–111):
cells 000, 001, 010, 011 → bucket with local depth 1 (all hash values starting with 0): 0_0010, 0_0100, 0_1010
cells 100, 101 → bucket with local depth 2 (all hash values starting with 10): 10_000, 10_101, 10_110
cell 110 → bucket with local depth 3 (all hash values starting with 110): 110_000, 110_010, 110_100
cell 111 → bucket with local depth 3 (all hash values starting with 111): 111_001, 111_010, 111_111]
EN 13.8.3
3.3 Hash Indexes
• Insert 010000
– The value starts with 0, so it belongs to the bucket with local depth 1, which is already full
– Split this bucket, add 010000
[Figure: same directory as before (global depth 3); the bucket for prefix 0 (0_0010, 0_0100, 0_1010) is the one to be split]
EN 13.8.3
3.3 Hash Indexes
[Figure: directory after the split, global depth 3:
cells 000, 001 → bucket with local depth 2 (all hash values starting with 00): 00_010, 00_100
cells 010, 011 → bucket with local depth 2 (all hash values starting with 01): 01_000, 01_010
cells 100, 101 → bucket with local depth 2 (all hash values starting with 10): 10_000, 10_101, 10_110
cell 110 → bucket with local depth 3 (all hash values starting with 110): 110_000, 110_010, 110_100
cell 111 → bucket with local depth 3 (all hash values starting with 111): 111_001, 111_010, 111_111]
EN 13.8.3
3.3 Hash Indexes
• Extendible Hashing
– Allows buckets to be split and coalesced dynamically as the
database grows and shrinks
• If bucket is full and local depth is lower than global depth:
– More than one cell links to this bucket
– Bucket is split into two with increased local depth and values are
distributed accordingly
– Links in cells are adjusted accordingly
• If local depth already equals global depth:
– Global depth is increased by one and thus the size of the directory
is doubled
– Each cell is replaced by two cells, both of which contain the same
pointer as the original cell
EN 13.8.3
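The split and doubling rules above can be sketched as follows. This is a minimal sketch under stated assumptions: hash values are n-bit integers stored directly as keys (no records attached), the directory is indexed by the leading bits as in the figures, and coalescing on deletion is omitted. Class names are hypothetical.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []


class ExtendibleHash:
    """Directory indexed by the first global_depth bits of an n-bit hash value."""

    def __init__(self, bits=6, bucket_size=3):
        self.bits = bits
        self.bucket_size = bucket_size
        self.global_depth = 0
        self.directory = [Bucket(0)]            # 2 ** global_depth cells

    def _prefix(self, h, depth):
        return h >> (self.bits - depth)         # first `depth` bits of h

    def find(self, h):
        return h in self.directory[self._prefix(h, self.global_depth)].keys

    def insert(self, h):
        while True:
            bucket = self.directory[self._prefix(h, self.global_depth)]
            if len(bucket.keys) < self.bucket_size:
                bucket.keys.append(h)
                return
            if bucket.local_depth == self.global_depth:
                if self.global_depth == self.bits:
                    raise OverflowError("too many equal hash values for one bucket")
                # Double the directory: each cell is replaced by two cells
                # that hold the same pointer as the original cell
                self.directory = [cell for cell in self.directory for _ in range(2)]
                self.global_depth += 1
            # Split the full bucket into two buckets with local depth + 1
            depth = bucket.local_depth + 1
            low, high = Bucket(depth), Bucket(depth)
            for key in bucket.keys:
                (high if self._prefix(key, depth) & 1 else low).keys.append(key)
            # Re-point every directory cell that referenced the old bucket
            for i, cell in enumerate(self.directory):
                if cell is bucket:
                    bit = (i >> (self.global_depth - depth)) & 1
                    self.directory[i] = high if bit else low
            # Retry: the target bucket may still be full after one split
```

Inserting 0000, 0100, 1000, 0010 with 4-bit values and bucket size 2 doubles the directory twice (global depth 2), while the bucket for prefix 1 keeps local depth 1 and is shared by cells 10 and 11.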
3.3 Hash Indexes
• Summary Hash Indexes: May perform in O(1)
– But: Management overhead can be quite large for
growing data collections
• Overflow lists have small management overhead, but may
decrease access performance to O(n)
• Rehashing is very expensive for each growth stage
• Extendible hashing has a smaller overhead and is especially
suitable for external storage hashing
EN 13.8
3.3 Indexing
• Why not simply index every attribute?
– Physical index on primary key, logical indexes on every
other attribute
• Results in good read efficiency
• But… whenever a DB file is modified, every
index on the file has to be updated
– Updating indices imposes overhead on database
modification (insert, delete, update)
• Especially for multi-level indexes all levels may have to be
updated
3.4 Physical Tuning
• One of the tasks of the database
administrator
• Black Magic…
– Get DB performance statistics
– Try something that seems sensible
– Get new performance statistics
– If it did not improve, revert to the previous
configuration
– Try something different until satisfied...
3.4 Physical Tuning
• Black Magic today needs advanced
tools to operate correctly
• For database tuning, you need
– Snapshot Monitors
• Continuously collect cumulative statistics for
the DB within a given time span
– Event Monitors
• Hook into certain events and analyze them
3.4 Diagnostic Database Tools
• Snapshot Monitors
– Monitor collects cumulative statistics
– Internal counters set at global or session levels
– Collected statistics can be evaluated afterwards
for system tuning on general level
– Data collection is explicitly enabled or disabled
3.4 Diagnostic Database Tools
• Useful for collecting point-in-time information
regarding overall DB/DBMS behavior
– Locking – lists all locks & types held by all applications
– Bufferpools – cumulative stats for memory, physical &
logical I/Os, synch/asynch
– Sorts – provides detailed statistics regarding sort-heap
usage, overflows, # active etc.
– Tablespaces – detailed I/O and activity statistics for a
given tablespace
– UOW – displays status of application Unit of Work at
point-in-time
– Dynamic SQL – shows contents of package cache and
related stats at point-in-time
3.4 Diagnostic Database Tools
• Snapshot Monitors
– Continuous overhead is in the 3%–5% range
• But no I/O, since counters are usually RAM resident and
not written to disk
– Quick list of useful metrics to identify particular
problems includes:
• bufferpool hit ratios
• sort overflows/problems
• effectiveness of prefetch
• need for indexing
• transaction logging issues
• single query/application resource domination, etc.
3.4 Diagnostic Database Tools
• Snapshot Monitor: IBM DB2 Performance Expert
http://www.redbooks.ibm.com/abstracts/sg246470.html
3.4 Diagnostic Database Tools
• Event Monitors
– Collects information regarding events or chains of
events
• No continuously collected data as with snapshots
– Must be explicitly created and activated via DBMS
commands or APIs
– Are the best way to effectively diagnose and resolve
transaction problems (e.g., deadlock issues)
– Output may be directed to a database table and
results then analyzed using SQL
3.4 Diagnostic Database Tools
• Event Monitor can be used to
– Identify and rank most frequently executed or most
time consuming SQL
– Track the sequence of statements leading up to a lock
timeout or deadlock
– Track connections to the database
– Break down overall SQL statement execution time
into sub-operations such as sort time and user or
system CPU time
3.4 Diagnostic Database Tools
• Event Monitor: DBI Brother Panther
http://www.dbisoftware.com
3.4 The Black Art of Physical Tuning
• Physical DB tuning is a complicated and opaque
task
• Usually trial and error
– Measure some hopefully meaningful metrics of your DB
usage statistics
– Adjust something
• Mostly indexes and tablespace properties
– Measure again
• If results are better: Great! Continue tuning!
• If results are worse: Bad! Undo everything you did and try something
else.
• But what to measure and what to do?
– Use best practices or try yourself!
3.4 Physical Tuning
• What to measure?
• First: what kind of database do you have?
– OLTP (Online Transaction Processing): database
with many small, concurrent read/write queries
• Optimize for fast read/write accesses
• Keep an eye on “real-time” response times
• Optimize for concurrency
– Warehouse (OLAP – Online Analytical Processing):
larger and more complex queries, primarily for
retrieving data
• Optimize for read accesses
3.4 Tuning Metrics
• There is a huge number of knobs and screws to
turn
– DB planning and tuning is for professionals
– Special certificates by each database vendor, e.g., IBM DB2:
• IBM Certified Database Associate
• IBM Certified Database Administrator
• IBM Certified Application Developer DB2
• IBM Certified Advanced Database Administrator
– http://education.oracle.com
– http://www-03.ibm.com/certify/certs/dm_index.shtml
– http://www.microsoft.com/learning/mcp/mcdba/
3.4 Tuning Metrics
• Scott Hayes
– President & CEO of Database-Brothers Inc (DBI)
• If I only had five minutes to assess the status of a
database, I would look at…
– Average Result Set Size
– Index Read Efficiency
– Synchronous Read Percentage
– Average Rows Read per DB Transaction per Table
– Average Read Time
3.4 Tuning Metrics
• ARSS - Average Result Set Size
– ARSS = RowsSelected / NumSelectQueries
– Rule of Thumb:
• ARSS ≤ 10 indicates an OLTP database
– If you think you have OLTP and ARSS>>10, there is something
wrong…
• ARSS > 10 indicates a warehouse database (OLAP)
3.4 Tuning Metrics
• IREF - Index Read Efficiency
– “How many rows have to be read to retrieve one row?”
• i.e. How much overhead is within the where predicates?
– IREF = RowsRead / RowsReturned
• Well-designed indexes may evaluate a where-clause without
reading additional rows!
– Rule of Thumb:
• IREF should be less than 10 for OLTP Databases
• May be higher for Warehouses (>100)
– What to do?
• Introduce more efficient indexes!
• Use simpler queries!
3.4 Tuning Metrics
• SRP – Synchronous Read Percentage
– Synchronous Read means the DBMS reads just the
index file and the required data page (good)
– Asynchronous read means the DBMS employs
prefetching to scan indexes or data (BAD)
– Rule of Thumb:
• >90% for OLTP (>50% for warehouses) : Good
• 80%-90% (25%-50%) : Ok
• 50%-80% (<25%): Bad
– What to do?
• Improve index structures!
3.4 Tuning Metrics
• TBRRTX – Average Rows Read per DB Transaction
per Table
– TBRRTXtable = rowsRead / (attemptedCommits +
attemptedRollBacks)
– Rule of Thumb:
• <10 for OLTP: Good
• 10-100: Ok
• >100: Bad
• >1000: Horrific
– What to do?
• Improve index structures!
• Improve queries!
3.4 Tuning Metrics
• ART – Average Read Time
– The average time per read
– ART = ReadTime / (NumDataRead + NumIndexRead)
– If you think you did everything within your application
and indices, optimize read time
• Adjust size of tablespace containers
• Distribute containers across disks
– Avoid OS paging device
• Move highly accessed tablespaces to faster devices
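The five tuning metrics and the OLTP rules of thumb listed in this section can be computed from monitor counters with a few one-line functions. This is a sketch with hypothetical function names; the counter names follow the formulas given here, except for SRP, where no formula is stated: the definition used (share of physical reads that are synchronous) is an assumption. Time units for ART depend on what the monitor reports.

```python
def arss(rows_selected, num_select_queries):
    """Average Result Set Size."""
    return rows_selected / num_select_queries


def workload_from_arss(value):
    """Rule of thumb: ARSS <= 10 indicates OLTP, larger values a warehouse."""
    return "OLTP" if value <= 10 else "warehouse (OLAP)"


def iref(rows_read, rows_returned):
    """Index Read Efficiency: rows read per row returned."""
    return rows_read / rows_returned


def srp(async_reads, total_physical_reads):
    """Synchronous Read Percentage (assumed: share of physical reads that are synchronous)."""
    return 100.0 * (total_physical_reads - async_reads) / total_physical_reads


def tbrrtx(rows_read, attempted_commits, attempted_rollbacks):
    """Average rows read per transaction for one table."""
    return rows_read / (attempted_commits + attempted_rollbacks)


def rate_tbrrtx(value):
    """OLTP rule of thumb for TBRRTX."""
    if value < 10:
        return "Good"
    if value <= 100:
        return "Ok"
    if value <= 1000:
        return "Bad"
    return "Horrific"


def art(read_time, num_data_reads, num_index_reads):
    """Average Read Time per read."""
    return read_time / (num_data_reads + num_index_reads)
```

For instance, hypothetical counters of 12,000 rows read on a table over 1,000 commits and 200 rollbacks give a TBRRTX of 10, which falls into the "Ok" range.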
3 Indexes
• Buffer Management is very important
– Holds DB blocks in primary memory
• DB blocks are made up of several FS blocks
• Find good strategies to have requested DB blocks available when
needed
– Each block holds some metadata and row data
• Indexes drastically speed up queries
– Fewer blocks need to be scanned
– Primary Index
• On primary key attribute, usually influences row storage order
– Secondary Index
• On any attribute, does not influence storage order
Next Lecture
• Tree-Index Structures
– Binary Tree
• Naïve Binary Tree
• Balancing Tree
– B-Trees
• B
• B+
• B*