Database scalability and indexes
Goetz Graefe
Hewlett-Packard Laboratories
Palo Alto, CA – Madison, WI
Dimensions of scalability
• Data size – cost per terabyte ($/TB)
• Information complexity (database schema size)
• Operational scale (data sources & transformations)
• Multi-programming level (many queries)
• Concurrency (updates, roll-in load, roll-out purge)
• Query complexity (tables, operations, parameters)
• Representation (indexing) complexity
• Storage hierarchy (levels, staging)
• Hardware architecture (e.g., parallelism)
May 22, 2017
Database scalability and indexes
2
Agenda
• Indexing taxonomy
• B-tree technology
Balancing bandwidths
• Disk, network, memory, CPU processing
– Decompression, predicate evaluation, copying
• Table scans
– Row stores, column stores
– NSM versus PAX versus ?
• Index scans
– Range queries, look-ups, MDAM
• Intermediate results
– Sort, hash join, hybrid hash join, etc.
How many disks per CPU core?
Flash devices or traditional disks?
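The slide contrasts row stores with column stores and asks about NSM versus PAX. The Python below is a rough sketch of the difference. It packs the same three rows into one byte string in two ways:

- NSM: complete rows stored back to back.
- PAX: the same rows on the same page, but each column kept in its own contiguous "minipage".

The three-column row format and the helper names are hypothetical. Real PAX pages also carry a header with minipage offsets and handle variable-length fields, which this sketch omits.

```python
import struct

# Hypothetical row layout for illustration: (id int32, qty int32, price float64).
ROW_FMT = "<iid"  # little-endian, no padding: 4 + 4 + 8 = 16 bytes per row


def nsm_page(rows):
    """NSM: complete rows stored one after another on the page."""
    return b"".join(struct.pack(ROW_FMT, *row) for row in rows)


def pax_page(rows):
    """PAX: the same rows on the same page, but one minipage per column."""
    n = len(rows)
    ids = struct.pack(f"<{n}i", *(r[0] for r in rows))
    qtys = struct.pack(f"<{n}i", *(r[1] for r in rows))
    prices = struct.pack(f"<{n}d", *(r[2] for r in rows))
    return ids + qtys + prices


def scan_prices_nsm(page, n):
    # Strided access: read 8 bytes at offset 8 inside every 16-byte row.
    row_size = struct.calcsize(ROW_FMT)
    return [struct.unpack_from("<d", page, i * row_size + 8)[0] for i in range(n)]


def scan_prices_pax(page, n):
    # Contiguous access: the price minipage follows the two int32 minipages.
    return list(struct.unpack_from(f"<{n}d", page, 8 * n))


rows = [(1, 5, 9.5), (2, 3, 2.25), (3, 7, 1.0)]
print(scan_prices_nsm(nsm_page(rows), len(rows)))  # [9.5, 2.25, 1.0]
print(scan_prices_pax(pax_page(rows), len(rows)))  # [9.5, 2.25, 1.0]
```

Both layouts occupy the same space. A single-column scan, however, reads a strided sequence of fields in NSM but one contiguous byte range in PAX. That contiguity is what makes PAX friendlier to CPU caches while still keeping whole rows on one page.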
Hardware support
• CPU caches
– Alignment, data organization
– Prefetch instructions
• Instructions for large data
– Quadwords, etc.
• Native encoding
– Avoid decimal numerics
• GPUs? FPGAs?
Binary search or interpolation search?
Avoid XML?
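The question "Binary search or interpolation search?" can be made concrete by counting probes. The sketch below implements both searches over a sorted list of integers; the function names are hypothetical.

- Binary search needs about log2(n) probes regardless of the key distribution.
- Interpolation search estimates the position from the key value. It needs far fewer probes when keys are roughly uniformly distributed, but it can degrade badly on skewed keys.

```python
def binary_search(keys, target):
    """Return (index or -1, number of probes) using comparisons only."""
    lo, hi, probes = 0, len(keys) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if keys[mid] == target:
            return mid, probes
        if keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes


def interpolation_search(keys, target):
    """Return (index or -1, number of probes), guessing the position from the key value."""
    lo, hi, probes = 0, len(keys) - 1, 0
    while lo <= hi and keys[lo] <= target <= keys[hi]:
        if keys[hi] == keys[lo]:
            pos = lo
        else:
            # Linear interpolation between the boundary keys (integer arithmetic).
            pos = lo + (target - keys[lo]) * (hi - lo) // (keys[hi] - keys[lo])
        probes += 1
        if keys[pos] == target:
            return pos, probes
        if keys[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1, probes


keys = list(range(0, 100_000, 10))  # uniformly spaced keys: the best case for interpolation
print(interpolation_search(keys, 45_670))  # (4567, 1)
print(binary_search(keys, 45_670))
```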
Read-ahead and write-behind
Buffer pool = latency × bandwidth
• Disk-order scans
– Guided by allocation information
• Index-order scans
– Guided by parent & grandparent levels
– Avoid neighbor pointers in B-tree leaves
• Index-to-index navigation
– Sort references prior to index nested loops join
– Hint references from query execution to storage layer
More I/O requests than devices!
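Two ideas from this slide lend themselves to a short sketch:

- Sizing read-ahead buffers as latency × bandwidth.
- Sorting references before index-to-index navigation, so that pages are fetched in ascending order and each at most once.

The following are all illustrative assumptions: the helper names, the microsecond unit, and the assumption that a row id divided by rows-per-page yields its page number.

```python
def read_ahead_bytes(latency_us, bandwidth_bytes_per_s):
    """Buffer space needed to keep one device busy: latency x bandwidth."""
    return latency_us * bandwidth_bytes_per_s // 1_000_000


def plan_fetches(row_ids, rows_per_page):
    """Sort references so each page is read once, in ascending page order.

    Assumes (hypothetically) that row_id // rows_per_page is the page number.
    """
    plan = {}
    for rid in sorted(row_ids):
        plan.setdefault(rid // rows_per_page, []).append(rid)
    return list(plan.items())  # dicts keep insertion order, i.e. ascending pages


# Hypothetical device: 5 ms latency, 100 MB/s bandwidth.
print(read_ahead_bytes(5_000, 100_000_000))  # 500000
print(plan_fetches([905, 12, 907, 15, 400], 100))  # [(0, [12, 15]), (4, [400]), (9, [905, 907])]
```

Sorting turns random look-ups into a single ascending pass. The resulting page list is also exactly the kind of hint a storage layer could use to keep more I/O requests outstanding than there are devices.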
“Fail fast” and fault isolation
• Local slow-down produces asymmetry
– Weakest node imposes global slow-down
• Enable asynchrony in I/O and in processing
• Enable incremental load balancing
– Schedule multiple work units per server
– Largest first, assign work as servers free up
25 work units for 8 servers:
S, J, etc. first – Q, Z, Y, X last
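The rule "largest first, assign work as servers free up" is the classic longest-processing-time-first greedy heuristic. The sketch below assumes work-unit sizes are known in advance. The sizes are made up; they are not the 25 units from the slide.

```python
import heapq


def schedule_largest_first(work_units, num_servers):
    """Assign work units largest first, each to the server that becomes free earliest."""
    servers = [(0, s) for s in range(num_servers)]  # (time when server is free, server id)
    heapq.heapify(servers)
    assignment = {s: [] for s in range(num_servers)}
    # Largest first; ties broken by name so the result is deterministic.
    for name, size in sorted(work_units.items(), key=lambda kv: (-kv[1], kv[0])):
        free_at, s = heapq.heappop(servers)
        assignment[s].append(name)
        heapq.heappush(servers, (free_at + size, s))
    makespan = max(t for t, _ in servers)
    return assignment, makespan


units = {"A": 7, "B": 5, "C": 4, "D": 3, "E": 3, "F": 2}
assignment, makespan = schedule_largest_first(units, 2)
print(assignment)  # {0: ['A', 'D', 'F'], 1: ['B', 'C', 'E']}
print(makespan)    # 12
```

Small units go last and fill the gaps left by large ones. In a running system, the "free at" times would come from servers actually finishing rather than from size estimates, so a slow server would naturally receive fewer units.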
Scheduling in query execution
• Admission control – too much concurrency
• Degree of parallelism – match available cores
• Pipelining of operations – avoid thrashing
• “Slack” between producers and consumers
– Partitioning: output buffer per consumer
– Merging: input buffer per producer
– “Free” packets to enable asynchronous execution
– 512×512×4×64 KB = 2^36 B = 64 GB
Lower memory need with more synchronization?
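The memory estimate can be checked directly. The sketch reads it as 512 producers × 512 consumers × 4 packets per producer-consumer pair × 64 KB per packet. That reading is an interpretation, since the slide does not label the factors.

```python
def slack_bytes(producers, consumers, packets_per_pair, packet_bytes):
    """Buffer memory for a full exchange: every producer-consumer pair keeps its own packets."""
    return producers * consumers * packets_per_pair * packet_bytes


GiB = 2**30
total = slack_bytes(512, 512, 4, 64 * 1024)
print(total == 2**36, total // GiB)  # True 64
# Fewer free packets per pair: less memory, but more waiting between producer and consumer.
print(slack_bytes(512, 512, 1, 64 * 1024) // GiB)  # 16
```

Cutting the number of packets per pair reduces memory proportionally but also reduces slack. Producers and consumers then wait on each other more often, which is the trade-off behind the slide's closing question.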
Synchronization in communication
• “Slack” is a bad place to save memory!
• Demand-driven versus data-driven execution
– Faster producer will starve for free packets
– Faster consumer will starve for full packets
– Slowest step in pipeline determines bandwidth
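The free-packet/full-packet exchange described above can be sketched with two thread-safe queues. In this hypothetical Python version, a fixed pool of empty packets is the slack:

- The producer blocks when no free packet is available.
- The consumer blocks when no full packet is available.

Either way, the slower side sets the pace.

```python
import queue
import threading

DONE = None  # end-of-stream marker placed in the "full" queue


def run_pipeline(items, num_packets=2, packet_size=3):
    """Producer and consumer exchange packets through 'free' and 'full' queues."""
    free, full = queue.Queue(), queue.Queue()
    for _ in range(num_packets):
        free.put([])  # the slack: a fixed pool of empty packets
    received = []

    def producer():
        packet = free.get()  # blocks when the consumer is slower (no free packets)
        for item in items:
            packet.append(item)
            if len(packet) == packet_size:
                full.put(packet)
                packet = free.get()
        if packet:
            full.put(packet)
        full.put(DONE)

    def consumer():
        while True:
            packet = full.get()  # blocks when the producer is slower (no full packets)
            if packet is DONE:
                break
            received.extend(packet)
            packet.clear()
            free.put(packet)  # recycle the emptied packet to the producer

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


print(run_pipeline(list(range(10))))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```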
Bad algorithms in query execution
• Query optimization versus query execution
– Compile-time versus run-time
– Anticipated sizes, memory availability, etc.
• Fast execution with perfect query optimization
– Merge join: sorted indexes, sorted intermediate results
– Hash join
• Robust execution by run-time adaptation
– Index nested loops join
– Requires some innovation …
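Merge join is fast only when its inputs are already sorted on the join key, for example from sorted indexes or sorted intermediate results. Guaranteeing that sort order at compile time is the query optimizer's job.

Below is a minimal sketch over lists of (key, payload) pairs that are assumed to be sorted by key. Runs of equal keys on both sides produce their cross product.

```python
def merge_join(left, right):
    """Join two lists of (key, payload) pairs, each already sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the group of equal keys on each side, then emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for a in range(i, i_end):
                for b in range(j, j_end):
                    out.append((lk, left[a][1], right[b][1]))
            i, j = i_end, j_end
    return out


left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
print(merge_join(left, right))
# [(2, 'b', 'x'), (2, 'b', 'y'), (2, 'c', 'x'), (2, 'c', 'y'), (4, 'd', 'w')]
```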
Query
select count (*) from lineitem
where l_partkey >= :lowpart
and l_shipdate >= :lowdate
• Varying predicate selectivity together or separately
• Forced plans – focus on robustness of execution
– Resource management (memory allocation)
– Index use, join algorithm, join order
CIDR 2009
Physical database
• Primary index on order key, line number
• 1-column (non-covering) secondary indexes
– Foreign keys, date columns
• 2-column (covering) secondary indexes
– Part key + ship date, ship date + part key
• Large plan space
– Table scan
– Single index + fetch from table
– Join two indexes to cover the query
– Exploit two-column indexes
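For the count(*) query above, the plan "join two indexes to cover the query" can be sketched as intersecting the row-id sets from two single-column index range scans. The table itself is never read.

A few notes on the sketch:

- The row ids stand in for the primary-index key (order key, line number).
- Python's set intersection plays the role of a hash join on that key.
- The data and helper names are hypothetical.
- A table scan is included as a cross-check.

```python
import bisect


def build_index(rows, column):
    """1-column secondary index: sorted (key, row id) pairs."""
    return sorted((row[column], row[0]) for row in rows)


def rids_at_least(index, low):
    """Range scan: row ids of all index entries with key >= low."""
    start = bisect.bisect_left(index, (low,))
    return {rid for _, rid in index[start:]}


def count_by_index_intersection(part_idx, date_idx, lowpart, lowdate):
    # Each index answers one predicate; intersecting row ids answers both
    # without touching the table, so the two indexes cover count(*).
    return len(rids_at_least(part_idx, lowpart) & rids_at_least(date_idx, lowdate))


def count_by_table_scan(rows, lowpart, lowdate):
    return sum(1 for _, partkey, shipdate in rows if partkey >= lowpart and shipdate >= lowdate)


# Hypothetical rows: (row id, l_partkey, l_shipdate)
rows = [(1, 10, "1995-01-01"), (2, 55, "1996-06-30"), (3, 70, "1994-12-31"),
        (4, 80, "1997-02-14"), (5, 20, "1998-07-04")]
part_idx, date_idx = build_index(rows, 1), build_index(rows, 2)
print(count_by_index_intersection(part_idx, date_idx, 50, "1996-01-01"))  # 2
print(count_by_table_scan(rows, 50, "1996-01-01"))  # 2
```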
Wildly different performance curves
Figure: single-table execution times, time [seconds] from 0 to 1,000 versus row count, for the scan plan, fetch plan, merge join, join + fetch, join plan, fetch 9115, and hash join
Observations
• Table scan is very robust but not efficient
– Materialized views should enable fetching query results
• Traditional fetch is very efficient but not robust
– Perhaps addressed with risk-based cost calculation
• Multi-index plans are efficient and robust
– Independent of join order + method (in this experiment)
• Non-traditional fetch is quite robust
– Asynchronous prefetch or read-ahead
– Sorting record identifiers or keys in primary index
– Sort effect seems limited at high end
Hash join vs index nested loops join
• An in-memory hash table is an index!
– Direct address calculation
– Thread-private: memory allocation, concurrency control
• Traditional index nested loops join
– Index search using comparisons and binary search
– Shared pages in the buffer pool
• Improved index nested loops join
– Prefetch & pin the index in the buffer pool
– Replace page identifiers with in-memory pointers
– Replace binary search with interpolation search
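To see why an in-memory hash table behaves like an index, the sketch below builds one explicitly as an array of buckets. Each probe computes its bucket address directly from the key and compares keys only within that bucket. No binary search through shared buffer-pool pages is involved. The names and the fixed bucket count are illustrative.

```python
def build(rows, num_buckets):
    """Build phase: place each row in the bucket computed directly from its key."""
    buckets = [[] for _ in range(num_buckets)]
    for key, payload in rows:
        buckets[hash(key) % num_buckets].append((key, payload))
    return buckets


def hash_join(build_rows, probe_rows, num_buckets=8):
    """Probe phase: one address calculation per probe row, then compare keys in that bucket only."""
    table = build(build_rows, num_buckets)
    out = []
    for key, payload in probe_rows:
        for bkey, bpayload in table[hash(key) % num_buckets]:
            if bkey == key:
                out.append((key, bpayload, payload))
    return out


print(hash_join([(1, "a"), (2, "b"), (2, "c")], [(2, "x"), (3, "y"), (1, "z")]))
# [(2, 'b', 'x'), (2, 'c', 'x'), (1, 'a', 'z')]
```

Because the table is built and probed by one thread, it needs no latches or shared-page pinning. That is the "thread-private" advantage the slide attributes to hash join.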
Index maintenance
• Data warehouse: fact table with 3-9 foreign keys
– Non-clustered index per foreign key
– Plus 1-3 date columns with non-clustered indexes
– Plus materialized and indexed views
• Traditional bulk insertion (load, roll-in)
– Per row: 4-12 index insertions, read-write 1 leaf each
– Per disk: 200 I/Os per second, 10 rows/sec = 1 KB/sec
• Known techniques
– Drop indexes prior to bulk insertion?
– Deferred index & view maintenance?
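The per-disk figures follow from simple arithmetic under three assumptions:

- Each index insertion reads and writes one leaf (2 I/Os).
- A row needs about 10 index insertions, within the slide's 4-12 range.
- A row occupies roughly 100 bytes. This size is chosen to match the slide's 1 KB/sec.

```python
def bulk_insert_rate(ios_per_second, index_inserts_per_row, ios_per_index_insert=2, row_bytes=100):
    """Rows/sec and bytes/sec when every index insertion reads and writes one leaf page."""
    ios_per_row = index_inserts_per_row * ios_per_index_insert
    rows_per_second = ios_per_second / ios_per_row
    return rows_per_second, rows_per_second * row_bytes


print(bulk_insert_rate(200, 10))  # (10.0, 1000.0)
print(bulk_insert_rate(200, 4))   # (25.0, 2500.0)
```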
Partitioned B-trees
Traditional B-tree index: keys a-z
Partitioned B-tree …: partitions #1, #2, #3, #4, each with keys a-z
… after merging a-j: partition #0 with keys a-j; partitions #1-#4 with keys k-z
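A partitioned B-tree can be modeled as a single B-tree whose leading key column is an artificial partition number. New data goes into its own partition, and merging moves a key range into partition #0, as the slide shows for a-j.

The sketch below uses a sorted Python list of (partition, key) pairs as a stand-in for the B-tree. The class and method names are hypothetical.

```python
import bisect


class PartitionedBTree:
    """Sketch: a sorted list of (partition, key) entries stands in for one B-tree
    whose leading key column is an artificial partition number."""

    def __init__(self):
        self.entries = []

    def insert(self, partition, key):
        bisect.insort(self.entries, (partition, key))

    def keys_in(self, partition):
        lo = bisect.bisect_left(self.entries, (partition,))
        hi = bisect.bisect_left(self.entries, (partition + 1,))
        return [key for _, key in self.entries[lo:hi]]

    def merge_range(self, low, high, target=0):
        """Move every key in [low, high] from the other partitions into `target`."""
        kept, moved = [], []
        for partition, key in self.entries:
            if partition != target and low <= key <= high:
                moved.append((target, key))
            else:
                kept.append((partition, key))
        self.entries = sorted(kept + moved)


tree = PartitionedBTree()
for partition in (1, 2, 3, 4):  # e.g., four runs appended by bulk insertion
    for key in ("a", "c", "j", "k", "m", "z"):
        tree.insert(partition, key)
tree.merge_range("a", "j")
print(tree.keys_in(0))  # ['a', 'a', 'a', 'a', 'c', 'c', 'c', 'c', 'j', 'j', 'j', 'j']
print(tree.keys_in(1))  # ['k', 'm', 'z']
```

A real implementation would merge partitions incrementally with the run-generation and merge algorithms on the next slide, rather than re-sorting everything as this sketch does.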
Algorithms
• Run generation
– Quicksort or replacement selection (priority queue)
– Exploit all available memory, grow & shrink as needed
• Merging
– Like external merge sort, efficient on block-access
– Exploit all available memory, grow & shrink as needed
– Best case: single merge step
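Run generation by replacement selection keeps a priority queue of (run number, key) entries. Each key written to the current run is replaced by the next input key:

- If the new key is not smaller than the key just written, it joins the current run.
- Otherwise it is deferred to the next run.

Unlike the slide's suggestion, the sketch below uses a fixed memory size rather than growing and shrinking. It merges the runs in a single step with `heapq.merge`.

```python
import heapq
from itertools import islice

_END = object()  # marks exhausted input


def replacement_selection(records, memory_size):
    """Run generation with a priority queue of (run number, key) entries."""
    it = iter(records)
    heap = [(0, key) for key in islice(it, memory_size)]
    heapq.heapify(heap)
    runs, current, current_run = [], [], 0
    while heap:
        run, key = heapq.heappop(heap)
        if run != current_run:  # every key of the current run has been written
            runs.append(current)
            current, current_run = [], run
        current.append(key)
        nxt = next(it, _END)
        if nxt is not _END:
            # A key smaller than the one just written cannot join the current run.
            heapq.heappush(heap, (run if nxt >= key else run + 1, nxt))
    if current:
        runs.append(current)
    return runs


data = [5, 1, 9, 3, 7, 2, 8, 6, 4]
runs = replacement_selection(data, memory_size=3)
print(runs)                      # [[1, 3, 5, 7, 8, 9], [2, 4, 6]]
print(list(heapq.merge(*runs)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With only three keys in memory, the first run is already six keys long. On random input, replacement selection tends to produce runs longer than the memory size, which in turn makes a single merge step more likely.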
Concurrency control and recovery
“Must reads”
for database geeks
Concurrency control and recovery
“Should reads”
for database geeks
Tutorial on hierarchical locking
• More generally: multi-granularity locking
• Lock acquisition down a hierarchy
– “Intention” locks IS and IX on ancestors, before S and X locks on descendants
• Standard example: file & page
– T1 holds S lock on file
– T2 wants IS lock on file, S locks on some pages
– T3 wants X lock on file
– T4 wants IX lock on file, X locks on some pages
        S     X     IS    IX    SIX
S       ok    -     ok    -     -
X       -     -     -     -     -
IS      ok    -     ok    ok    ok
IX      -     -     ok    ok    -
SIX     -     -     ok    -     -
Goetz Graefe: Key-range locking
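The standard file-and-page example can be replayed against the standard multi-granularity compatibility matrix. In this sketch (names hypothetical), a request is granted only if its mode is compatible with every lock that other transactions already hold on the file.

```python
# Standard multi-granularity lock compatibility: requested mode -> compatible held modes.
COMPATIBLE = {
    "IS": {"IS", "IX", "S", "SIX"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "SIX": {"IS"},
    "X": set(),
}


def can_grant(requested, held_by_others):
    """Grant only if the requested mode is compatible with every lock held by others."""
    return all(h in COMPATIBLE[requested] for h in held_by_others)


file_locks = ["S"]                  # T1 holds S on the file
print(can_grant("IS", file_locks))  # True: T2 proceeds and then takes S locks on pages
print(can_grant("X", file_locks))   # False: T3 waits
print(can_grant("IX", file_locks))  # False: T4 waits
```

Note that IS and IX are compatible with each other. Whether T2's and T4's page locks actually conflict is decided at the page level, the finer granularity.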
Quiz
• Why are all intention locks compatible?
• Conflicts are decided more accurately
at a finer granularity of locking.
SQL Server lock modes
Lock manager invocations
• Combine IS+S+Ø into SØ (“key shared, gap free”)
– Cuts lock manager invocations by a factor of 2
• Strict application of standard techniques
– No new semantics
– Automatic derivation
Table: lock compatibility for S, X, IS, IX and for the derived modes S, X, SØ, ØS, XØ, ØX, SX, XS
Key deletion
• User transaction
– Sets ghost bit in record header
– Lock mode is XØ (“key exclusive, gap free”)
• System transaction
– Verifies absence of locks & lock requests
– Erases ghost record
– No lock required, data structure change only
– Absence of other locks is required
Key insertion after deletion
• Insertion finds ghost record
– Clears ghost bit
– Sets other fields as appropriate
– Lock mode is XØ (“key exclusive, gap free”)
• Insertion reverses deletion
Key insertion
• System transaction creates a ghost record
– Verifies absence of ØS lock on low gap boundary
(actually compatibility with ØX)
– No lock acquisition required
• User transaction marks the record valid
– Locking the new key in XØ (“key exclusive, gap free”)
– High concurrency among user insertions
• No need for “creative” lock modes or durations
• Insertion mirrors deletion
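Below is a minimal sketch of the ghost-record protocol on a single page. It uses a plain dictionary for records and another for the lock modes held on each key. It does not model lock acquisition itself, nor the gap check against the lower neighbor (ØS versus ØX).

Deletion and insertion both reduce to flipping the ghost bit. System transactions create and erase the ghosts.

```python
class Page:
    """Sketch of ghost records on one B-tree page; locks are a hypothetical dict of held modes."""

    def __init__(self):
        self.records = {}  # key -> {"ghost": bool, "value": ...}
        self.locks = {}    # key -> set of lock modes held or requested on that key

    def scan(self):
        # Queries skip ghost records: they are logically absent.
        return sorted(k for k, r in self.records.items() if not r["ghost"])

    def user_delete(self, key):
        # User transaction (assumed to hold XØ on the key) only sets the ghost bit.
        self.records[key]["ghost"] = True

    def system_erase_ghost(self, key):
        # System transaction: erase only if no locks or lock requests remain on the key.
        rec = self.records.get(key)
        if rec is not None and rec["ghost"] and not self.locks.get(key):
            del self.records[key]
            return True
        return False

    def system_create_ghost(self, key):
        # System transaction: create a ghost for a new key (gap-lock check omitted).
        self.records.setdefault(key, {"ghost": True, "value": None})

    def user_insert(self, key, value):
        # User transaction: turn a ghost (new, or left by a deletion) into a valid record.
        if key not in self.records:
            self.system_create_ghost(key)
        self.records[key].update(ghost=False, value=value)


page = Page()
page.user_insert(10, "a")
page.user_insert(20, "b")
page.user_delete(20)
print(page.scan())                  # [10]
page.user_insert(20, "c")           # insertion after deletion: clears the ghost bit
print(page.scan())                  # [10, 20]
page.user_delete(10)
page.locks[10] = {"XØ"}             # the deleting transaction is still active
print(page.system_erase_ghost(10))  # False
page.locks[10].clear()
print(page.system_erase_ghost(10))  # True
print(sorted(page.records))         # [20]
```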
Logging a deletion
• Traditional design
– Small log record in user transaction
– Full undo log record in system transaction
• Optimization
– Single log record for entire system transaction
– With both old record identifier and transaction commit
– No need for transaction undo
– No need to log record contents
– Big savings in clustered indexes
Transaction …, Page …, erase ghost 2; commit!
Logging an insertion
• 1st design
– Minimal log record for ghost creation – key value only
– Full log record in user transaction for update
• 2nd design
– Full user record created as ghost – full log record
– Small log record in user transaction
• Bulk append
– Use 1st design above
– Run-length encoding of multiple new keys
Transaction …, Page …, create ghosts 4-8, keys 4711 (+1)
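The bulk-append log record above can be produced by checking whether the new keys form an ascending arithmetic sequence. The sketch makes two simplifications:

- It formats only the part of the record after "Transaction …, Page …".
- It reads "(+1)" as the increment between consecutive keys, which is an interpretation of the slide's example.

The function name is hypothetical.

```python
def ghost_creation_log(first_slot, keys):
    """Format the body of one log record for a bulk append of new ghost records."""
    slots = f"create ghosts {first_slot}-{first_slot + len(keys) - 1}"
    if len(keys) >= 2:
        step = keys[1] - keys[0]
        if step > 0 and all(b - a == step for a, b in zip(keys, keys[1:])):
            # Run-length encoding: first key plus a constant increment.
            return f"{slots}, keys {keys[0]} (+{step})"
    return f"{slots}, keys {', '.join(str(k) for k in keys)}"


print(ghost_creation_log(4, [4711, 4712, 4713, 4714, 4715]))  # create ghosts 4-8, keys 4711 (+1)
print(ghost_creation_log(4, [4711, 4720, 4722]))  # create ghosts 4-6, keys 4711, 4720, 4722
```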
Summary: key range locking
• “Radically old” design
• Sound theory – no “creative” lock modes
– Strict application of multi-granularity locking
– Automatic derivation of “macro” lock modes
– Standard lock retention until end-of-transaction
• More concurrency than traditional designs
– Orthogonality avoids missing lock modes
• Key insertion & deletion via ghost records
– Insertion is symmetric to deletion
– Efficient system transactions, including logging
Like scalable
database indexing
Summary
• Re-think parallel data & algorithms:
– Partitioning: load balancing
– Pipelining: communication & synchronization
– Local execution: algorithms & data structures!
• Re-think power efficiency
– Algorithms & data structures!
• Database query & update processing
– Re-think indexes & their implementation