Parallel Databases
COMP3017 Advanced Databases
Dr Nicholas Gibbins - [email protected]
2012-2013
Definitions
Parallelism
– An arrangement or state that permits several operations or tasks to be
performed simultaneously rather than consecutively
Parallel Databases
– have the ability to split the processing of data and access to data
across multiple processors
Why Parallel Databases?
• Hardware trends
• Reduced elapsed time for queries
• Increased transaction throughput
• Increased scalability
• Better price/performance
• Improved application availability
• Access to more data
• In short, for better performance
Shared Memory Architecture
• Tightly coupled
• Symmetric Multiprocessor (SMP)
[Diagram: three processors (P) connected to a single Global Memory (M)]
Software – Shared Memory
• Less complex database software
• Limited scalability
• Single buffer
• Single database storage
[Diagram: processors sharing a single global memory and buffer]
Shared Disc Architecture
• Loosely coupled
• Distributed Memory
[Diagram: three processors (P), each with its own memory (M), all connected to shared storage (S)]
Software – Shared Disc
• Avoids memory bottleneck
• Each processor has its own database buffer
• Same page may be in more than one buffer at once – can lead to
incoherence
• Needs global locking mechanism
• Single logical database storage
[Diagram: processors (P) with private memory (M) sharing a single storage unit (S)]
Shared Nothing Architecture
• Massively Parallel
• Loosely Coupled
• High Speed Interconnect
(between processors)
[Diagram: three processors (P), each with its own memory (M), linked by a high-speed interconnect]
Software – Shared Nothing
• One page is only in one local buffer – no buffer incoherence
• Each processor has its own database buffer
• Each processor owns part of the data
• Needs distributed deadlock detection
• Needs multiphase commit protocol
• Needs to break SQL requests into multiple sub-requests
[Diagram: processors (P) with private memory (M), each owning a partition of the data]
Hardware vs. Software Architecture
• It is possible to use one software strategy on a different
hardware arrangement
• Also possible to simulate one hardware configuration on
another
– Virtual Shared Disk (VSD) makes an IBM SP shared nothing system
look like a shared disc setup (for Oracle)
• From this point on, we deal only with the shared nothing
software model
Shared Nothing Challenges
• Partitioning the data
• Keeping the partitioned data balanced
• Splitting up queries to get the work done
• Avoiding distributed deadlock
• Concurrency control
• Dealing with node failure
Hash Partitioning
[Diagram: rows of a table distributed across nodes by a hash of the partitioning key]
Range Partitioning
[Diagram: a table split across three nodes by key range – A-H, I-P and Q-Z]
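The two partitioning schemes above can be sketched as simple placement functions. This is an illustrative sketch, not code from the module; the node counts, letter ranges and key values are invented.

```python
# Illustrative placement functions for hash and range partitioning.
# The node counts, ranges and keys below are invented for the example.

def hash_partition(key, n_nodes):
    """Hash partitioning: the node is chosen by a hash of the key."""
    return hash(key) % n_nodes

def range_partition(key, ranges):
    """Range partitioning: the node is chosen by comparing the key's
    first letter against each node's inclusive letter range."""
    first = key[0].upper()
    for node, (lo, hi) in ranges.items():
        if lo <= first <= hi:
            return node
    raise ValueError(f"no range covers key {key!r}")

# Three nodes holding A-H, I-P and Q-Z, as in the diagram above:
ranges = {0: ("A", "H"), 1: ("I", "P"), 2: ("Q", "Z")}
```

Hash partitioning spreads keys evenly but loses ordering; range partitioning keeps related keys together, which suits range queries but risks skew.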
Schema Partitioning
[Diagram: whole tables assigned to different nodes – Table 1 on one node, Table 2 on another]
Rebalancing Data
[Diagram sequence: data in proper balance → data grows and performance drops → new nodes and discs are added → data is redistributed to the new nodes]
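Why redistribution is needed can be seen with a toy placement rule. The sketch below assumes naive hash-mod-N placement (an assumption for illustration; real systems use schemes that move less data), under which adding a node changes the home of many rows:

```python
# Toy rebalancing sketch: with hash-mod-N placement (a deliberately
# naive, assumed rule), growing the cluster changes where many keys
# live, so those rows must be redistributed to the new nodes.

def placement(keys, n_nodes):
    return {k: hash(k) % n_nodes for k in keys}

def rows_to_move(keys, old_nodes, new_nodes):
    before = placement(keys, old_nodes)
    after = placement(keys, new_nodes)
    return [k for k in keys if before[k] != after[k]]

keys = [f"custkey-{i}" for i in range(1000)]
moved = rows_to_move(keys, 3, 4)   # grow from 3 nodes to 4
```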
Dividing up the Work
[Diagram: the application submits work to a coordinator process, which divides it among several worker processes]
DB Software on each node
[Diagram: each node runs its own DBMS instance; applications (App1, App2) connect to coordinator processes (C1, C2), and worker processes (W1, W2) on every node act on the local data]
Transaction Parallelism
Improves throughput
Inter-Transaction
– Run many transactions in parallel, sharing the DB
– Often referred to as Concurrency
Query Parallelism
Improves response times (lower latency)
Intra-operator (horizontal) parallelism
– Operators decomposed into independent operator instances, which
perform the same operation on different subsets of data
Inter-operator (vertical) parallelism
– Operations are overlapped
– Pipeline data from one stage to the next without materialisation
Intra-Operator Parallelism
[Diagram: a single SQL query is decomposed into subset queries, one per processor]
Intra-Operator Parallelism
Example query:
– SELECT c1,c2 FROM t1 WHERE c1>5.5
Assumptions:
– 100,000 rows
– Predicates eliminate 90% of the rows
I/O Shipping
[Diagram: a combined coordinator-and-worker node pulls all 25,000 rows from each of four workers across the network and applies the predicate itself, producing the 10,000 (c1,c2) result rows]
Function Shipping
[Diagram: each of four workers applies the predicate locally and ships only its 2,500 qualifying rows across the network; the coordinator merges them into the 10,000 (c1,c2) result rows]
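The difference between the two strategies is how many rows cross the network. The sketch below replays the example query on synthetic data; the table contents and the four-worker split are invented, while the 100,000-row total and roughly 10% selectivity follow the slides.

```python
# Contrast of I/O shipping and function shipping for
#   SELECT c1, c2 FROM t1 WHERE c1 > 5.5
# Synthetic data: 100,000 (c1, c2) rows spread over 4 workers; the
# value range is tuned so the predicate keeps roughly 10% of rows.

import random

random.seed(0)
N_WORKERS = 4
rows = [(random.uniform(0.0, 6.1), i) for i in range(100_000)]
partitions = [rows[w::N_WORKERS] for w in range(N_WORKERS)]

def predicate(row):
    return row[0] > 5.5

# I/O shipping: workers ship every row; the coordinator filters.
shipped_io = sum(len(p) for p in partitions)
result_io = [r for p in partitions for r in p if predicate(r)]

# Function shipping: workers filter locally, shipping only matches.
worker_results = [[r for r in p if predicate(r)] for p in partitions]
shipped_fn = sum(len(w) for w in worker_results)
result_fn = [r for w in worker_results for r in w]
```

Both strategies produce the same result set, but function shipping moves roughly a tenth of the rows across the network.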
Function Shipping Benefits
• Database operations are performed where the data are, as far
as possible
• Network traffic is minimised
• For basic database operators, code developed for serial
implementations can be reused
• In practice, a mixture of Function Shipping and I/O Shipping
has to be employed
Inter-Operator Parallelism
[Diagram: without inter-operator parallelism, the Scan, Join and Sort stages of each query run consecutively along the time axis]
Resulting Parallelism
[Diagram: with inter-operator parallelism, the Scan, Join and Sort stages of successive queries overlap in time, each stage pipelining its output to the next]
Basic Operators
• The DBMS has a set of basic operations which it can perform
• The result of each operation is another table
– Scan a table – sequentially or selectively via an index – to produce
another smaller table
– Join two tables together
– Sort a table
– Perform aggregation functions (sum, average, count)
• These are combined to execute a query
The Volcano Architecture
• The Volcano Query Processing System was devised by Goetz
Graefe (Oregon)
• Introduced the Exchange operator
– Inserted between the steps of a query to:
– Pipeline results
– Direct streams of data to the next step(s), redistributing as
necessary
• Provides mechanism to support both vertical and horizontal
parallelism
• Informix ‘Dynamic Scalable Architecture’ based on Volcano
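A minimal sketch of what an exchange operator does, assuming hash redistribution on a chosen column. This illustrates the idea only, not Volcano's actual implementation, and the row data is invented.

```python
# Sketch of an exchange operator: it sits between producer and
# consumer operator instances, routing each producer row to the
# consumer instance chosen by a hash of the redistribution column.

from collections import defaultdict

def exchange(producer_streams, n_consumers, column):
    """Redistribute rows from producer streams to consumer inputs."""
    queues = defaultdict(list)
    for stream in producer_streams:
        for row in stream:  # a real exchange pipelines row-at-a-time
            queues[hash(row[column]) % n_consumers].append(row)
    return [queues[i] for i in range(n_consumers)]

# Two parallel scans producing (custkey, county) rows:
scan_a = [(1, "Hampshire"), (2, "Dorset")]
scan_b = [(3, "Hampshire"), (4, "Wiltshire")]
join_inputs = exchange([scan_a, scan_b], n_consumers=2, column=0)
```

Because routing hashes the join column, all rows with the same key reach the same consumer instance, which is what lets parallel hash join instances in the plans below run independently.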
Exchange Operators
Example query:
– SELECT county, SUM(order_item)
FROM customer, order
WHERE order.customer_id=customer.customer_id
GROUP BY county
ORDER BY SUM(order_item)
Exchange Operators
[Plan: SCANs of Customer and Order feed a HASH JOIN; its output is GROUPed and then SORTed]
Exchange Operators
[Plan: the Customer SCAN is split into parallel instances, with an EXCHANGE operator redistributing the scanned rows across parallel HASH JOIN instances]
Exchange Operators
[Plan: both the Customer and Order SCANs are split into parallel instances, each feeding the parallel HASH JOIN instances through an EXCHANGE operator]
[Full plan: parallel SCANs of Customer and Order feed EXCHANGE operators into parallel HASH JOIN instances; a further EXCHANGE redistributes the join output across parallel GROUP instances, and a final EXCHANGE feeds the single SORT]
Query Processing
Some Parallel Queries
• Enquiry
• Collocated Join
• Directed Join
• Broadcast Join
• Repartitioned Join
Orders Database
CUSTOMER (CUSTKEY, C_NAME, …, C_NATION, …)
ORDER (ORDERKEY, DATE, …, CUSTKEY, …, SUPPKEY, …)
SUPPLIER (SUPPKEY, S_NAME, …, S_NATION, …)
Enquiry/Query
“How many customers live in the UK?”
[Diagram: the query arrives from the application at the COORDINATOR, which starts SLAVE TASKS over the multiple partitions of the customer table; each slave SCANs and COUNTs its partition and returns its subcount to the coordinator, which SUMs them and returns the answer to the application]
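The enquiry above can be sketched as a scatter of scan-and-count tasks followed by a gather of subcounts. The customer rows below are invented for illustration.

```python
# Sketch of the parallel enquiry "How many customers live in the UK?"
# Each slave task SCANs and COUNTs one partition of the customer
# table; the coordinator SUMs the subcounts. Rows are invented.

partitions = [
    [("C1", "UK"), ("C2", "France")],
    [("C3", "UK"), ("C4", "Germany")],
    [("C5", "UK")],
]

def slave_count(partition, nation):
    """One slave task: SCAN the partition, COUNT matching rows."""
    return sum(1 for _, c_nation in partition if c_nation == nation)

subcounts = [slave_count(p, "UK") for p in partitions]
total = sum(subcounts)   # the coordinator's SUM
```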
Collocated Join
“Which customers placed orders in July?”
[Diagram: on each node, a JOIN of the local SCANs of ORDER and CUSTOMER; the per-node results are combined by a UNION]
Requires a JOIN of CUSTOMER and ORDER.
The tables are both partitioned on CUSTKEY (the same key), and
therefore corresponding entries are on the same node.
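A sketch of the collocated join, with invented rows. Because both tables are partitioned on CUSTKEY, each node joins its own partitions with no data movement:

```python
# Collocated join sketch: CUSTOMER and ORDER partitions with the
# same CUSTKEY values already sit on the same node, so each join is
# local and the per-node results are simply UNIONed. Rows invented.

customer_parts = [
    [(1, "Alice")],               # node 0: (CUSTKEY, C_NAME)
    [(2, "Bob"), (4, "Dan")],     # node 1
]
order_parts = [
    [(10, 1, "July")],            # node 0: (ORDERKEY, CUSTKEY, DATE)
    [(11, 2, "July"), (12, 4, "June")],
]

def local_join(customers, orders, month):
    by_key = dict(customers)      # CUSTKEY -> C_NAME
    return [(by_key[ck], ok) for ok, ck, date in orders
            if date == month and ck in by_key]

result = [row for node in range(2)   # UNION of the per-node joins
          for row in local_join(customer_parts[node],
                                order_parts[node], "July")]
```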
Directed Join
“Which customers placed orders in July?”
(tables have different keys)
[Diagram: Slave Task 1 SCANs ORDER and directs each row to the appropriate node; Slave Task 2 SCANs CUSTOMER and performs the JOIN; results are returned to the application via the Coordinator]
ORDER is partitioned on ORDERKEY, CUSTOMER is partitioned on CUSTKEY.
Retrieve rows from ORDER, then use ORDER.CUSTKEY to direct the
appropriate rows to the nodes holding the matching CUSTOMER rows.
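A sketch of the directed join with invented rows, assuming CUSTOMER is partitioned by `custkey % n_nodes` (the real partitioning function is whatever rule the DBMS applies):

```python
# Directed join sketch: scanned ORDER rows are directed to the node
# whose CUSTOMER partition holds the matching CUSTKEY, then joined
# locally. The partitioning rule and the rows are invented.

N_NODES = 2
def customer_node(custkey):
    return custkey % N_NODES      # assumed CUSTOMER partitioning rule

customer_parts = {0: {2: "Bob"}, 1: {1: "Alice", 3: "Carol"}}
order_scan = [(10, 1, "July"), (11, 2, "July"), (12, 3, "June")]

# Slave task 1: SCAN ORDER and direct each row by its CUSTKEY.
directed = {n: [] for n in range(N_NODES)}
for ok, ck, date in order_scan:
    directed[customer_node(ck)].append((ok, ck, date))

# Slave task 2: JOIN locally on each CUSTOMER node.
result = [(customer_parts[n][ck], ok)
          for n in range(N_NODES)
          for ok, ck, date in directed[n]
          if date == "July" and ck in customer_parts[n]]
```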
Broadcast Join
“Which customers and suppliers are in the same country?”
[Diagram: Slave Task 1 SCANs SUPPLIER and BROADCASTs every row to each node; Slave Task 2 SCANs CUSTOMER and performs the JOIN; results are returned to the application via the Coordinator]
SUPPLIER is partitioned on SUPPKEY, CUSTOMER on CUSTKEY, but the
join is required on *_NATION.
Send all SUPPLIER rows to each CUSTOMER node.
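A sketch of the broadcast join with invented rows. Since neither table is partitioned on the join column, the SUPPLIER table is replicated to every CUSTOMER node:

```python
# Broadcast join sketch: every CUSTOMER node receives the complete
# SUPPLIER table and joins it against its local CUSTOMER partition
# on the nation columns. Rows are invented for illustration.

supplier_parts = [
    [("S1", "UK")],                    # (S_NAME, S_NATION)
    [("S2", "France")],
]
customer_parts = [
    [("C1", "UK")],                    # (C_NAME, C_NATION)
    [("C2", "France"), ("C3", "UK")],
]

# BROADCAST: gather SUPPLIER and replicate it to every node.
all_suppliers = [s for part in supplier_parts for s in part]

def local_join(customers):
    return [(c_name, s_name)
            for c_name, c_nation in customers
            for s_name, s_nation in all_suppliers
            if c_nation == s_nation]

result = [row for part in customer_parts for row in local_join(part)]
```

Broadcasting pays off when the replicated table is small relative to the partitioned one.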
Repartitioned Join
“Which customers and suppliers are in the same country?”
[Diagram: Slave Tasks 1 and 2 SCAN SUPPLIER and CUSTOMER and repartition their rows; Slave Task 3 performs the JOIN on the repartitioned data; results are returned via the Coordinator]
SUPPLIER is partitioned on SUPPKEY, CUSTOMER on CUSTKEY, but the
join is required on *_NATION.
Repartition both tables on *_NATION to localise and minimise the
join effort.
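A sketch of the repartitioned join with invented rows and an assumed repartitioning hash; both tables are rehashed on the nation column so that matching rows land on the same node:

```python
# Repartitioned join sketch: both tables are rehashed on the join
# column (*_NATION) so matching rows meet on one node, then joined
# locally. The hash function and rows are invented for illustration.

N_NODES = 2
def nation_node(nation):
    return sum(map(ord, nation)) % N_NODES   # assumed repartition hash

suppliers = [("S1", "UK"), ("S2", "France")]
customers = [("C1", "UK"), ("C2", "France"), ("C3", "UK")]

def repartition(rows):
    parts = {n: [] for n in range(N_NODES)}
    for name, nation in rows:
        parts[nation_node(nation)].append((name, nation))
    return parts

s_parts, c_parts = repartition(suppliers), repartition(customers)

# Slave task 3: each node joins only its own repartitioned data.
result = [(c, s)
          for n in range(N_NODES)
          for c, c_nat in c_parts[n]
          for s, s_nat in s_parts[n]
          if c_nat == s_nat]
```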
Query Handling
• Critical aspect of Parallel DBMS
• User wants to issue simple SQL, and not be concerned with
parallel aspects
• DBMS needs structures and facilities to parallelise queries
• Good optimiser technology is essential
• As database grows, effects (good or bad) of optimiser become
increasingly significant
Further Aspects
• A single transaction may update data in several different
places
• Multiple transactions may be using the same (distributed)
tables simultaneously
• One or several nodes could fail
• Requires concurrency control and recovery across multiple
nodes for:
– Locking and deadlock detection
– Two-phase commit to ensure ‘all or nothing’
Deadlock Detection
Locking and Deadlocks
• With Shared Nothing architecture, each node is responsible
for locking its own data
• No global locking mechanism
• However:
– T1 locks item A on Node 1 and wants item B on Node 2
– T2 locks item B on Node 2 and wants item A on Node 1
– Distributed Deadlock
Resolving Deadlocks
• One approach – Timeouts
• Timeout T2, after wait exceeds a certain interval
– Interval may need random element to avoid ‘chatter’
i.e. both transactions give up at the same time and then try again
• Roll back T2 to let T1 proceed
• Restart T2, which can now complete
Resolving Deadlocks
• More sophisticated approach (DB2)
• Each node maintains a local ‘wait-for’ graph
• Distributed deadlock detector (DDD) runs at the catalogue
node for each database
• Periodically, all nodes send their graphs to the DDD
• DDD records all locks found in wait state
• Transaction becomes a candidate for termination if found in
same lock wait state on two successive iterations
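The two-sweep rule can be sketched as follows. This is a simplified illustration of the scheme described above, not DB2's actual code, and the transaction names are invented:

```python
# Sketch of distributed deadlock detection: nodes send their local
# wait-for edges (waiter -> lock holder) to the detector, which marks
# a transaction as a termination candidate when it is found in the
# same wait state on two successive sweeps. Greatly simplified.

def merge_graphs(local_graphs):
    """Union the wait-for edges reported by every node."""
    edges = set()
    for graph in local_graphs:
        edges |= set(graph)
    return edges

def candidates(previous_sweep, current_sweep):
    """Waiters seen in the same wait state in both sweeps."""
    return {waiter for waiter, _ in previous_sweep & current_sweep}

# Sweep 1: T1 on node 1 waits for T2; T2 on node 2 waits for T1.
sweep1 = merge_graphs([[("T1", "T2")], [("T2", "T1")]])
# Sweep 2: both transactions are still waiting, so both become
# candidates for termination (the distributed deadlock persists).
sweep2 = merge_graphs([[("T1", "T2")], [("T2", "T1")]])
stuck = candidates(sweep1, sweep2)
```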
Reliability
We wish to preserve the ACID properties for parallelised
transactions
– Isolation is taken care of by 2PL protocol
– Isolation implies Consistency
– Durability can be taken care of node-by-node, with proper logging
and recovery routines
– Atomicity is the hard part. We need to commit all parts of a
transaction, or abort all parts
Two-phase commit protocol (2PC) is used to ensure that
Atomicity is preserved
Two-Phase Commit (2PC)
Distinguish between:
– The global transaction
– The local transactions into which the global transaction is
decomposed
Global transaction is managed by a single site, known as the
coordinator
Local transactions may be executed on separate sites, known
as the participants
Phase 1: Voting
• Coordinator sends “prepare T” message to all participants
• Participants respond with either “vote-commit T” or
“vote-abort T”
• Coordinator waits for participants to respond within a
timeout period
Phase 2: Decision
• If all participants return “vote-commit T” (to commit), send
“commit T” to all participants. Wait for acknowledgements
within timeout period.
• If any participant returns “vote-abort T”, send “abort T” to all
participants. Wait for acknowledgements within timeout
period.
• When all acknowledgements received, transaction is
completed.
• If a site does not acknowledge, resend global decision until it
is acknowledged.
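The decision rule of phase 2 reduces to a conjunction over the phase-1 votes. A minimal sketch, with messaging, timeouts and acknowledgement retries omitted:

```python
# Sketch of the 2PC decision: the coordinator commits the global
# transaction only if every participant voted to commit; a single
# abort vote aborts it. Timeouts and retries are omitted.

def decide(votes):
    """Phase 2 decision from the phase 1 votes, one per participant."""
    if all(v == "vote-commit" for v in votes):
        return "commit"
    return "abort"

unanimous = decide(["vote-commit", "vote-commit", "vote-commit"])
one_abort = decide(["vote-commit", "vote-abort", "vote-commit"])
```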
Normal Operation
[Diagram: the coordinator C sends "prepare T" to each participant P; each P replies "vote-commit T"; once "vote-commit T" has been received from all participants, C sends "commit T" and each P replies with an ack]
Parallel Utilities
• Ancillary operations can also exploit the parallel hardware
– Parallel Data Loading/Import/Export
– Parallel Index Creation
– Parallel Rebalancing
– Parallel Backup
– Parallel Recovery