Download Document

ICS 214B: Transaction Processing and Distributed Data Management Distributed Database Systems 1 So far: Centralized DB systems Software: P M Application SQL Front End Query Processor Transaction Proc. File Access ... • Simplifications:  single front end  one place to keep locks  if processor fails, system fails, ... ICS214B Notes 11 2 Next: distributed database systems • Multiple processors ( + memories) • Heterogeneity and autonomy of “components” ICS214B Notes 11 3 Why do we need Distributed Databases? • Example: Big Corp. has offices in London, New York, and Hong Kong. • Employee data: – EMP(ENO, NAME, TITLE, SALARY, …) • Where should the employee data table reside? ICS214B Notes 11 4 Big Corp. Data Access Pattern • Mostly, employee data is managed at the office where the employee works – E.g., payroll, benefits, hire and fire • Periodically, Big Corp needs consolidated access to employee data – E.g., Big Corp. changes benefit plans and that affects all employees. – E.g., Annual bonus depends on global net profit. ICS214B Notes 11 5 New York Payroll app London Payroll app EMP London New York Internet Hong Kong Payroll app Hong Kong ICS214B Notes 11 Problem: NY and HK payroll apps run very slowly! 6 New York Payroll app London Payroll app London Emp London New York NY Emp Internet Hong Kong Payroll app Much better!! Hong Kong HK Emp ICS214B Notes 11 7 New York Payroll app London Payroll app Annual Bonus app London Emp London New York NY Emp Internet Hong Kong Payroll app Distribution provides opportunities for parallel execution Hong Kong HK Emp ICS214B Notes 11 8 New York Payroll app London Payroll app Annual Bonus app London Emp London New York NY Emp Internet Hong Kong Payroll app Hong Kong HK Emp ICS214B Notes 11 9 New York Payroll app London Payroll app Annual Bonus app Lon, NY Emp London New York NY, HK Emp Internet Hong Kong Payroll app Replication improves availability Hong Kong HK, Lon Emp ICS214B Notes 11 10 Heterogeneity and Autonomy Application Stock ticker tape RDBMS Portfolio ICS214B Notes 11 Files History of dividends, ratios,... 11 • data management with multiple processors and possible autonomy, heterogeneity – Impact on:  Data organization  Query processing  Access structures  Concurrency control  Recovery ICS214B Notes 11 12 • transaction monitors – Coordinate transaction execution  Multiple DBMSs  High performance – Have workflow facilities – Manage communications with client “terminals” ICS214B Notes 11 13 DB architectures (1) Shared memory P P ... P M ICS214B Notes 11 14 DB architectures (2) Shared disk P P P ... M M M ... ICS214B Notes 11 15 DB architectures (3) Shared nothing P P M M ICS214B ... P M Notes 11 16 DB architectures (4) Hybrid example – Hierarchical or Clustered P P ... P P ... P M P M ICS214B Notes 11 17 Issues for selecting architecture • • • • • • Reliability Scalability Geographic distribution of data Data “clusters” Performance Cost ICS214B Notes 11 18 Parallel or distributed DB system? • More similarities than differences! ICS214B Notes 11 19 • Typically, parallel DBs: – Fast interconnect – Homogeneous software – High performance is goal – Transparency is goal ICS214B Notes 11 20 • Typically, distributed DBs: – Geographically distributed – Data sharing is goal (may run into heterogeneity, autonomy) – Disconnected operation possible ICS214B Notes 11 21 Distributed Database Challenges • Distributed Database Design – Deciding what data goes where – Depends on data access patterns of major applications – Two subproblems:  Fragmentation: partition tables into fragments  Allocation: allocate fragments to nodes ICS214B Notes 11 22 Distributed Database Challenges • Distributed Query Processing – Centralized query plan goal: minimize number of disk I/Os – Additional factors in distributed scenario:  Communication costs  Opportunity for parallelism – Space of possible query plans is much larger! ICS214B Notes 11 23 Distributed Database Challenges • Distributed Concurrency Control – Transactions span nodes  Must be globally serializable – Two main approaches:  Locking  Timestamps – Distributed Deadlock Management – Multiple data copies – need to be kept in sync when updates occur ICS214B Notes 11 24 Distributed Database Challenges • Reliability of Distributed Databases – Centralized database failure model:  processor fails – Distributed database failure model:  One or more processors may fail  Network may fail  Network may be partitioned – Data must be kept in sync ICS214B Notes 11 25  To illustrate synchronization problems: “Two Generals” Problem ICS214B Notes 11 26 The one general problem (Trivial!) G  Battlefield Troops ICS214B Notes 11 27 The two general problem: Blue army Blue G ICS214B Enemy Red army <-------------------------------> Red G messengers Notes 11 28 Rules: • Blue and red army must attack at same time • Blue and red generals synchronize through messengers • Messengers can be lost ICS214B Notes 11 29 Distributed Database Challenges • Heterogeneity Application Stock ticker tape RDBMS Portfolio ICS214B Notes 11 Files History of dividends, ratios,... 30 Distributed Database Challenges • Autonomy Example: unable to get statistics for query optimization Example: blue general may have mind of his (or her) own! ICS214B Notes 11 31 • Distributed DB Design ICS214B Notes 11 32 Distributed DB Design Top-down approach: - have DB… - how to split and allocate the sites Bottom-up approach: - multi-database (possibly heterogeneous, autonomous) - no design issues! ICS214B Notes 11 33 Two issues in DDB design: • Fragmentation • Allocation Note: issues not independent, but will cover separately ICS214B Notes 11 34 Employee relation E (#,name,loc,sal,…) 40% of queries: 40% of queries: Qa: select * Qb: select * from E from E where loc=Sa where loc=Sb and… and ... Motivation: Two sites: Sa, Sb Qa  Sa Sb  Qb ICS214B Notes 11 35 # NM Loc Sal Joe Sally Tom .. 5 7 8 Sa 10 Sb 25 Sa 15 .. E F # NM Loc Sal # NM Loc Sal 5 8 7 Sb 25 At Sb At Sa ICS214B Sally .. Sa 10 Sa 15 .. Joe Tom Notes 11 36 F = { F1, F2 } F1 =  loc=Sa(E) F2 =  loc=Sb(E)  called primary horizontal fragmentation ICS214B Notes 11 37 Fragmentation • Horizontal R Primary depends on local attributes Derived depends on foreign relation • Vertical R ICS214B Notes 11 38 Three common horizontal fragmentation techniques • Round robin • Hash partitioning • Range partitioning Used mostly in parallel dbs Used in parallel dbs and distributed dbs ICS214B Notes 11 39 • Round robin R t1 t2 t3 t4 ... D0 t1 D1 t2 t4 D2 t3 t5 • Evenly distributes data • Good for scanning full relation • Not good for point or range queries • Not suitable for databases distributed over WAN ICS214B Notes 11 40 • Hash partitioning R t1h(k1)=2 t2h(k2)=0 t3h(k3)=0 t4h(k4)=1 ... D0 D1 t2 t3 D2 t1 t4 • Good for point queries on key; also for joins on key • Not good for range queries; point queries not on key • If hash function good, even distribution • Not suitable for databases distributed over a WAN ICS214B Notes 11 41 • Range partitioning R t1: t2: t3: t4: ... A=5 A=8 A=2 A=3 D0 partitionin g vector 4 7 t3 t4 V0 V1 D1 t1 D2 t2 • Good for point queries on A; also for joins on A • Good for some range queries on A • Need to select good vector: else unbalanced • data skew, execution skew ICS214B Notes 11 42 Which are good fragmentations? Example: F = { F1, F2 } F1 =  sal<10  ICS214B E F2 =  sal>20 E Problem: Some tuples lost! Notes 11 43 Which are good fragmentations? Second example: F = { F3, F4 } F3 =   sal<10 E F4 =  sal>5 E Tuples with 5 < sal < 10 are duplicated... ICS214B Notes 11 44 Better design Example:  F = F5 = 7 F = { F5, F6, F7 } sal  5 E sal  10 E F6 =  5<sal<10 E  Then replicate F6 if convenient (part of allocation problem) ICS214B Notes 11 45 Desired properties for fragmentation R  F = {F1, F2, …, Fn} • Completeness – For every data item x  R,  FiF such that xFi • Disjointness – xFi,  Fj such that xFj, i  j • Reconstruction – There is function g such that R = g(F1, F2, …, Fn) ICS214B Notes 11 46 Desired properties for horizontal fragmentation R  F = {F1, F2, …, Fn} • Completeness – For every tuple tR,  FiF such that tFi • Disjointness – tFi,  Fj such that tFj, i  j • Reconstruction – can safely ignore – Completeness  R =  Fi FiF ICS214B Notes 11 47 How do we get completeness and disjointness? (1) Check it “manually”! e.g., F1 = ICS214B sal<10 E ; F2 = sal10 E Notes 11 48 How do we get completeness and disjointness? (2) “Automatically” generate fragments with these properties • Horizontal fragments are defined by selection predicates • Generate a set of selection predicates with the desired properties ICS214B Notes 11 49 Example of generation • Say queries use predicates: A<10, A>5, Loc = SA, Loc = SB • Next: - generate “minterm” predicates - eliminate useless ones • Given simple predicates Pr= { p1, p2,.. pn } minterm predicates are of the form p1*  p2*  …  pn* where pk* is pk or is ¬pk ICS214B Notes 11 50 5 < A < 10 Minterm predicates (part I) (1) (2) (3) (4) (5) (6) (7) (8) A<10 A<10 A<10 A<10 A<10 A<10 A<10 A<10  A>5  Loc=SA  Loc=SB  A>5  Loc=SA  ¬(Loc=SB)  A>5  ¬(Loc=SA)  Loc=SB  A>5  ¬(Loc=SA)  ¬(Loc=SB)  ¬(A>5)  Loc=SA  Loc=SB  ¬(A>5)  Loc=SA  ¬(Loc=SB)  ¬(A>5)  ¬(Loc=SA)  Loc=SB  ¬(A>5)  ¬(Loc=SA)  ¬(Loc=SB) A5 ICS214B Notes 11 51 Minterm predicates (part II) (9) (10) (11) (12) (13) (14) (15) (16) ¬(A<10)  A>5  Loc=SA  Loc=SB ¬(A<10)  A>5  Loc=SA ¬(Loc=SB) ¬(A<10)  A>5 ¬(Loc=SA)  Loc=SB ¬(A<10)  A>5 ¬(Loc=SA) ¬(Loc=SB) ¬(A<10) ¬(A>5)  Loc=SA  Loc=SB ¬(A<10) ¬(A>5)  Loc=SA ¬(Loc=SB) ¬(A<10) ¬(A>5) ¬(Loc=SA)  Loc=SB ¬(A<10) ¬(A>5) ¬(Loc=SA) ¬(Loc=SB) A  10 ICS214B Notes 11 52 Final fragments: F2: F3: F6: F7: F10: F11: 5 < A < 10 5 < A < 10 A5 A5 A  10 A  10 ICS214B       Loc=SA Loc=SB Loc=SA Loc=SB Loc=SA Loc=SB Notes 11 53 Note: elimination of useless fragments depends on application semantics: e.g.: if LOC could be  SA,  SB, we need to add fragments F4: 5 <A <10  Loc  SA  Loc  SB F8: A  5  Loc  SA  Loc  SB F12: A  10  Loc  SA  Loc  SB ICS214B Notes 11 54 Why does this algorithm work? • Must prove that the set of fragments is: – Complete – Disjoint ICS214B Notes 11 55 Summary • Given simple predicates Pr= { p1, p2,.. pn } minterm predicates are M={m | m =  p *, p P k k r 1  k n } where pk* is pk or is ¬ pk • Fragments m R for all m  M are complete and disjoint ICS214B Notes 11 56 Distributed commit problem . Transaction T Action: a1,a2 Action: a3 Action: a4,a5 Commit must be atomic ICS214B Notes 11 57 Distributed commit problem • Commit must be atomic – – – – site failures communication failures network partitions timeout failures • Solution: Atomic commit protocol – must ensure that despite failures, if all failures repaired, then transactions commits or aborts at all sites. • Most common ACP: Two-phase commit (2PC) – – – – ICS214B Centralized 2PC Distributed 2PC Linear 2PC Many other variants… Notes 11 58 Terminology • Resource Managers (RMs) – Usually databases • Participants – RMs that did work on behalf of transaction • Coordinator – Component that runs two-phase commit on behalf of transaction ICS214B Notes 11 59 PREPARED* COMMIT* DONE ICS214B Notes 11 Participant Coordinator REQUEST-TO-PREPARE 60 NO Participant Coordinator REQUEST-TO-PREPARE ABORT DONE ICS214B Notes 11 61 States of the Transaction • At Coordinator: – – – – Initiated (I) -- transaction known to system Preparing (P) -- prepare message sent to participants committed (C) -- has committed Aborted (A) -- has aborted • At participant: – Initiated (I) – Prepared (P) -- prepared to commit, if the coordinator so desires – committed (C) – Aborted (A) ICS214B Notes 11 62 Protocol Database • Coordinator maintains a protocol database (in main memory) for each transaction • Protocol database – enables coordinator to execute 2PC – answers inquiries by participants about status of transaction  cohorts may make such inquiries if they fail during recovery – entry for transaction deleted when coordinator is sure that no one will ever inquire about transaction again (when it has been acked by all the participants) Notes 11 ICS214B 63 two-phase commit (messages) Coordinator Participant I I commit-request request-prepare* prepared* Commit* ICS214B P no abort* A ack* C ack* F request-prepare prepared P commit ack Notes 11 C request-prepare no abort ack A 64 • Notation: Incoming message Outgoing message ( * = everyone) • When participant enters “P” state: – it must have acquired all resources – it can only abort or commit if so instructed by a coordinator • Coordinator only enters “C” state if all participants are in “P”, i.e., it is certain that all will eventually commit ICS214B Notes 11 65 Two phase commit -- normal actions (coordinator) – make entry into protocol database for transaction marking its status as initiated when coordinator first learns about transaction – Add participant to the cohort list in protocol database when coordinator learns about the cohorts – Change status of transaction to preparing before sending prepare message. (it is assumed that coordinator will know about all the participants before this step) – On receipt of PREPARE message from cohort, mark cohort as PREPARED. If all cohorts PREPARED, then change status to COMMITTED and send COMMIT message.  must force a commit log record to disk before sending commit message. – on receipt of ACK message from cohort, mark cohort as ACKED. When all cohorts have acked, then delete entry of transaction from protocol database.  Must write a completed log record to disk before deletion from ICS214B 66 protocol database. No Notes need11 to force the write though. Two Phase Commit - normal actions (participant) • On receipt of PREPARE message, write PREPARED log record before sending PREPARED message – needs to be forced to disk since coordinator may now commit. • On receipt of COMMIT message, write COMMIT log record before sending ACK to coordinator – cohort must ensure log forced to disk before sending ack -- but no great urgency for doing so. ICS214B Notes 11 67 Timeout actions • • • At various stages of protocol, transaction waits from messages at both coordinator and participants. If message not received, on timeout, timeout action is executed: Coordinator Timeout Actions – waiting for votes of participants: ABORT transaction, send aborts to all. – waiting for ack from some participant: forward the transaction to recovery process that periodically will send COMMIT to participant. When participant will recover, and all participants send an ACK, coordinator writes a completion log record and deletes entry from protocol database. • Cohort timeout actions: – waiting for prepare: abort the transaction, send abort message to coordinator. Alternatively, it could wait for the coordinator to ask for prepare. – Waiting for decision: forward transaction to recovery process. Recovery process executes status-transaction call to the coordinator. Such a transaction is blocked for recovery of failure. The participant could have used a different termination protocol -- e.g., polling other participants. ICS214B (cooperative Termination) Notes 11 68 2PC is blocking Sample scenario: Coord P2 W P1 P3 W P4 ICS214B Notes 11 W 69 Case I: P1  “W”; coordinator sent commits coord P1  “C” w Case II: P1 w P1  NO; P1  A w P2 P3 P4  P2, P3, P4 (surviving participants) cannot safely abort or commit transaction ICS214B Notes 11 70 Recovery Actions (cohort) • All sites execute REDO-UNDO pass • Detection: A site knows it is a cohort if it finds a prepared log record for a transaction • If the log does not contain a commit log record: – reacquire all locks for the transaction – ask coordinator for the status of transaction • If log contains a commit log record – do nothing ICS214B Notes 11 71 Recovery Action (coordinator) • • If protocol database was made fault-tolerant by logging every change, simply reconstruct the protocol database and restart 2PC from the point of failure. However, since we have only logged the commit and completion transitions and nothing else: – if the log does not contain a commit. Simply abort the transaction. If a cohort asks for status in the future, its status is not in the protocol database and it will be considered as aborted. – If commit log record, but no completion log record,  recreate transactions entry committed in the protocol database and the recovery process will ask all the participants if they are still waiting for a commit message. If no one is waiting, the completion entry will be written. – If commit log record + completion log record  do nothing. ICS214B Notes 11 72 Variants of 2PC • Linear ok ok ok Coord commit commit commit • Hierarchical ICS214B Notes 11 74 Variants of 2PC • Distributed – Nodes broadcast all messages – Every node knows when to commit ICS214B Notes 11 75 Cooperative Termination Protocol • Bad case – Participant P recovers from failure – Has prepared record for transaction T – No commit or abort record for T – Coordinator is down • Participant P is blocked until coordinator recovers ICS214B Notes 11 76 Cooperative termination protocol • But perhaps some other participant can help? • Requires participants “know” each other! ICS214B Notes 11 77 Cooperative Termination Protocol • Participant P sends a DECISIONREQUEST message to other participants • Alive participants respond with COMMIT, ABORT, or UNCERTAIN • If any participant replies with a decision (COMMIT or ABORT), P acts on decision – And sends decision to UNCERTAIN participants ICS214B Notes 11 78 Cooperative Termination Protocol • When P receives a DECISION-REQUEST – If it knows decision, responds with COMMIT or ABORT – If it has not prepared transaction, responds ABORT – If it is prepared but does not know decision, responds UNCERTAIN ICS214B Notes 11 79 Cooperative Termination Sample scenario: Coord P1 C P2 W P3 ICS214B Notes 11 W 80 Cooperative Termination Sample scenario: Coord P1 W P2 W P3 ICS214B Notes 11 A 81 Cooperative Termination Sample scenario: Coord P1 W P2 W P3 ICS214B Notes 11 W 82 Is there a non-blocking protocol? Theorem: If communications failure or total site failures (i.e., all sites are down simultaneously) are possible, then every atomic protocol may cause processes to become blocked. Two exceptions: if we ignore communication failures, it is possible to design such a protocol (Skeen et. al. 83) If we impose some restrictions on transactions (I.e., what data they can read/write) such a protocol can also be designed (Mehrotra et. al. 92) ICS214B Notes 11 83 Next… • Three-phase commit (3PC) – Nonblocking if reliable network (no communications failure) and no total site failures – Handling communications failures ICS214B Notes 11 84 Why 2PC blocks? • Since operational site on timeout in prepare state does not know if the failed site(s) had committed or aborted the transaction. • Polling all operational sites does not work since all the operational sites might be in doubt. ICS214B Notes 11 85 Approach to Making ACP Non-blocking • • • For a given state S of a transaction T in the ACP, let the concurrency set of S be the set of states that other sites could be in. For example, in 2PC, the concurrency set of PREPARE state is {PREPARE, ABORT, COMMIT} We develop non-blocking protocol, we will – ensures that concurrency set of a transaction does not contain both a commit and an abort – There exists no non-committable state whose concurrency set contains a commit. A state is committable if occupancy of the state by any site implies everyone has voted to commit the transaction. • • Necessity of these conditions illustrated by considering a situation with only 1 site operational. If either of the above violated, there will be blocking. Sufficiency illustrated by designing a termination protocol that will terminate the protocol correctly if the above assumptions hold. ICS214B Notes 11 86 REQUEST-TO-PREPARE PRECOMMIT ACK COMMIT Participant Coordinator PREPARED DONE ICS214B Notes 11 90 NO Participant Coordinator REQUEST-TO-PREPARE ABORT DONE ICS214B Notes 11 91 Coordinator Log start-3PC record (participant list) Participant REQUEST-PREPARE PREPARED Log prepared record (state W) PRECOMMIT ACK Log commit record (state C) COMMIT Log committed record (state C) ICS214B Notes 11 92 Coordinator Participant REQUEST-PREPARE 1. Timeout: Abort PREPARED PRECOMMIT 2. Timeout: ignore 2. Timeout Termination Protocol ACK COMMIT ICS214B 1. Timeout: abort Notes 11 3. Timeout Termination Protocol 93 Process categories • Three categories – Operational  Process has been up since start of 3PC – Failed  Process has halted since start of 3PC, or is recovering – Recovered  Process that failed and has completed recovery ICS214B Notes 11 94 Three Phase Commit Termination Protocol • Choose a backup coordinator from the remaining operational sites. • Backup coordinator sends messages to other operational sites to make transition to its local state (or to find out that such a transition is not feasible) and waits for response. • Based on response as well as its local state, it continues to commit or abort the transaction. • It commits, if its concurrency set includes a commit state. Else, it aborts. ICS214B Notes 11 95 Termination Protocol Start 3PC Coordinator fails Decision reached All sites learn decision • Only operational processes participate in termination protocol. • Recovered processes wait until decision is reached and then learn decision ICS214B Notes 11 96 Coordinator Participant REQUEST-PREPARE Abortable (A) PREPARED PRECOMMIT ACK Uncertain (U) Precommitted (PC) COMMIT Committed (C) ICS214B Notes 11 97 Termination Protocol • Elect new coordinator – Use Election Protocol (coming soon…) • New coordinator sends STATE-REQUEST to participants • Makes decision using termination rules • Communicates to participants ICS214B Notes 11 98 STATE-REQUEST* ICS214B ABORT* Notes 11 Participant Coordinator ABORTABLE 99 STATE-REQUEST* ICS214B COMMIT* Notes 11 Participant Coordinator COMMITTED 100 STATE-REQUEST* ICS214B ABORT* Notes 11 Participant Coordinator UNCERTAIN* 101 STATE-REQUEST* PRECOMMIT* ACK* Participant Coordinator PRECOMMITTED, NO COMMITTED COMMIT* ICS214B Notes 11 102 Termination Protocol Sample scenario: Coord P1 W P2 W P3 ICS214B Notes 11 W 103 Termination Protocol Sample scenario: Coord P1 W P2 W P3 ICS214B Notes 11 PC 104 Note: 3PC unsafe with communication failures! W P W P W abort ICS214B commit Notes 11 105 • After coordinator receives DONE message, it can forget about the transaction – E.g., cleanup control structures ICS214B Notes 11 106

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Document