Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Database model wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Clusterpoint wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Object-relational impedance mismatch wikipedia , lookup
Commitment ordering wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Versant Object Database wikipedia , lookup
CSIS0402 System Architecture Distributed Computing - Middleware Technology (1) K.P. Chow University of Hong Kong Early Days Mainframe terminal networks Connecting “dumb” green-screen terminals to mainframe Terminal identified by line id Distributed network Each machine can have many connections (TCP/IP port no.) Each machine is identified by its IP address Programs Terminal handler Terminals Programs Network interface Distributed Systems First distributed system: ARPANET – 1969: 4-node network – 1972: about 50 computers Communication between companies in the same industry: – SWIFT (Society for Worldwide Interbank Financial Telecommunication) for international money transfers in the financial industry – IATA (International Air Transport Association) in the airline industry Vendor specific network architecture (1970s) – – – – – IBM’s System Network Architecture (SNA) Sperry’s Distributed Computing Architecture (DCA) Burroughs’ Network Architecture (BNA) DEC’s Distributed Network Architecture (DNA) Basic services: file transfer, remote printing, terminal transfer, remote file access – Distributed applications: e-mail Open Systems Interconnection (OSI) Force IT vendors to implement the same standards, e.g. RS232 and X.25 Use International Organization for Standardization (ISO) as the standards authority Based on OSI (Open Systems Interconnection) series of standards – OSI Basic Reference Model: seven-layer model Protocol between each layer Sender Flow of data Application Application Presentation Presentation Session Session Transport Network Transport Network Data Link Data Link Physical Physical Receiver Flow of data TCP/IP Open Systems Standards – Too slow – Standards are too complex, e.g. OSI virtual terminal standard Unix – – – – Open systems Cheap and portable Many versions: Berkeley version and AT&T version Network support: TCP/IP TCP/IP Session TCP UDP IP Ethernet – Internet Protocol (IP): network standard – Transmission Control Protocol (TCP): session standard for program-to-program communication over IP – Provides application programming interfaces (APIs) for sending messages/data over the network: Sockets – Services: telnet, snmp, ftp Transport Network ATM Data Link Why Middleware? When building distributed systems, need to build the middleware: Easy of use: compare to Sockets Location transparency: – Application should not need to know the network and application address – Possible to move an application from a machine to another machine with different network address without recompilation Message delivery integrity: messages should not be lost or duplicated Message content integrity: messages should not be corrupted Language transparency: a program using middleware should be able to communicate with another program written in a different language Support client/server model: server provides a service for the client Traditional Middleware Models Remote Procedure Call (RPC) Remote data access Transaction processing Distributed transaction processing Message queue Remote Procedure Call (RPC) Access a remote service through remote procedure calls: the syntax in the client (the caller) and the server (the called) remains the same as if they were on the same machine Known RPC mechanism: – Open Network Computing (ONC) from SUN – Distributed Computing Environment (DCE) from Open Software Foundation (OSF) Price = getPrice(Product); Interface Definition Language (IDL) Skeleton Stub Client Server Price = getPrice(Product); Compilation Environment Remote Procedure Calls (RPC) Stub and skeletons – Generated by IDL compiler – Small chunks of C code that are compiled and linked into the client and server programs Stub: converts parameters into a string of bits and sends the message over the network Skeleton: takes the messages and converts it back to parameters, and calls the server program Marshalling: handles different data format, e.g. byte ordering Call getPrice(string Product) return float // price Parameters marshalled into a string of bits Client 010011010110011110001010 Server getPrice(string Product) return float // price RPC and Multi-threading Using RPC – A client is blocked when it calls a remote procedure – Client may be left waiting due to message lost, server too slow, or server halted – Server is also blocked when process a request Multi-thread – Allow client to read from the keyboard or the mouse while waiting for remote procedure call returns – Allow server to handle many clients simultaneously, e.g. 1000 threads to handle 1000 clients – May have synchronization problem during resources sharing: need to use locks or semaphores, e.g. withdraw money from the same account from 2 different ATMs – Difficult to test and debug multi-thread program Remote Database Access Provides the ability to read or write to a database that is physically on a different machine from the client program Vendor specific, e.g. Oracle’s Oracle Transparent gateway, IBM’s DRDA ODBC (Open Database Connectivity) – – – – Remote database access provided by Microsoft Standard programming interface (only) for Windows, not middleware Vendors will have to provides the client and server middleware ODBC client software that is written to communicate remotely to an Oracle database will not communicate to an IBM database Remote Database Access – Microsoft’s Approach Move away from ODBC, use OLE DB instead OLE DB uses COMobjects, rather complex ADO (Active Data Objects): another standard by Microsoft – A simpler COMRemote DB access compliant interface – Uses OLE DB, and OLE DB uses an ODBC data provider Host DB interface Database Application or Tool ADO OLE DB ODBC Remote DB Access – SQL Parsing Parse step - turns the SQL commands into a query plan: Client – – – – Define tables that will be accessed List of indexes to be used Filtered by which expression Define the output Execute query – parameters will be provided – Output can be any length, e.g. a million rows, large overhead on network Optimization: can be achieved using stored procedure Good for ad-hoc queries Not good for transaction processing Server SQL Text Parse Query Query Output Description Execute command plus parameters Execute Query Output Data Query Plan Transaction Processing Transaction – A transaction is a sequence of operations that are carried out together and form a single unit – Must be either end up completed (committed) or is completely undone – E.g. withdraw money from a bank account: write a record of the debit update the account total update the bank teller record Early transaction processing system: – Handled by transaction monitor, e.g. CICS on IBM MVS, TIP on Unisys 2200, COMS on Unisys A Series – On a single machine (mainframe) Distributed transaction processing: databases were on different machines How important is transaction? – Business involves transaction, not customer, e.g. Give the bank a check to settle the credit card bill (atomic action) The bank has a complex business process to ensure the each payment is settled (transaction) Transaction Characteristics – ACID Properties A for atomic: the transaction is never half done, if there is an error, it is completely undone C for consistent: the transaction changes the DB from one consistent state to another consistent state, i.e. – The DB data integrity constraints hold true – The DB need not be consistent within the transaction – Includes explicit data integrity (e.g. product codes must be between 8 to 10 digits long) and internal integrity constraints (e.g. all index entries must point at valid records) I for isolation: the data updates within a transaction are not visible to other transaction until the transaction is completed D for durable: when a transaction is done, it really is done and the updates will not disappear at sometime in the future Transactional Architecture Local transactions: all operations are carried out on the same database engine Distributed transactions: involve multiple database servers, which are usually deployed on different hosts, requires a transaction manager to co-ordinate the processing and managed by a 2-phase commit process Global transactions: a distributed transaction that encompasses not only different servers, but different application, e.g. EJB application that sends messages to a legacy system using JMS Transaction demarcation: – Transaction demarcation is defined as the specification of transaction boundaries: indicating when a transaction begins and ends – The execution of a program “crosses a transactional boundary” whenever there is a change in transaction context – A transaction context is the association of a transaction with an application component Classic Example Transfer money from a savings account to a current account, one of the two followings should occur: – Money is withdrawn from savings account and deposited into the current account – Money is not taken out from the savings account and is not deposited into the current account Classic implementation: begin_transaction() withdraw_from_savings() deposit_to_current() commit_transaction() With one DB, the DBMS can serve as the transaction coordinator With multiple DBs, we need distributed transaction coordinator to coordinate the processing using 2-phase commit Transaction using Objects Class SavingsAccount with the following withdraw method that can be reused: Class CurrentAccount with the following deposit method that can be reused: withdraw(amount) { begin_transaction(); … withdraw_money; … end_transaction(); } deposit(amount) { begin_transaction(); … deposit_money; … end_transaction(); } Transaction involves multiple objects Implement the transfer operation: transfer(savingsAccount, currentAccount, amount) { savingsAccount.withdraw(amount); currentAccount.deposit(amount); } Problem: – savingsAccount.withdraw(amount) and currentAccount.deposit(amount) are in different transactions Will the following work? Remove the transaction boundaries from withdraw and deposit Add the transaction boundary to the transfer: transfer(savingsAccount, currentAccount, amount) { begin_transaction(); savingsAccount.withdraw(amount); currentAccount.deposit(amount); end_transaction(); } Problem: withdraw() and deposit() are no longer transactionally protected withdraw(amount) { … withdraw_money; … } deposit(amount) { … deposit_money; … } Automatic Transaction Boundary Management No transaction boundary management code in the business method: transfer(savingsAccount, currentAccount, amount) { savingsAccount.withdraw(amount); currentAccount.deposit(amount); } Transactions are defined by the run-time environment using transaction attributes, e.g. Required Supported by modern transaction servers, e.g. MS MTS, J2EE withdraw(amount) { … withdraw_money; … } deposit(amount) { … deposit_money; … } Transaction Attributes TransferSession transfer(savingsAccount,currentAccount,amount) RequiresNew { savingsAccount.withdraw(amount); currentAccount.deposit(amount); } SavingsAccount withdraw(amount) Required { … withdraw_money; … } CurrentAccount deposit(amount) Required { … deposit_money; … } Transactional Context Required 4.Invoke withdraw() SavingsAccount 1.Invoke transfer() withdraw( ) RequiresNew TranferSession transfer( ) 2.Register TransferSession 7.Invoke deposit() 5.Register SavingsAccount CurrentAccount deposit( ) 8.Register CurrentAccount Transaction Manager 6.Add SavingAccount 3.Create a new Transactional Context and add TransferSession TransferSession 9.Add CurrentAccount SavingsAccount CurrentAccount Transaction Context X Required Transactional Context (cont) Required 4.Invoke withdraw() SavingsAccount 1.Invoke transfer() withdraw( ) RequiresNew TranferSession transfer( ) 2. 5. 7.Invoke deposit() 8 CurrentAccount deposit( ) Transaction Manager 3. 6. TransferSession SavingsAccount Transaction Context X CurrentAccount 9. 10. Check TxCtx X to make sure that updates to each will work Required 11. Commit all updates or roll back all updates Transaction Isolation Level Isolation: a transaction should be ignorance of the existence of any other transactions – If a second transaction attempts to read the data being modified by the first transaction, that second transaction will not see any changes the first transaction has made E.g. if my wife read a joint saving account balance at the same time I was transferring money from saving to current, the following events may occur: – – – – – I see my saving account balance $1000 I initiate a transfer of $500 from saving to current The system debits my saving account My wife requests the balance for the saving account and sees $1000 The system credits my current account Full transaction isolation: a transaction act on the DB as if it were the only thread operating on the DB – Unable to support high concurrency requirement in real systems Isolation Levels To achieve better concurrency ANSI SQL identifies 4 distinct transaction isolation levels – To balance performance needs with data integrity needs Isolation conditions: – Dirty read: a transaction A views the uncommitted changes of another transaction B, if transaction B rolls back its change, transaction A is said to have “dirty” data – Nonrepeatable read: a transaction A reads different data from the same query when it is issued multiple times and other transactions have changed the rows between reads by transaction A. A transaction that mandates repeatable reads will not see the committed changes made by other transactions. – Phantom read: a phantom reads deals with changes in other transactions that would result in new rows matching the transaction’s WHERE clause. E.g. if transaction A reads all accounts with balance < 100 and A performs 2 reads, a phantom read allows for new rows to appear in second read based on changes made by other transactions ANSI SQL Transaction Isolation Levels Read uncommitted transactions: allows dirty reads, nonrepeatable reads, and phantom reads Read committed transactions: only data committed to the DB may be read, can perform nonrepeatable and phantom reads Repeatable read transactions: committed, repeatable reads as well as phantom reads are allowed, nonrepeatable reads are not allowed Serializable transactions: only committed, repeatable reads are allowed, phantom reads are not allowed The support of different isolation levels depends on the database engine Distributed Transaction Processing Steps Client 1. The client tells the middleware that a transaction begins Begin Transaction Server A 2. The client calls server A 3. Server A updates the database 4. The client calls the server B Update DB on A 5. Server B updates its database Server B 6. The client tells the middleware that the transaction ends If the updates to 2nd DB failed (5), updates to 1st DB are rolled back To maintain ACID properties, all locks acquired by DB cannot be released until end of transaction (6) Update DB on B Commit 2 phase commit X/Open A consortium to establish the standards for distributed transaction processing (now called Open Group) Standard protocol between the middleware and the database is called XA protocol XA compliant: a DB that can co-operate with X/Open DTP middleware in a two-phase commit protocol X/Open standard for client/server contains 3 protocols: – Protocol based on SNA LU6.2 (IBM): peer-to-peer protocol with no marshalling – Protocol based on DCE’s remote procedure calls (from Encina): parameter marshalling and threads are blocked during a call – XATMI protocol based on ATMI (from Tuxedo): uses message format called View Buffers, supports both RPC-like calls and unblocked calls Message Queuing Program-to-queue communication Message queue: like a very fast mail box, i.e. put a message in a box without the recipient being active Actions: – Put: puts a message into the queue – Get: takes a message out of the queue Message queue software: – Transfer of messages from queue to queue – Ensures message arrives eventually and only one copy of the message is placed in the destination queue Get Put Put Queue Queue Get Message Queue Software Queue have names The queues are independent of program, i.e. many programs can perform Puts and Gets on the same queue, and a program can access many queues If the network goes down, messages will wait in the queue until the network comes up The queues can be put onto a disk so that the queue is not lost even the system goes down The queue can be a resource manager and co-operate with a transaction manager, i.e. if the message is put in a queue during a transaction and the transaction later aborted, then the DB will be rolled back and the message is removed from the queue Some message queue systems can cross networks of different types, e.g. message goes through SNA leg, TCP/IP leg and then a Novell IPX leg Efficient: used in applications require sub-second response time Message Queue Software Products: IBM MQSeries, Microsoft MSMQ, BEA Systems Tuxedo/Q Disadvantage: no parameter marshalling, up to the sender and the receiver to interpret the data Peer-to-peer instead of client/server MQ vs. DTP – DTP solution Machine X Example: transfer money from account A to account B using DTP Actions “Debit Account A” and “Credit Account B” are done in one distributed transaction Any failure aborts the whole transaction Disadvantages: Performance degrade due to two-phase commit If either of the system is down or network is down, the transaction cannot take place Machine Y Start Transaction Debit Account A Initiate Partner Transaction Start Transaction Credit Account B 2 phase commit MQ vs. DTP – MQ solution Machine X Message queue solution to the same problem Message is not allowed to reach machine Y until the first transaction has committed (make sure 1st transaction is not aborted) Flaw (not atomic): If the destination transaction fails, money is taken out of A and disappears Machine Y Start Transaction Debit Account A Send Message Start Transaction Commit Read Transaction Credit Account B Commit MQ vs. DTP – MQ with Reversal Machine X Reversal transaction in case transaction cannot be completed on Machine Y Flaw (not isolated): Fails if account A is deleted before the reversal takes effect, i.e. action “Credit Account A” cannot be processed Machine Y Start Transaction Debit Account A Send Message Start Transaction Commit Read Message FAIL !! Practical solution: Wait for a complaint and do manual adjustment Start Transaction Read Message Credit Account A Commit Send Message What will happen to these technologies? Technology RPC Replacement Component middleware Distributed transaction middleware Component middleware MSMQ Provides a component interface to message queue and supports sending objects MQSeries Will exist Remote data access Will exist to solve simple problems that require quick solution