Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Open Database Connectivity wikipedia , lookup
Serializability wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Functional Database Model wikipedia , lookup
Relational model wikipedia , lookup
Clusterpoint wikipedia , lookup
Distributed Databases Copyright © 1999 Addison Wesley Longman, Inc. TM 11-1 Definitions • Distributed Database: A single logical database that is spread physically across computers in multiple locations that are connected by a data communications link. • Decentralized Database: A collection of independent databases on non-networked computers. Copyright © 1999 Addison Wesley Longman, Inc. TM 11-2 Reasons for Distributed Database • Local business units want control over data. • Consolidate data across local databases for integrated decision making. • Reduce telecommunications costs. • Reduce the risk of telecommunications failures. Copyright © 1999 Addison Wesley Longman, Inc. TM 11-3 Distributed Database Options • Fig. 11-1. (Slide 11-5) • Homogeneous - Same DBMS at each node. – Autonomous - Independent DBMSs. – Non-autonomous - Central , coordinating DBMS. • Heterogeneous - Different DBMSs at different nodes. – Gateways - Simple paths are created to other databases without the benefits of one logical database. Copyright © 1999 Addison Wesley Longman, Inc. TM 11-4 Distributed database environments (adapted from Bell and Grimson, 1992) Copyright © 1999 Addison Wesley Longman, Inc. TM 11-5 Distributed Database Options – Systems - Supports some or all of the functionality of one logical database. • Full DBMS Functionality - All dist. Db functions. • Partial-Multi-database - Some dist. Db functions. – Federated - Supports local databases for unique data requests. » Loose Integration - Local dbs have their own schemas. » Tight Integration - Local dbs use common schema. – Unfederated - Requires all access to go through a central, coordinating module. Copyright © 1999 Addison Wesley Longman, Inc. TM 11-6 Homogeneous, Non-Autonomous Database • • • • Fig. 11-2. Data is distributed across all the nodes. Same DBMS at each node. All data is managed by the distributed DBMS (no exclusively local data.) • All access is through one, global schema. • The global schema is the union of all the local schema. Copyright © 1999 Addison Wesley Longman, Inc. TM 11-7 Focus on The Following Heterogeneous Environment • Fig. 11-3. • Data distributed across all the nodes. • Different DBMSs may be used at each node. • Local access is done using the local DBMS and schema. • Remote access is done using the global schema. Copyright © 1999 Addison Wesley Longman, Inc. TM 11-8 Objectives and Trade-offs • Location Transparency - User does not have to know the location of the data. • Local Autonomy - Local site can operate with its database when central site is down. • Synchronous Distributed Database - All copies of the same data are always identical. • Asynchronous Distributed Database - Some data inconsistency is tolerated. Copyright © 1999 Addison Wesley Longman, Inc. TM 11-9 Advantages of Distributed Database • • • • • Increased reliability and availability. Local control over data. Modular growth. Lower communication costs. Faster response for certain queries. Copyright © 1999 Addison Wesley Longman, Inc. TM 11-10 Disadvantages of Distributed Database • • • • Software cost and complexity. Processing overhead. Data integrity exposure. Slower response for certain queries. Copyright © 1999 Addison Wesley Longman, Inc. TM 11-11 Options for Distributing a Database • • • • Data replication. Horizontal partitioning. Vertical partitioning. Combinations of the above. Copyright © 1999 Addison Wesley Longman, Inc. TM 11-12 Data Replication • Advantages – Reliability. – Fast response. – May avoid complicated distributed transaction integrity routines (if replicated data is refreshed at scheduled intervals.) – De-couples nodes (transactions proceed even if some nodes are down.) – Reduced network traffic at prime time (if updates can be delayed.) Copyright © 1999 Addison Wesley Longman, Inc. TM 11-13 Data Replication • Disadvantages – – – – Additional requirements for storage space. Additional time for update operations. Complexity and cost of updating. Integrity exposure of getting incorrect data if replicated data is not updated simultaneously. • Therefore, better when used for non-volatile data. Copyright © 1999 Addison Wesley Longman, Inc. TM 11-14 Types of Data Replication • Snapshot Replication – Changes are periodically sent to a master site which sends an updated snapshot out to the other sites. • Near Real-Time Replication – Broadcast update orders without requiring confirmation. • Pull Replication – Each site controls when it wants updates. Copyright © 1999 Addison Wesley Longman, Inc. TM 11-15 Issues in Data Replication Use • Data timeliness. • Useful if DBMS cannot reference data from more than one node. • Batched updates can cause performance problems. • Updates complicated with heterogeneous DBMSs or database design. • Telecommunications speeds may limit mass updates. Copyright © 1999 Addison Wesley Longman, Inc. TM 11-16 Horizontal Partitioning • Different records of a file at different sites. • Advantages – Data stored close to where it is used. – Local access optimization. – Security. • Disadvantages – Accessing data across partitions. – No data replication. Copyright © 1999 Addison Wesley Longman, Inc. TM 11-17 Vertical Partitioning • Different columns of a file at different sites. • Advantages and disadvantages are the same as for horizontal partitioning except that combining data across partitions is more difficult because it requires joins. Copyright © 1999 Addison Wesley Longman, Inc. TM 11-18 Distributed processing system for a manufacturing company Copyright © 1999 Addison Wesley Longman, Inc. TM 11-19 Five Distributed Database Organizations Centralized database, distributed access. Replication with periodic snapshot update. Replication with near real-time synchronization of updates. Partitioned, one logical database. Partitioned, independent, non-integrated segments. Copyright © 1999 Addison Wesley Longman, Inc. TM 11-20 Factors in Choice of Distributed Strategy • • • • • • • Table 11-1. Funding, autonomy, security. Site data referencing patterns. Growth and expansion needs. Technological capabilities. Costs of managing complex technologies. Need for reliable service. Copyright © 1999 Addison Wesley Longman, Inc. TM 11-21 Requirements for a Distributed DBMS • Ability to locate data with a distributed data dictionary. • Determine the location from which to retrieve data and the location at which to process each part of a distributed query. • Heterogeneous DBMS translation. • Security, concurrency, query optimization, failure recovery. • Consistency of replicated data. Copyright © 1999 Addison Wesley Longman, Inc. TM 11-22 Distributed DBMS Data Reference • Local Transaction - references local data. • Global Transaction - references non-local data. Copyright © 1999 Addison Wesley Longman, Inc. TM 11-23 Distributed DBMS architecture Copyright © 1999 Addison Wesley Longman, Inc. TM 11-24 Distributed DBMS Transparency Objectives • Location Transparency • Replication Transparency • Failure Transparency – Either all or none of the actions of a transaction are committed. – Each site has a transaction manager. • Logs transactions and before and after images. • Concurrency control scheme to ensure data integrity. Copyright © 1999 Addison Wesley Longman, Inc. TM 11-25 Distributed DBMS Transparency Objectives – Commit Protocol: Ensures that a global transaction is either successfully completed at each site or else aborted. – Two-Phase Commit: Copyright © 1999 Addison Wesley Longman, Inc. TM 11-26 Two-Phase Commit • Prepare Phase – Coordinator receives a commit request. – Coordinator instructs all resource managers to get ready to “go either way” on the transaction. Each resource manager writes all updates from that transaction to its own physical log. – Coordinator receives replies from all resource managers. If all are ok, it writes commit to its own log; if not then it writes rollback to its log. Copyright © 1999 Addison Wesley Longman, Inc. TM 11-27 Two-Phase Commit • Commit Phase – Coordinator then informs each resource manager of its decision and broadcasts a message to either commit or rollback (abort.) If the message is commit, then each resource manager transfers the update from its log to its database. – A failure during the commit phase puts a transaction “in limbo.” This has to be tested for and handled with timeouts or polling. Copyright © 1999 Addison Wesley Longman, Inc. TM 11-28 Distributed DBMS Transparency Objectives • Concurrency Transparency – Design goal for distributed database • Timestamping – Concurrency control mechanism – alternative to locks in distributed databases Copyright © 1999 Addison Wesley Longman, Inc. TM 11-29 Query Optimization • In a query involving a multi-site join and, possibly, a distributed database with replicated files, the distributed DBMS must decide where to access the data and how to proceed with the join. Three step process: 1 Query decomposition - rewritten and simplified 2 Data localization - query fragmented so that fragments reference data at only one site. 3 Global optimization • Order in which to execute query fragments. • Data movement between sites. • Where parts of the query will be executed. Copyright © 1999 Addison Wesley Longman, Inc. TM 11-30 Query Optimization C.J. Date Example (table 11-2) • • • • SUPPLIER (Supplier No., City) - 10K recs in Detroit PART (Part No., Color) - 100K recs in Chicago SHIPMENT (Supplier No., Part No.) - 1mil recs in Det. 10 red parts; 100K shipments from Cleveland – – – – – SELECT SUPPLIER.SUPPLIER_NO FROM SUPPLIER, SHIPMENT, PART WHERE SUPPLIER.CITY = ‘Cleveland’ AND SHIPMENT.PART_NO = PART.PART_NO AND SHIPMENT.SUPPLIER_NO = SUPPLIER.SUPPLIER_NO – AND PART.COLOR = ‘Red’; Copyright © 1999 Addison Wesley Longman, Inc. TM 11-31 Evolution of Distributed DBMS • “Unit of Work” - All of a transaction’s instructions. • Remote Unit of Work – SQL statements originated at one location can be executed as a single unit of work on a single remote DBMS. Copyright © 1999 Addison Wesley Longman, Inc. TM 11-32 Evolution of Distributed DBMS • Distributed Unit of Work – Different statements in a unit of work may refer to different remote sites. – All databases in a single SQL statement must be at a single site. • Distributed Request – A single SQL statement may refer to tables in more than one remote site. – May not support replication transparency or failure transparency. Copyright © 1999 Addison Wesley Longman, Inc. TM 11-33