Distributed Databases
Copyright © 1999 Addison Wesley Longman, Inc.
TM 11-1
Definitions
• Distributed Database: A single logical
database that is spread physically across
computers in multiple locations that are
connected by a data communications link.
• Decentralized Database: A collection of
independent databases on non-networked
computers.
TM 11-2
Reasons for
Distributed Database
• Local business units want control over data.
• Consolidate data across local databases for
integrated decision making.
• Reduce telecommunications costs.
• Reduce the risk of telecommunications
failures.
TM 11-3
Distributed Database Options
• Fig. 11-1. (Slide 11-5)
• Homogeneous - Same DBMS at each node.
– Autonomous - Independent DBMSs.
– Non-autonomous - Central, coordinating DBMS.
• Heterogeneous - Different DBMSs at
different nodes.
– Gateways - Simple paths are created to other
databases without the benefits of one logical
database.
TM 11-4
Distributed database environments (adapted from Bell and
Grimson, 1992)
TM 11-5
Distributed Database Options
– Systems - Supports some or all of the
functionality of one logical database.
• Full DBMS Functionality - All distributed database functions.
• Partial-Multi-database - Some distributed database functions.
– Federated - Supports local databases for unique data
requests.
» Loose Integration - Local dbs have their own
schemas.
» Tight Integration - Local dbs use common schema.
– Unfederated - Requires all access to go through a central,
coordinating module.
TM 11-6
Homogeneous, Non-Autonomous
Database
• Fig. 11-2.
• Data is distributed across all the nodes.
• Same DBMS at each node.
• All data is managed by the distributed
DBMS (no exclusively local data).
• All access is through one, global schema.
• The global schema is the union of all the
local schemas.
TM 11-7
Focus on The Following
Heterogeneous Environment
• Fig. 11-3.
• Data distributed across all the nodes.
• Different DBMSs may be used at each
node.
• Local access is done using the local DBMS
and schema.
• Remote access is done using the global
schema.
TM 11-8
Objectives and Trade-offs
• Location Transparency - User does not have
to know the location of the data.
• Local Autonomy - Local site can operate
with its database when central site is down.
• Synchronous Distributed Database - All
copies of the same data are always identical.
• Asynchronous Distributed Database - Some
data inconsistency is tolerated.
TM 11-9
Advantages of
Distributed Database
• Increased reliability and availability.
• Local control over data.
• Modular growth.
• Lower communication costs.
• Faster response for certain queries.
TM 11-10
Disadvantages of
Distributed Database
• Software cost and complexity.
• Processing overhead.
• Data integrity exposure.
• Slower response for certain queries.
TM 11-11
Options for
Distributing a Database
• Data replication.
• Horizontal partitioning.
• Vertical partitioning.
• Combinations of the above.
TM 11-12
Data Replication
• Advantages
– Reliability.
– Fast response.
– May avoid complicated distributed transaction
integrity routines (if replicated data is refreshed
at scheduled intervals.)
– De-couples nodes (transactions proceed even if
some nodes are down.)
– Reduced network traffic at prime time (if
updates can be delayed.)
TM 11-13
Data Replication
• Disadvantages
– Additional requirements for storage space.
– Additional time for update operations.
– Complexity and cost of updating.
– Integrity exposure of getting incorrect data if
replicated data is not updated simultaneously.
• Therefore, replication is better suited to
non-volatile data.
TM 11-14
Types of Data Replication
• Snapshot Replication
– Changes are periodically sent to a master site,
which sends an updated snapshot out to the
other sites.
• Near Real-Time Replication
– Broadcast update orders without requiring
confirmation.
• Pull Replication
– Each site controls when it wants updates.
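The difference between snapshot and pull replication can be sketched in a few lines of Python. This is only an illustrative model: the `Site` and `Master` classes and both function names are assumptions made for the example, not part of any DBMS, and near real-time replication (broadcast without confirmation) is not modeled.

```python
class Site:
    """One node holding a replica (hypothetical model, not from the slides)."""

    def __init__(self, name):
        self.name = name
        self.data = {}      # local replica
        self.pending = {}   # local changes not yet sent to the master

    def write(self, key, value):
        self.data[key] = value
        self.pending[key] = value


class Master:
    """Master site that gathers changes and hands out snapshots."""

    def __init__(self):
        self.data = {}

    def collect_changes(self, sites):
        for site in sites:
            self.data.update(site.pending)
            site.pending.clear()

    def snapshot(self):
        return dict(self.data)  # copy, so replicas do not share state


def snapshot_replication(master, sites):
    """Master-driven: gather changes, then send a fresh snapshot to each site."""
    master.collect_changes(sites)
    for site in sites:
        site.data = master.snapshot()


def pull_replication(master, site):
    """Site-driven: the site asks for the master's data when it chooses to."""
    site.data.update(master.snapshot())


detroit, chicago, cleveland = Site("Detroit"), Site("Chicago"), Site("Cleveland")
master = Master()
detroit.write("P1", "Red")
snapshot_replication(master, [detroit, chicago])  # scheduled refresh
print(chicago.data)    # {'P1': 'Red'}
print(cleveland.data)  # {} until Cleveland decides to pull
pull_replication(master, cleveland)
print(cleveland.data)  # {'P1': 'Red'}
```

Between refreshes a site can read stale data, which is the data timeliness issue listed on the next slide.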
TM 11-15
Issues in Data Replication Use
• Data timeliness.
• Useful if DBMS cannot reference data from
more than one node.
• Batched updates can cause performance
problems.
• Updates complicated with heterogeneous
DBMSs or database design.
• Telecommunications speeds may limit mass
updates.
TM 11-16
Horizontal Partitioning
• Different records of a file at different sites.
• Advantages
– Data stored close to where it is used.
– Local access optimization.
– Security.
• Disadvantages
– Accessing data across partitions.
– No data replication.
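A minimal Python sketch of horizontal partitioning follows. It assumes the rows are split by a city column and that each city is its own site; the sample rows and the helper names `partition_horizontally` and `reconstruct` are made up for illustration.

```python
suppliers = [
    {"supplier_no": 1, "city": "Detroit"},
    {"supplier_no": 2, "city": "Cleveland"},
    {"supplier_no": 3, "city": "Detroit"},
]


def partition_horizontally(rows, site_of):
    """Place each whole record at the site chosen by site_of(row)."""
    partitions = {}
    for row in rows:
        partitions.setdefault(site_of(row), []).append(row)
    return partitions


def reconstruct(partitions, key):
    """Rebuild the full file as the union of the partitions."""
    rows = [row for part in partitions.values() for row in part]
    return sorted(rows, key=lambda row: row[key])


partitions = partition_horizontally(suppliers, lambda row: row["city"])
print(partitions["Detroit"])    # the two Detroit records, stored locally
print(reconstruct(partitions, "supplier_no") == suppliers)  # True
```

A query for Detroit suppliers touches only the Detroit partition, while a query over all suppliers has to visit every site.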
TM 11-17
Vertical Partitioning
• Different columns of a file at different sites.
• Advantages and disadvantages are the same
as for horizontal partitioning except that
combining data across partitions is more
difficult because it requires joins.
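The join requirement can be seen in a small Python sketch. It assumes the key column is repeated in every fragment so the fragments can be matched up again; the `weight` column, the site names, and the helper names are hypothetical.

```python
parts = [
    {"part_no": 10, "color": "Red", "weight": 12},
    {"part_no": 20, "color": "Blue", "weight": 7},
]


def partition_vertically(rows, key, column_groups):
    """Store a different set of columns at each site, keeping the key in all."""
    return {
        site: [{key: row[key], **{col: row[col] for col in cols}} for row in rows]
        for site, cols in column_groups.items()
    }


def reconstruct(fragments, key):
    """Rebuild full records by joining the fragments on the shared key."""
    joined = {}
    for rows in fragments.values():
        for row in rows:
            joined.setdefault(row[key], {}).update(row)
    return [joined[k] for k in sorted(joined)]


fragments = partition_vertically(
    parts, "part_no", {"Chicago": ["color"], "Detroit": ["weight"]}
)
print(fragments["Chicago"])  # part_no and color only
print(reconstruct(fragments, "part_no") == parts)  # True
```

Unlike the horizontal case, where reconstruction is a simple union, every full record here has to be assembled by matching keys across sites.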
TM 11-18
Distributed processing system for a manufacturing company
TM 11-19
Five Distributed Database
Organizations
• Centralized database, distributed access.
• Replication with periodic snapshot update.
• Replication with near real-time
synchronization of updates.
• Partitioned, one logical database.
• Partitioned, independent, non-integrated
segments.
TM 11-20
Factors in Choice of
Distributed Strategy
• Table 11-1.
• Funding, autonomy, security.
• Site data referencing patterns.
• Growth and expansion needs.
• Technological capabilities.
• Costs of managing complex technologies.
• Need for reliable service.
TM 11-21
Requirements for a
Distributed DBMS
• Ability to locate data with a distributed data
dictionary.
• Determine the location from which to
retrieve data and the location at which to
process each part of a distributed query.
• Heterogeneous DBMS translation.
• Security, concurrency, query optimization,
failure recovery.
• Consistency of replicated data.
TM 11-22
Distributed DBMS
Data Reference
• Local Transaction - references local data.
• Global Transaction - references non-local
data.
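As a rough illustration, the distinction can be expressed as a lookup against a data dictionary that records where each table lives. The `catalog` mapping and the function name below are assumptions for the sketch; the table locations are borrowed from the query optimization example later in this deck.

```python
# Hypothetical distributed data dictionary: table name -> site holding it.
catalog = {"SUPPLIER": "Detroit", "SHIPMENT": "Detroit", "PART": "Chicago"}


def classify_transaction(origin_site, tables, catalog):
    """Return 'local' if every referenced table is at the originating site."""
    sites = {catalog[table] for table in tables}
    return "local" if sites <= {origin_site} else "global"


print(classify_transaction("Detroit", ["SUPPLIER", "SHIPMENT"], catalog))  # local
print(classify_transaction("Detroit", ["SUPPLIER", "PART"], catalog))      # global
```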
TM 11-23
Distributed DBMS architecture
TM 11-24
Distributed DBMS
Transparency Objectives
• Location Transparency
• Replication Transparency
• Failure Transparency
– Either all or none of the actions of a transaction
are committed.
– Each site has a transaction manager.
• Logs transactions and before and after images.
• Concurrency control scheme to ensure data integrity.
TM 11-25
Distributed DBMS
Transparency Objectives
– Commit Protocol: Ensures that a global
transaction is either successfully completed at
each site or else aborted.
– Two-Phase Commit:
TM 11-26
Two-Phase Commit
• Prepare Phase
– Coordinator receives a commit request.
– Coordinator instructs all resource managers to
get ready to “go either way” on the transaction.
Each resource manager writes all updates from
that transaction to its own physical log.
– Coordinator receives replies from all resource
managers. If all are ok, it writes commit to its
own log; if not then it writes rollback to its log.
TM 11-27
Two-Phase Commit
• Commit Phase
– Coordinator then informs each resource
manager of its decision and broadcasts a
message to either commit or rollback (abort.) If
the message is commit, then each resource
manager transfers the update from its log to its
database.
– A failure during the commit phase puts a
transaction “in limbo.” This has to be tested for
and handled with timeouts or polling.
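The two phases can be sketched in Python as follows. This is a simplified, in-memory model under stated assumptions: the class and method names are invented for the example, the "log" is just a list, and the sketch deliberately leaves out the failure, timeout, and polling handling that a real commit protocol needs for "in limbo" transactions.

```python
class ResourceManager:
    """One participating site (hypothetical model for illustration)."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare  # simulates a site that cannot get ready
        self.log = []                   # stands in for the physical log
        self.database = {}
        self._pending = {}

    def prepare(self, updates):
        """Prepare phase: log the updates and vote on the outcome."""
        if not self.can_prepare:
            self.log.append("vote-no")
            return False
        self._pending = dict(updates)
        self.log.append(("prepared", dict(updates)))
        return True

    def commit(self):
        """Commit phase: transfer the logged updates to the database."""
        self.database.update(self._pending)
        self._pending = {}
        self.log.append("commit")

    def rollback(self):
        self._pending = {}
        self.log.append("rollback")


class Coordinator:
    def __init__(self, managers):
        self.managers = managers
        self.log = []

    def run(self, updates_by_site):
        # Phase 1: ask every resource manager to get ready to go either way.
        votes = [rm.prepare(updates_by_site.get(rm.name, {})) for rm in self.managers]
        decision = "commit" if all(votes) else "rollback"
        self.log.append(decision)  # the coordinator logs its decision first
        # Phase 2: broadcast the decision to every resource manager.
        for rm in self.managers:
            if decision == "commit":
                rm.commit()
            else:
                rm.rollback()
        return decision


detroit, chicago = ResourceManager("Detroit"), ResourceManager("Chicago")
coordinator = Coordinator([detroit, chicago])
print(coordinator.run({"Detroit": {"x": 1}, "Chicago": {"y": 2}}))  # commit
print(detroit.database, chicago.database)  # {'x': 1} {'y': 2}
```

If any resource manager votes no, the coordinator records a rollback and no site's database changes, which is the all-or-none behavior described under failure transparency.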
TM 11-28
Distributed DBMS
Transparency Objectives
• Concurrency Transparency
– Design goal for distributed database
• Timestamping
– Concurrency control mechanism
– alternative to locks in distributed databases
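The slides do not spell out the timestamping rules, so the sketch below uses the textbook basic timestamp-ordering scheme as an assumption: each transaction gets a unique timestamp, each data item remembers the latest read and write timestamps, and an operation that arrives "too late" is rejected instead of waiting on a lock. All names are hypothetical.

```python
class Abort(Exception):
    """Raised when an operation arrives out of timestamp order."""


class Item:
    def __init__(self, value=None):
        self.value = value
        self.read_ts = 0   # timestamp of the youngest transaction that read it
        self.write_ts = 0  # timestamp of the youngest transaction that wrote it


def read(item, ts):
    # A transaction may not read a value written by a younger transaction.
    if ts < item.write_ts:
        raise Abort(f"read at ts={ts} is older than write_ts={item.write_ts}")
    item.read_ts = max(item.read_ts, ts)
    return item.value


def write(item, ts, value):
    # A transaction may not overwrite data a younger transaction has used.
    if ts < item.read_ts or ts < item.write_ts:
        raise Abort(f"write at ts={ts} conflicts with a younger transaction")
    item.value = value
    item.write_ts = ts


balance = Item(10)
write(balance, ts=2, value=20)
print(read(balance, ts=3))  # 20
try:
    write(balance, ts=1, value=5)  # older transaction arrives late
except Abort as exc:
    print("aborted:", exc)
```

An aborted transaction would normally be restarted with a new timestamp; no locks are held, so there is nothing to deadlock on.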
TM 11-29
Query Optimization
• In a query involving a multi-site join and, possibly, a
distributed database with replicated files, the distributed
DBMS must decide where to access the data and how to
proceed with the join. Three-step process:
1. Query decomposition - the query is rewritten and simplified.
2. Data localization - the query is fragmented so that each fragment
references data at only one site.
3. Global optimization
• Order in which to execute query fragments.
• Data movement between sites.
• Where parts of the query will be executed.
TM 11-30
Query Optimization
C.J. Date Example (table 11-2)
• SUPPLIER (Supplier No., City) - 10K records in Detroit
• PART (Part No., Color) - 100K records in Chicago
• SHIPMENT (Supplier No., Part No.) - 1 million records in Detroit
• 10 red parts; 100K shipments from Cleveland
    SELECT SUPPLIER.SUPPLIER_NO
    FROM SUPPLIER, SHIPMENT, PART
    WHERE SUPPLIER.CITY = 'Cleveland'
      AND SHIPMENT.PART_NO = PART.PART_NO
      AND SHIPMENT.SUPPLIER_NO = SUPPLIER.SUPPLIER_NO
      AND PART.COLOR = 'Red';
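To see why the choice of strategy matters, the transfer cost of two possible plans can be estimated with a simple model: time = (number of messages × access delay) + (bits moved ÷ data rate). The record size, data rate, and access delay below are illustrative assumptions, not the figures from Table 11-2, and the two plans are only examples of what an optimizer might compare. Only the record counts come from the slide.

```python
# Assumed network and record parameters (NOT taken from Table 11-2).
RECORD_BITS = 1_000      # bits per record
RATE_BPS = 10_000        # bits per second
ACCESS_DELAY_S = 1.0     # seconds of delay per message

# Record counts from the slide.
PART_RECORDS = 100_000
RED_PART_RECORDS = 10


def transfer_seconds(records, messages=1):
    """Estimated time to move `records` records using `messages` messages."""
    return messages * ACCESS_DELAY_S + records * RECORD_BITS / RATE_BPS


# Plan 1: ship the whole PART file from Chicago to Detroit, then join there.
move_all_parts = transfer_seconds(PART_RECORDS)

# Plan 2: select the red parts in Chicago and ship only those to Detroit.
move_red_parts = transfer_seconds(RED_PART_RECORDS)

print(f"ship all PART records: {move_all_parts:.1f} s")
print(f"ship red parts only:   {move_red_parts:.1f} s")
```

Under these assumed parameters, filtering before shipping moves 10 records instead of 100,000, so the second plan is several orders of magnitude cheaper in transfer time. Different parameters change the numbers but not the reason a distributed DBMS needs a global optimization step.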
TM 11-31
Evolution of Distributed DBMS
• “Unit of Work” - All of a transaction’s
instructions.
• Remote Unit of Work
– SQL statements originated at one location can
be executed as a single unit of work on a single
remote DBMS.
TM 11-32
Evolution of Distributed DBMS
• Distributed Unit of Work
– Different statements in a unit of work may refer
to different remote sites.
– All databases in a single SQL statement must
be at a single site.
• Distributed Request
– A single SQL statement may refer to tables in
more than one remote site.
– May not support replication transparency or
failure transparency.
TM 11-33