Download View File

Distributed Databases RESOURCES: Chapter 13 (Hoffer) Chapter 10 (Rob Coronel) Outline  Options for distributing data    Replication Schemes Distributed DBMS Products C.J. Date’s Twelve Commandments Options for Distributing a Database  Four basic strategies     Data Replication Horizontal Partitioning Vertical Partitioning Combination of above Data Replication    Data distribution option that supports a separate copies of database stored at two or more sites  Full database copies or fragments of database can be kept at multiple sites Replication may use either synchronous or asynchronous technology Replication Types  Fully replicated DB   Partially replicated DB   Multiple copies of each database fragment stored at multiple sites Multiple copies of some database fragments are stored at multiple sites Unreplicated DB  Each database fragment stored at a single site – there are no duplicate fragments Data Replication Schemes  Different schemes exist for updating data copies    Snapshot Replication (Push strategy) Near Real-time Replication (Push strategy) Pull Replication Snapshot Replication(1)  Assume multiple sites are updating the same data  Case1- Full refresh of database  Updates from all replicated sites are periodically collected at a master or primary site – forming consolidated result of all changes  With some DDBMS, this list of changes is collected in a snapshot log - a table of row identifiers for records to go into snapshot  A read-only snapshot of the replicated portion of the database is taken at master site  Finally, Snapshot is sent to each site where there is a copy  Case2 - Differential or Incremental refresh  only those pages that have changed since the last snapshot, can be sent.  Snapshot log for each replicated table is joined with the associated base to form the set of changed rows to be sent to replicated sites Snapshot Replication(2)   Applications (for Decision Support Sytems, Datawarehousing and Data-mining) that do not require current data can use this replication scheme Cost of snapshot refresh depends on type of snapshot   Simple Snapshot  References all or portion of one table  Can be handled by differential refresh in some DDBMS Complex Snapshot  References multiple tables usually from transactions that involve joins  Can be handled by time consuming full refresh in some DDBMS Near Real-time Replication    store and forward messages for each completed transaction, can be broadcasted across the network These messages inform all nodes to update data as soon as possible without forcing a confirmation to the originating node One way to generate these messages is by using triggers   A trigger can be stored at each local database when a piece of replicated data is updated, trigger executes corresponding update commands against remote database replicas Pull Replication     This is a pull strategy The target, not the source node, controls when a local database is updated Local database determines when it needs to be refreshed For refresh, it requests a snapshot or the emptying of an update message queue Advantages and Disadvantages of Replication  Advantages       Reliability Fast Response Possible avoidance of complicated distributed transaction integrity routines Node decoupling Reduced network traffic Disadvantages   Storage requirements Complexity and cost of updating See pg 499 Replication suitability  Replication is suitable for   applications where most requests are read-only and data are static (e.g. catalogs, telephone directories, train schedules) Not suitable for    Online applications (e.g. airline reservation) ATM machine transactions Financial activity based applications See more on pg.501 – when to use replication Options for Distributing a Database  Four basic strategies     Data Replication Horizontal Partitioning Vertical Partitioning Combination of above Horizontal Partitioning   Data is distributed across different sites based on one or more primary keys. Rows of a relation are distributed to many sites     Rows of a table/relation are divided into subsets of rows with each subset/fragment placed at a different node Considering example of Bank, data specific to each branch can be placed at the branch server When a customer conducts transaction at home branch, it is processed locally in minimum time When transaction is conducted at another branch, it can be sent for processing to home branch and response can be sent back to initiating branch Horizontal Partitioning   Used when an organization’s function is distributed, but each site is concerned with only subset of data For example, branch offices in an organization deal mostly with a set of local customers and the related customer data need not be accessed by other branch offices. Advantages of Horizontal Partitioning  Efficiency   Local optimization   optimized performance for local access Security   data is stored where it is used data irrelevant to a particular site is not made available Ease of querying  its easy to combine rows by unions across horizontal partitions Disadvantages of Horizontal Partitioning  Inconsistent access speed   Accessing data from several partitions can be significantly different from local-only data access Backup vulnerability   Since data are not replicated, when data at one site becomes damaged or inaccessible, there is no alternative site for that data Data may be lost if proper backups are not performed Options for Distributing a Database  Four basic strategies     Data Replication Horizontal Partitioning Vertical Partitioning Combination of above Vertical Partitioning  Data is split by columns across multiple systems. The primary key is replicated at each site. Hence, columns of a relation are distributed to many sites    Columns are divided into subsets of columns with each subset/fragment having unique columns except for the key column. The relations at each site must share a common domain for original table reconstruction Considering example of a bank, the bank can have customer data    data columns specific to personal information can become one subset, the columns related to financial and account status can become another subset PK – Account no. will be present in both subsets Advantages and Disadvantages of Vertical Partitioning  Same as those for horizontal partitioning with the exception that   Combining data across vertical partitions is more difficult than that across horizontal partitions Reason – the need to match PKs or other qualifications to join rows across partitions Options for Distributing a Database  Four basic strategies     Data Replication Horizontal Partitioning Vertical Partitioning Combination of above Combination of Operations     There are unlimited combinations of the preceding strategies. Some data may be centrally stored, while other data are replicated at various sites. For a given relation both horizontal and vertical partitions may be desirable for data distribution. Figure (next slide) shows that    Engg. Parts, Accounting and Customer data are each centralized at different locations Standard parts data are partitioned horizontally among the three locations Price List is replicated at all the three locations Combination of Operations RegionA RegionB RegionA parts RegionB parts RegionC Region C parts Engg Parts Account ing Custom ers PriceList PriceList PriceList Comparison of Distributed DB Design strategies  Reading Assignment – pg 505 Distributed DBMS Products(1)  IBM Products  DPropR (DataPropagator Relational)     Distributed Relational Database Architecture (DRDA)   Works with DB2 Primary site and asynchronous updates Read-only sites subscribe to primary sites Heterogeneous databases DataJoiner  Middleware to access non-IBM databases Distributed DBMS Products(2)  Sybase products  Replication Server      SQL Anywhere   Primary site and distributed read-only sites Update to read-only site as one transaction Hierarchical replication Data and stored procedures replicated Mobile databases OmniSQL  Heterogeneous databases Distributed DBMS Products(3)  Oracle products  Table Snapshot Option   Periodic snapshots sent to read-only sites Symmetric Replication Option     Asynchronous and synchronous with multiple updatable copies and replication from any node to any other node (bidirectional) Differential refresh DBA controls replication Two-phase commit Distributed DBMS Products(4)  Microsoft products  SQL Server     Primary site and distributed read-only sites Publish and subscribe, with articles and publications One database can pass copies of publications to other sites Mobile databases Distributed DBMS Products(5)  Computer Associates products  CA-Ingres/Replicator     All database copies updatable Hierarchical representation DBA registers data for replication and other sites subscribe Ingres/Net and Ingres/Star    Decomposes query to distributed , homogeneous sites Two-phase commit Also used with non-ingres databases C.J. Date’s Twelve Commandments   1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. C. J. Date’s Commandments describe a fully distributed database. Although no current DDBMS conforms to all of them, the rules do constitute a useful distributed database target. The rules are: Local site Independence Central site Independence Failure Independence Location Transparency Fragmentation Transparency Replication Transparency Distributed Query Processing Distributed Transaction Processing Hardware Independence Operating System Independence Network Independence Database Independence

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download View File