COIS20026 Database Development & Management
Week 12 – Distributed Databases
Prepared by: Pramila Gupta
Updated by: Angelika Schlotzer & Satish Balmuri

Reading
- Readings for this week: Study guide module 12
- Textbook readings as directed by the study guide

Objectives
- Describe what is meant by a distributed database
- Describe how this differs from a decentralised database
- List the reasons for and against a distributed database
- Describe the difference between homogeneous and heterogeneous distributed databases
- Describe location transparency and local autonomy
- Explain horizontal partitioning and vertical partitioning
- Define local transaction and global transaction
- List and describe the 4 key objectives of a distributed database:
  - Location transparency
  - Replication transparency
  - Failure transparency
  - Concurrency transparency

Distributed vs decentralised
- Distributed database:
  - Appears as one database to the user
  - Users should not normally be aware of the location of any given data
- Decentralised database:
  - Does not appear as one database to the user
  - The user has to navigate manually to data at another site, and so must know where it is
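The difference between the two approaches comes down to who knows where the data lives. The sketch below is a toy model, assuming hypothetical sites ("Melbourne", "Paris") held as Python dictionaries and a hypothetical catalog mapping each table to its site; none of these names come from a real DBMS. In the decentralised style the caller must name the site; in the distributed style the catalog resolves it, so moving a table changes only the catalog.

```python
# Hypothetical sites and table placement, assumed for illustration only.
SITES = {
    "Melbourne": {"Customer": [("C1", "Alice"), ("C2", "Bob")]},
    "Paris": {"Supplier": [("S1", "Acme")]},
}

# Catalog (maintained by the DBA): table name -> site currently storing it.
CATALOG = {"Customer": "Melbourne", "Supplier": "Paris"}


def decentralised_read(site, table):
    """Decentralised style: the caller must name the site holding the table."""
    return SITES[site][table]


def distributed_read(table):
    """Distributed style: the catalog supplies the site, so the caller never does."""
    return SITES[CATALOG[table]][table]


def move_table(table, new_site):
    """Relocate a table; only the catalog changes, distributed_read callers are unaffected."""
    old_site = CATALOG[table]
    SITES.setdefault(new_site, {})[table] = SITES[old_site].pop(table)
    CATALOG[table] = new_site


print(distributed_read("Supplier"))   # [('S1', 'Acme')]
move_table("Supplier", "Melbourne")
print(distributed_read("Supplier"))   # [('S1', 'Acme')]
```

After the move, code written against `distributed_read` keeps working, while any call such as `decentralised_read("Paris", "Supplier")` now fails because it hard-coded the old location.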
Architecture
- The DBMS runs on multiple sites on a network
- Organisations will normally use one of the 'big six' DBMS (Oracle, DB2, Informix, Sybase, Ingres, Microsoft) and only one 'database engine'
- Specialist knowledge (personnel) is required to manage and program these DBMS
- There will be problems/limitations getting 2 different DBMS to work together (standards are emerging to make this easier)
- When all DBMS in a distributed database are the same, we call it a homogeneous system, as distinct from a heterogeneous system (refer to figures 13-2 and 13-3)
- Each DBMS manages a collection of tables (as part of databases)
- These tables are exposed to (can be used by) users (end-users and programs) at other sites
- The goal is that users are unaware of the physical location of tables: to a user, a distributed database looks like a local database
- Distributed database systems are typically only used by large organisations

Why use a Distributed Database System
- Large organisations are geographically dispersed entities
- It may make sense to keep data where it is generated and most often used, to:
  - reduce data transfer costs/network bandwidth
  - improve access speeds
- Politics typically plays a part
- Increased local autonomy is a factor

Why not use a Distributed Database System
- Expensive to buy
- Even more expensive to manage/maintain
- Specialised knowledge (personnel) is needed to set up, manage and maintain it
- More database personnel are required (to manage the different sites)

Principles & Objectives
- Fundamental principle of a distributed database: to the user, the distributed database should look like a local database
- There are 12 objectives for a distributed database system, described below

Local autonomy
- The local DBMS is autonomous: it can perform its functions independently of other sites
- If some other site is down, the local DBMS can still function
- In practice, the local DBMS must cooperate with other DBMS, and hence will be partly dependent on other sites for some services, e.g. access to a table whose ‘primary copy’ is held at another site
- So: local autonomy to the maximum extent possible

No reliance on a central site
- No site in the network should assume a special role as ‘central site’; otherwise, the system is vulnerable to the failure of that site
- Actually, this is just one aspect of the local autonomy issue

Continuous operation
- Minimise unplanned shutdowns
- There should be no need for planned shutdowns (e.g. to add a new site)

Location independence
- Otherwise known as ‘location transparency’
- It should be transparent to a user/programmer that some tables are held at a remote site
- Someone still needs to know where they are: the database administrator(s)
- Hiding these details from the user/programmer:
  - makes life simpler for the user/programmer
  - means applications do not become dependent on the location of the tables (i.e. location independence is a form of data independence)

Fragmentation independence
- Horizontal fragmentation: table rows are held in different locations, e.g. Australasian account records held in Melbourne (A_Account) and European account records held in Paris (E_Account); users see a single, unified Account table
- Relational systems are well suited to handle this fragmentation:
  - the virtual Account table can be defined in terms of the physical tables as:
    SELECT * FROM A_Account UNION SELECT * FROM E_Account
  - the specification of the fragment in which a row is stored is a restriction, e.g. Melbourne rows: WHERE Continent = 'Australasia'
- Vertical fragmentation: has fewer applications
  - you may wish to hold columns containing sensitive data or special data (e.g. a picture or map) on a dedicated server
  - again, relational systems are well suited to handle this fragmentation: the virtual table can be defined as a join of the physical vertical fragments, and the specification of the sensitive columns to hold on the dedicated server is a projection
- Fragmentation should be hidden from users so that applications do not become dependent on a given fragmentation
  - views will be used to hide sensitive columns from unauthorised users
  - the query processor will fragment queries against a fragmented table, e.g.
    SELECT ID FROM Account WHERE CreditRating = 'AAA'
    becomes
    SELECT ID FROM A_Account WHERE CreditRating = 'AAA'
    UNION
    SELECT ID FROM E_Account WHERE CreditRating = 'AAA'

Replication independence
- Replicas: it may make sense to replicate commonly used data at multiple sites
- Replication should be hidden from users
- Complications arise on update:
  - do all copies of an object need to be locked?
  - do all copies of an object need to be updated?
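The Account example under fragmentation independence can be tried out with Python's built-in sqlite3 module. This is a single-process sketch, assuming two attached in-memory databases named `melbourne` and `paris` stand in for the two sites (a real distributed DBMS would span separate servers). Each fragment's CHECK constraint plays the role of the restriction that decides where a row is stored, the unified Account table is a UNION view, and the final query is the per-fragment form a distributed query processor would send to each site.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: two attached in-memory databases stand in for the two sites.
conn.execute("ATTACH DATABASE ':memory:' AS melbourne")
conn.execute("ATTACH DATABASE ':memory:' AS paris")

# Horizontal fragments: identical columns; each CHECK is the fragment's restriction.
conn.execute("""CREATE TABLE melbourne.A_Account (
    ID TEXT PRIMARY KEY, Continent TEXT CHECK (Continent = 'Australasia'), CreditRating TEXT)""")
conn.execute("""CREATE TABLE paris.E_Account (
    ID TEXT PRIMARY KEY, Continent TEXT CHECK (Continent = 'Europe'), CreditRating TEXT)""")
conn.executemany("INSERT INTO melbourne.A_Account VALUES (?, ?, ?)",
                 [("A1", "Australasia", "AAA"), ("A2", "Australasia", "BB")])
conn.executemany("INSERT INTO paris.E_Account VALUES (?, ?, ?)",
                 [("E1", "Europe", "AAA"), ("E2", "Europe", "A")])

# The unified Account table users see is a view: the UNION of the fragments.
# (SQLite only lets TEMP views refer to tables in other attached databases.)
conn.execute("""CREATE TEMP VIEW Account AS
    SELECT * FROM melbourne.A_Account
    UNION
    SELECT * FROM paris.E_Account""")

# What the user writes, against the unified table...
via_view = conn.execute(
    "SELECT ID FROM Account WHERE CreditRating = 'AAA' ORDER BY ID").fetchall()

# ...and the per-fragment form a distributed query processor would send to each site.
via_fragments = conn.execute("""
    SELECT ID FROM melbourne.A_Account WHERE CreditRating = 'AAA'
    UNION
    SELECT ID FROM paris.E_Account WHERE CreditRating = 'AAA'
    ORDER BY ID""").fetchall()

print(via_view)   # [('A1',), ('E1',)]
```

Both queries return the same IDs; the user only ever needs the first form. Trying to insert a European account into `melbourne.A_Account` is rejected by the CHECK constraint, which is one simple way to keep each fragment's restriction enforced.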
Distributed query processing
- Distributed queries are potentially very costly, so there is a need for optimisation
- Distributed query optimisation is just an extension of local query optimisation for an RDBMS, so, once again, relational systems are well suited to distributed systems
- Date makes the point that the set-oriented relational approach is well suited to distributed databases, as a single request (query) can be sent to the site from which data is sought; in a record-oriented system, a request must be sent for each record

Distributed transaction management
- This is more of a requirement than an objective
- Most applications will use transactions to protect data integrity
- In a distributed database, there will be a need for distributed transactions: transactions that involve changes to records at multiple sites

Hardware independence
- The idea is that you should be free to choose the hardware on which you implement your distributed database
- This is more an issue of which operating systems are supported
- Important for organisations with a mix of hardware/operating systems
- Products like Oracle are strong here: they run on a wide range of Unix flavours, Windows NT, MVS (a mainframe OS), etc.
- Microsoft SQL Server is at the other end of the spectrum: it ran only on Windows NT-family operating systems (SQL Server 2017 and later also run on Linux)

Operating system independence
- See hardware independence above

Network independence
- A similar sort of thing to hardware and operating system independence
- Increasingly, the operating system hides the network operating system (NOS) from the DBMS

DBMS independence
- You should be able to mix and match RDBMS products
- In practice, support for advanced features like cooperative distributed transaction processing is limited

The Difficult Bits
- In a distributed database, it becomes much more difficult to manage:
  - the catalog
  - query processing
  - concurrent access
  - recovery
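Date's point about set-oriented versus record-oriented requests can be made concrete by counting messages. This is a toy model, assuming a hypothetical `RemoteSite` class where every method call counts as one network request; real systems batch and pipeline in more complicated ways. The set-oriented client sends one query and gets back only the matching rows, while the record-oriented client must request every record (plus one final request to discover the end of the data) and filter them locally.

```python
class RemoteSite:
    """Hypothetical remote site; every method call counts as one network message."""

    def __init__(self, rows):
        self.rows = rows      # records stored at this site
        self.messages = 0     # requests received so far

    def fetch_matching(self, predicate):
        """Set-oriented: one request, and the site returns every matching row."""
        self.messages += 1
        return [row for row in self.rows if predicate(row)]

    def fetch_record(self, position):
        """Record-oriented: one request per record; None signals end of data."""
        self.messages += 1
        return self.rows[position] if position < len(self.rows) else None


def record_at_a_time(site, predicate):
    """Client pulls records one by one and filters them locally."""
    result, position = [], 0
    while (row := site.fetch_record(position)) is not None:
        if predicate(row):
            result.append(row)
        position += 1
    return result


def is_aaa(row):
    return row["CreditRating"] == "AAA"


rows = [{"ID": i, "CreditRating": "AAA" if i % 10 == 0 else "B"} for i in range(1000)]
set_site, record_site = RemoteSite(rows), RemoteSite(rows)

matches = set_site.fetch_matching(is_aaa)
pulled = record_at_a_time(record_site, is_aaa)
print(set_site.messages, record_site.messages)   # 1 1001
```

Both approaches return the same 100 rows, but the record-oriented one needs 1001 requests instead of 1, which is why sending a whole relational query to the remote site matters so much when every request crosses a network.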