
Federated Databases
Coursework CO42009: Object Database Systems
Paul Murray, Claire McQuade, Kashif Rafiq, David Miller
May 2002

Introduction
    Aims
    Characteristics
        Autonomy
        Distribution
        Heterogeneity
        Characteristics of a FDBS
Architecture
    Processors
        Transforming Processor
        Filtering Processor
        Construction Processor
        Accessing Processor
Schema
    Schema Components
        Local Schema
        Component Schema
        Export Schema
        Federated Schema
        External Schema
        Customisation
        Integrity Constraints
        Access Control
        Seven Layer Model
        Schema Evolution
Coupling
    Loosely coupled FDBS
    Tightly Coupled FDBS
Views
    OO Systems
        Selection
        Projection
        Join
    Relational Database Systems
Conclusion
References

CO42009 – Coursework Federated Databases Page 2 of 21

Introduction

Given today's heterogeneous environments and the need to share information between many different entities, a new area of database technology is being developed. Database federation is concerned with the integration of two or more database systems to form a single view for an application or user.

Because a federation is a loosely coupled network of independent entities, each local database administrator keeps control of their own system while sharing its data and design with the rest of the federation. This local knowledge is then aggregated into a large schema or view that is used by applications or the end user. A local database can take part in more than one federation, and it can continue to carry out its local operations while it is part of a federation.

The integration work may be managed by the users of the federation or by the local database administrators. The amount of integration depends on the needs of the federation.

Given the ability to export some or all of its schema, any kind of database could be part of a federation. Relational databases are prime candidates for inclusion in a federation.
Their design has been around for many years and is used widely across many areas to store regular, ordered data. Almost all relational databases use SQL as their data manipulation language. This language is also mature and is regarded as relationally complete.

Another species of database that is beginning to emerge is the object database. Although object-orientated databases are still in their infancy, they have several advantages over the relational model, as well as some weaknesses.

In an object system, classes provide the means to store and process data. Each class can be thought of as a table, and each instance of a class as a record. Built into the class are attributes (for storing values) and methods (for manipulating data). By exposing the methods defined in the class, the database effectively becomes the application. Objects also benefit from object orientation, as methods can be overloaded and redefined using inheritance and specialisation. Using these object-orientated features, more complex real-world objects can be represented in an object system.

These are the advantages of an object-orientated database, but there are some drawbacks. Object-orientated databases are young and immature; they are at the stage that relational databases were at 10 years ago. There are still no full query languages (such as SQL for relational systems), and many vendors have stopped at a halfway house by adding some object features to relational databases (object-relational).

Whilst the concepts of object databases are hard to understand, their usefulness in modelling real-world problems sets them apart from relational databases. Many object systems hold biological or other scientific data. Academic research has become a global effort, with institutions cooperating in a way never seen before. If each institution has some data, why not create a federation to share it? The federation might contain relational databases and/or object-orientated systems.
Aims

This report will concentrate on several key areas:

• Characteristics
• Architecture
• Processors
• Schema
• Coupling
• Views

As a federation implies two or more databases, the data must be distributed over the entities in the federation. Some entities may have more data than others. In relational systems, the data could be partitioned horizontally or vertically. In an object system, the classes can also be fragmented: horizontal fragmentation divides the instances (the extent) of a class, while vertical fragmentation divides its attributes.

Views are very important to Federated Database Systems (FDBS). They allow data to be selected and projected from each entity to another layer in the schema, where it is combined into a larger view. All the applications and users of the federation connect to this layer of the schema to gather data.

Schemas describe the design of the whole system. There will be a schema that describes each local database (relational or object), and any middleware that projects the local database to the federation will have to reference that schema for data types. Often there is a global schema that describes the overall federation as well.

Characteristics

Autonomy

As each member of the federation is a local Database Management System (DBMS), the characteristics of these systems are fundamental to the systems that are built on top of them. A local DBMS will typically host several databases on one machine. These databases could hold anything from accounting information to taxonomy data. The DBMS is the software that manages the database. It provides features such as:

• Transaction Control
• Access Control
• Query Processing

Each local DBMS has autonomy. One database administrator will control the day-to-day running of the DBMS. However, in a federation there has to be some distribution of control that defines how independently the local DBMS can operate.
This will govern features like design, communication and execution in the federation.

Distribution

Due to the nature of a federated database, data will be spread over multiple databases. These databases may be on the same computer or on geographically distant systems. The only requirement of a FDBS is that the systems can talk to each other over a network.

There are several advantages to data distribution. These include improved access times, availability and reliability. In a distributed database system the data may be distributed deliberately, but in a federated system the data is generally distributed already and its placement is governed by what each local node already holds, although this can be altered.

Heterogeneity

A federation is likely to contain more than one combination of operating system and database, so there must be some kind of interoperability in the system. Each local DBMS will have different ways of representing data models, semantics and system operations.

When looking at data models, such things as structures and query languages must be taken into account. Every programming language represents structures differently, and query languages differ both between relational systems and between object systems. There are even differences between SQL implementations on the same operating system (T-SQL from Microsoft and PL/SQL from Oracle).

Semantics describe what something is. If two systems that are to be in a federation refer to a customer in different ways (one might use a customer number, the other a shortened version of the customer name), then problems may be encountered. Although both refer to a customer and both uniquely identify the customer, they are in fact two different attributes of a customer. Any difficulty in interpreting attribute names or other vital information will cause problems for the federation.
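The customer example above can be sketched in code. The following is a minimal illustration only, assuming two hypothetical component systems, one keyed by customer number and the other by a shortened customer name. The record layouts, the `CROSS_REFERENCE` table and the function name are invented for this sketch and do not come from any particular FDBS.

```python
# Hypothetical records from two component databases describing the same customers.
# System A identifies a customer by number; system B uses a shortened name.
SYSTEM_A = [
    {"cust_no": 1001, "balance": 250.0},
    {"cust_no": 1002, "balance": 75.5},
]
SYSTEM_B = [
    {"cust_code": "SMITHJ", "city": "Edinburgh"},
    {"cust_code": "JONESA", "city": "Glasgow"},
]

# Assumed cross-reference maintained by the federation: the two keys are
# different attributes, so they can only be matched through an explicit mapping.
CROSS_REFERENCE = {1001: "SMITHJ", 1002: "JONESA"}


def integrate_customers(system_a, system_b, cross_reference):
    """Combine records that refer to the same customer under different keys."""
    by_code = {row["cust_code"]: row for row in system_b}
    merged = []
    for row in system_a:
        code = cross_reference.get(row["cust_no"])
        partner = by_code.get(code)
        if partner is None:
            # No known correspondence: the federation cannot safely guess one.
            continue
        merged.append({
            "cust_no": row["cust_no"],
            "cust_code": code,
            "balance": row["balance"],
            "city": partner["city"],
        })
    return merged


federated_customers = integrate_customers(SYSTEM_A, SYSTEM_B, CROSS_REFERENCE)
```

The design choice worth noting is that records without a known correspondence are left out rather than matched by guesswork, which reflects the point that both keys identify a customer but are not the same attribute.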
Using different operating systems and DBMSs will also present problems. Every database system handles core functionality differently. When several DBMSs are grouped into one logical unit, basic tasks such as transaction handling and recovery can be difficult to coordinate. In addition, not all features may be available to a database, depending on the operating system it sits on top of.

Characteristics of a FDBS

The FDBS is a collection of cooperating but autonomous database management systems. Each DBMS participates in the federation to a varying degree. The FDBS provides the software that gives applications access to the underlying DBMSs in a secure, controlled manner. There is no centralised control of a FDBS, and all operations involve either a local DBMS or the whole federation.

The following diagram shows how different types of local databases can be grouped together in a federation.

[Figure: a Federated Database System made up of System 1, System 2, ... System N, where the component systems are a centralised DBMS, a distributed DBMS and another FDBS.]

Architecture

The FDBS is made up of many components, such as:

• Data
• Local DBMS
• Processors
• Schemas

The first two points are self-explanatory. A database is useless without data, and there must be some software available to manage requests to the database and perform tasks like transaction handling. The processors are application-independent programs that perform tasks like data conversion between systems. Schemas are the designs of each DBMS and of the FDBS. They contain details of each structure in a database, all its attributes and its error checking.

Processors

Processors are perhaps the most important extension to the local DBMS. These programs provide functions such as data transformation, data filtering, data construction and data access. An example diagram is shown after each processor description.
Transforming Processor

The transforming processor translates commands from one language to another and data from one format to another. It provides a degree of data independence, as it helps hide differences in query languages and data formats.

[Figure: a transforming processor between Schema A and Schema B, performing command translation and schema translation.]

Filtering Processor

Typically the filtering processor constrains the commands and data that can be passed from one processor to another. The syntax and semantics of each command and data item are checked for validity before being passed on. This reduces the risk of data corruption that would arise if data were passed unchecked to another processor. If data or commands need to be modified, they are passed to a transforming processor.

[Figure: a filtering processor between Schema A and a subset of Schema A, controlling commands and data.]

Construction Processor

The construction processors enable the aggregation of data from many sources into one. They support location, distribution and replication operations. Construction processors also provide schema integration between local DBMSs. This entails negotiation, command optimisation and decomposition, and transaction management over a FDBS.

[Figure: a constructing processor linking Schema A, Schema B and Schema C.]

Accessing Processor

The accessing processors accept commands from other processors, execute them against a local DBMS and return data to the other processors.

[Figure: an accessing processor receiving commands, accessing the database and returning data.]

Schema

While the three-level schema architecture is adequate for a centralised DBMS, it is not really adequate for a FDBS. To support the distribution, heterogeneity and autonomy associated with a FDBS, it is necessary to extend the three-level schema.
There are various levels of extension in use, such as four-level, five-level or even seven-level schema architectures; however, the happiest medium seems to be the five-level schema. The following diagram, Fig. S1, shows the layout of the five-level schema.

[Fig. S1: the five-level schema architecture. From top to bottom: external schemas, federated schemas, export schemas, component schemas, local schemas and component DBSs.]

The next diagram, Fig. S2, shows a system architecture consisting of the processors and schemas of a FDBS.

[Fig. S2: for each component DBS, from top to bottom: external schema, filtering processor, federated schema, constructing processor, export schema, filtering processor, component schema, transforming processor, local schema, component DBS.]

Schema Components

Local Schema

The local schema is the conceptual schema of a component DBS. It is expressed in the native data model of the component DBMS, so different local schemas can be expressed in different data models.

Component Schema

A component schema is made by translating a local schema into a data model called the canonical or common data model (CDM) of the FDBS. There are two reasons for defining the component schemas in a common data model. The first is that each design feature will now be described in a common way across all component schemas. The second is that semantics that are missing from a local schema can be added to its component schema.
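As a rough illustration of the translation from local schema to component schema, the sketch below assumes a very simple common data model in which every structure is described as an entity with typed attributes. The `Entity` class, the two type-mapping tables and both local schema descriptions are hypothetical and chosen only to show the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One structure described in the (assumed) common data model."""
    name: str
    attributes: dict = field(default_factory=dict)  # attribute name -> CDM type
    notes: dict = field(default_factory=dict)       # semantics added during translation


# Assumed mappings from native types to CDM types.
SQL_TO_CDM = {"INTEGER": "int", "VARCHAR": "string", "DECIMAL": "decimal"}
OO_TO_CDM = {"long": "int", "String": "string", "double": "decimal"}


def translate(name, local_attributes, type_map, notes=None):
    """Translate one local structure into a CDM entity."""
    attributes = {attr: type_map[native] for attr, native in local_attributes.items()}
    return Entity(name=name, attributes=attributes, notes=dict(notes or {}))


# A relational table and an object class, each in its native data model.
relational_customer = {"cust_no": "INTEGER", "name": "VARCHAR", "balance": "DECIMAL"}
object_customer = {"custNo": "long", "name": "String", "balance": "double"}

# Both end up described in the same vocabulary; the first also gains
# semantics (the currency) that the local schema did not record.
component_a = translate("Customer", relational_customer, SQL_TO_CDM,
                        notes={"balance": "held in pounds sterling"})
component_b = translate("Customer", object_customer, OO_TO_CDM)
```

After translation both descriptions use the same set of types, which corresponds to the first reason given above, and the `notes` field stands in for the second reason, where missing semantics are added to the component schema.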
When developing a tightly coupled FDBS, defining component schemas in the CDM facilitates the negotiation and integration tasks performed during development. With a loosely coupled FDBS, it facilitates negotiation and the specification of views and multi-database queries.

Export Schema

The export schema represents the subset of a component schema that is available to the FDBS. Not all of the data in a component may be available to the federation, so the export schema may include access control information regarding its use by other federation users. A filtering processor can be used to provide the access control specified in an export schema by limiting the operations that can be submitted on the corresponding component schema. The export schema and such filtering processors support the autonomy of the component systems in a FDBS.

Federated Schema

A federated schema is an integration of multiple export schemas. A federated schema includes the data distribution information that is generated when integrating export schemas, although some systems use a separate schema, called a distribution or allocation schema, to store this information. A constructing processor transforms commands on the federated schema into commands on one or more export schemas. The constructing processors and federated schemas support the distribution feature of the FDBS.

There can be multiple federated schemas in a FDBS, one for each class of federation users. A class of federation users is a group of users and/or applications performing a related set of activities. A simple example can be found in a corporate environment: managers may be one class of federation users, and all employees and applications associated with the accounting department may be another.

Similar concepts to the federated schema exist and are represented by terms such as import schema, global schema, global conceptual schema, unified schema, and enterprise schema.
It should be noted, however, that these terms, other than import schema, tend to be used only when there is a single schema of this type in the system.

External Schema

An external schema defines a schema for a user and/or application, or for a class of users/applications. There are several reasons for using external schemas, as follows.

Customisation

A federated schema is sometimes difficult to change. An external schema can be used to specify the subset of information in a federated schema that is relevant to the users of that external schema. This allows changes to be made to meet users' needs more readily than would normally be possible with a federated schema. The data model of the external schema may be different from that of the federated schema.

Integrity Constraints

The external schema allows additional integrity constraints to be put in place on the system.

Access Control

Export schemas provide access control with respect to the data managed by the component databases; similarly, external schemas provide access control with respect to the data managed by the FDBS.

A filtering processor analyses the commands on an external schema to make sure that they conform to the access control and integrity constraints of the federated schema. A transforming processor is used if the external schema has a different data model from the federated schema, to transform the commands on the external schema into commands on the federated schema.

Seven Layer Model

While the most popular design model for database federations is the five-layer model, there have been some extensions to it. For some recent healthcare projects two extra layers were added to the design, because of the extra levels of security that were required. One of the layers dealt with schema authorisation. These schemas were subsets of the federated schema and contained only subsets of the data.

Schema Evolution

One of the problems found with federated databases is the cumulative schema evolution rate.
A client application cannot be tied to a specific integrated view of the federation, because it takes only one member of the federation changing its schema without ensuring backward compatibility to break the client. One way for clients to retain schema independence is to request a copy of the schema from an object broker (OB) when necessary. If this is done, the back-end schema can change without affecting the operation of the front-end views.

To support this dynamic schema exchange, the data sent between the client and the OB cannot be defined at compile time. Instead, a suite of generic object structures for building objects on the fly is needed at both ends. These would include abstractions for primitive types, extensible arrays and extensible associative arrays. The schema data is also captured using these structures, and validation tools are required that can check an instance data structure for schema compliance.

A common OB system is CORBA, which is maintained by the Object Management Group (OMG), a separate body from the ODMG.

Coupling

Coupling describes the amount of autonomy that each local system has within a federation. There are two types of coupling: loose and tight. Each defines how much control the local administrator has over the schema that is exported from their database.

Loosely coupled FDBS

A loosely coupled system provides an interface for dealing with multiple component DBMSs directly. A loosely coupled approach may be better suited to integrating a large number of very autonomous, read-only databases. The users have to manage the federated schemas themselves, so the FDBS can do very little to optimise queries, but users are free to specify their own queries to achieve good performance. This implies that the user must have a good understanding of the component DBMSs.
A user of a loosely coupled FDBS has to be sophisticated enough to find the export schemas that can provide the required data and to define mappings between their federated schemas and those export schemas. A lack of adequate semantics in the component schemas can make this task particularly difficult.

This architecture has the following advantages:

• A user has more control over the relationships and mappings among the objects in the export schemas, and so can specify them precisely. This is desirable when the federation DBA is unable to specify mappings that integrate data in multiple databases in a manner that is meaningful by the user's criteria.
• Supporting multiple semantics becomes possible, since different users can import or integrate export schemas differently and maintain different mappings from their federated schemas to the export schemas.
• Users can design schemas to reflect their own needs. This can be a significant advantage when the federation DBA cannot anticipate the needs of the federation users.

This architecture has the following disadvantages:

• The loosely coupled approach is not well suited to more traditional business or corporate databases, mainly because of the lack of security control that it provides.
• The users need to be skilled to be able to use it. This is a poor fit where users are naïve and would find it difficult to perform negotiations and investigations themselves, or where location, distribution and replication transparency are desirable in a business.
• It is not very suitable for updates, as the FDBS may degrade data integrity: when the user of a loosely coupled FDBS creates a federated schema using a view definition process, view update transformations are often not determined.
The users may not have complete information on the component DBMSs, and different users may use different semantic interpretations of the data managed by the component DBMSs. Thus different users can define different federated schemas over the same component DBMSs, and different transformations can be chosen for the same updates submitted on different federated schemas.

[Figure: the main architectural differences between loosely coupled and tightly coupled federations.]

Tightly Coupled FDBS

A tightly coupled federation requires the construction of a global schema against which queries can be posed. All semantic heterogeneities among the databases involved are resolved at the global schema level and are hidden from users querying this schema.

This architecture has the following advantages:

• It provides location, replication and distribution transparency. This is achieved by developing a federated schema that integrates multiple export schemas.
• All export schemas can be integrated into a single federated schema (also called an enterprise schema or global conceptual schema) to give a single point of control for all data sharing in the organisation across the component DBMSs.
• It allows the use of the FDBS to be tailored to multiple classes of federation users with different data access requirements.
• It can support multiple semantics.
• Updates are easier to support in a tightly coupled FDBS, where DBAs carefully define the mappings, than in a loosely coupled FDBS, where the users define the mappings.

The transparencies are managed by mappings between the federated schema and the export schemas, and a federation user can query the federated schema using a classic query language (like SQL) with the illusion that he or she is accessing a single system.
Using a single federated schema helps in defining uniform semantics for the data in the FDBS. With a single federated schema it is also easier to enforce constraints that cross export schemas (and hence multiple databases) than when multiple federated schemas are allowed.

Problems similar to those described for the loosely coupled FDBS can occur in a tightly coupled FDBS with multiple federated schemas, but they can be resolved at the time of federated schema creation through schema integration. A federation DBA creating a federated schema using a schema integration process can be expected to have more complete knowledge of the component DBMSs and of the other federated schemas.

This architecture has the following disadvantages:

• The federated schema can become too large and too difficult to create and maintain. When a single federated schema is created it becomes large and complicated, as it has to meet the requirements of all users.
• It may become necessary to support external schemas for different federation users because of large data requirements.
• When a FDBS allows updates, supporting multiple semantics could lead to inconsistencies. For this reason, federation DBAs have to be very careful in developing the federated schemas and their mappings to the export schemas.

Views

Given the way that federated databases work with autonomy and data abstraction using schemas, it is important to separate the data from the schema. A change in the local schema of a database should not have a great effect on the operation of the federation. Views are one way of achieving this. They allow us to build temporary tables as the result of queries and point applications to these tables.

A view can be implemented in two ways:

• Query rewriting: a query against the view is rewritten in terms of the view's definition.
• Materialisation: the result table is cached and is recomputed when needed.
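The two approaches can be compared with a small sketch using Python's standard `sqlite3` module and an in-memory database. The table, view and column names are hypothetical. The `CREATE VIEW` statement stores only the defining query, which is evaluated each time the view is referenced, while `CREATE TABLE ... AS SELECT` is used here as a stand-in for a materialised view, since SQLite has no built-in materialised views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (name TEXT, department TEXT, salary REAL);
    INSERT INTO employee VALUES ('Ann', 'Sales', 30000), ('Bob', 'Sales', 20000);

    -- Query-rewriting style: only the defining query is stored.
    CREATE VIEW dept_total AS
        SELECT department, SUM(salary) AS total FROM employee GROUP BY department;

    -- Materialisation style: the result is computed once and cached in a table.
    CREATE TABLE dept_total_cached AS SELECT * FROM dept_total;
""")


def refresh_cache(connection):
    """Rebuild the cached copy from the current base data."""
    connection.executescript("""
        DELETE FROM dept_total_cached;
        INSERT INTO dept_total_cached SELECT * FROM dept_total;
    """)


# Change the underlying table after the cache was built.
conn.execute("INSERT INTO employee VALUES ('Cal', 'Sales', 10000)")

# The view reflects the change immediately; the cached table does not.
view_total = conn.execute("SELECT total FROM dept_total").fetchone()[0]
stale_total = conn.execute("SELECT total FROM dept_total_cached").fetchone()[0]

# After a rebuild the cached copy agrees with the view again.
refresh_cache(conn)
fresh_total = conn.execute("SELECT total FROM dept_total_cached").fetchone()[0]
```

The cached table is cheap to read but goes stale as soon as the base table changes, which is why the rebuild interval matters when the underlying data changes often.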
Although query rewriting maintains the logic of the query, it is difficult to implement and is relatively inefficient. In contrast, materialisation is easy to implement and is very efficient if the data in the underlying tables (for a relational database) or classes (for object systems) does not change too often. If the data does change frequently, the view will need to be rebuilt at shorter intervals.

As a FDBS can contain different types of database, such as relational and object systems, it is important to note the differences between the two. In an object system a view is a subschema of virtual classes, whereas in a relational system it is the result of a query. Described below are some of the features of each kind of database system when implementing views.

OO Systems

An integration service imports views and combines them to form a federated view, where specific views are selected to make up each federated schema. There are four different types of view:

• Selection
• Projection
• Join
• Real

Selection

Within a selection view, classes in the schema are not modified in any way, but their population is filtered.

Projection

Within a projection view, the instances of the class are presented as if a different schema had described them, with properties either added or removed.

Join

Within a join view, different instances are combined. There is also a uni-join view (fragmentation).

Due to the evolving nature of the object database world, views have not been fully incorporated into any standard. The ODMG standard makes no reference to views. This is because the programming language orientation of object systems does not provide all the functionality needed to describe and create views. In addition, there is no clear foundation for view definition as there is with relational systems.

Relational Database Systems

Within relational systems, a view is a virtual table whose contents are defined by a query.
Like a real table, a view consists of a set of named columns and rows of data. However, a view does not exist as a stored set of data values in a database. The rows and columns of data come from the tables referenced in the query defining the view and are produced dynamically when the view is referenced.

A view acts as a filter on the underlying tables referenced in the view. The query that defines the view can draw on one or more tables, or on other views, in the current database or in other databases. Distributed queries can also be used to define views that use data from multiple heterogeneous sources. This is useful if you want to combine similarly structured data from different servers, each of which stores data for a different region of the organisation.

A view is used to perform any or all of these functions:

• Restrict a user to specific rows in a table. For example, allow an employee to see only the rows recording his or her work in a labour-tracking table.
• Restrict a user to specific columns. For example, allow employees who do not work in payroll to see the name, office, work phone and department columns in an employee table, but not any columns with salary or personal information.
• Join columns from multiple tables so that they look like a single table.
• Aggregate information instead of supplying details. For example, present the sum of a column, or the maximum or minimum value from a column.

Conclusion

The use of database federation has become increasingly viable due to the explosive growth of the Internet and the need to share data. Unfortunately, since data is distributed, isolated and sometimes partially redundant, it is difficult for users to query. Because a federation is a loosely coupled network of independent entities, each local database administrator maintains control over the local system and then shares its data and design with the rest of the federation.
This knowledge is then aggregated into a large schema or view that is used by either applications or the end user.

Because of the need for heterogeneity, there are still some problems in properly implementing a federation, owing to the relative immaturity of some database types. OODB is where relational was 10 years ago. More research needs to be done into the problems that prevent wide-scale deployment of the technology, such as views and query languages.

Government organisations, academic institutions and business entities create and maintain extensive databases containing all kinds of information, ranging from natural-language text documents, statistical tables, financial data and multimedia objects to data of a scientific and technical nature. It is therefore not surprising that businesses and organisations want data to appear as a coherent whole rather than as 'islands of data'. It is much more beneficial for data to be accessed in a uniform way, with a high level of functionality, and to be supported by existing development tools, environments and people.

References

• Towards Efficient and Scalable Mediation: the AURORA Approach - Ling Yan, Laboratory for Database Systems Research, University of Alberta, Canada.
• Federated Database Systems for Managing Distributed, Heterogeneous, and Autonomous Data - A. P. Sheth, J. A. Larson.
• An Approach to Resolving Semantic Heterogeneity in a Federation of Autonomous and Heterogeneous Database Systems - Joachim Hammer and Denis McLeod.
• Myriad: Design and Implementation of a Federated Database Prototype - Ed Peng Lim, San Yih Hwang, Jaideep Srivastava, Dave Clements, M Ganesh, Department of Computer Science, University of Minnesota.
• Database Architecture: Federated vs. Clustered - Oracle Computer Corporation.
• Federated Database Systems for Managing Heterogeneous and Autonomous databases - Ceri and Pelagatti, 1984.
