Federated Databases
Coursework
CO42009: Object Database Systems
Paul Murray, Claire McQuade, Kashif Rafiq, David Miller
May 2002
Introduction
Aims
Characteristics
Autonomy
Distribution
Heterogeneity
Characteristics of a FDBS
Architecture
Processors
Transforming Processor
Filtering Processor
Construction Processor
Accessing Processor
Schema
Schema Components
Local Schema
Component Schema
Export Schema
Federated Schema
External Schema
Customisation
Integrity Constraints
Access Control
Seven Layer Model
Schema Evolution
Coupling
Loosely coupled FDBS
Tightly Coupled FDBS
Views
OO Systems
Selection
Projection
Join
Relational Database Systems
Conclusion
CO42009 – Coursework Federated Databases
Page 2 of 21
Introduction
Given today’s heterogeneous environments and the need to share information between
many different entities, a new area of database technology is being developed.
Database Federation is concerned with the integration of two or more database
systems to form a single view for an application or user.
Given that a federation is a loosely coupled network of independent entities, each local database administrator maintains control over their own system and shares data and design information with the rest of the federation. This local knowledge is then aggregated into a larger schema or view that is used by applications or the end user. A local database can take part in one or more federations, and it continues to fulfil its local operations while it is part of a federation.
The integration of the component databases may be managed by the users of the federation or by the local database administrators. The degree of integration depends on the needs of the federation.
Provided it can export some or all of its schema, any kind of database could be part of a federation. Relational databases are prime candidates for inclusion in a federation: their design has been around for many years and is used widely across many areas to store regular, structured data. Almost all relational databases use SQL as their data manipulation language, and this language is also mature and relationally complete.
Another species of database that is beginning to emerge is the object database. Although object-oriented databases are still in their infancy, they have several advantages over the relational model, as well as some weaknesses. In an object system, classes provide the means to store and process data. Each class can be thought of as a table, and each instance of a class as a record. Built into the class are attributes (for storing values) and methods (for manipulating data). Because the methods are defined in the class, the database effectively becomes the application. Objects also benefit from object orientation: methods can be overloaded, and subclasses can redefine inherited methods through inheritance and specialisation.
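As a rough illustration of the class/table analogy, the Python sketch below treats a class as a table, its attributes as columns and each instance as a record, with methods stored alongside the data. The class names, attributes and values are hypothetical. Python has no method overloading in the C++/Java sense, so the sketch shows only the redefinition (overriding) of an inherited method in a specialised subclass.

```python
from dataclasses import dataclass


@dataclass
class Specimen:
    # Hypothetical class: roughly a "table"; each attribute is like a column.
    specimen_id: int
    species: str
    mass_g: float

    def describe(self) -> str:
        # A method stored with the class rather than in a separate application.
        return f"{self.species} #{self.specimen_id}"

    def mass_kg(self) -> float:
        return self.mass_g / 1000


@dataclass
class FossilSpecimen(Specimen):
    # Specialisation: the subclass inherits everything and adds an attribute...
    age_myr: float = 0.0

    def describe(self) -> str:
        # ...and redefines (overrides) an inherited method.
        return f"{super().describe()} (fossil, {self.age_myr} Myr)"


# The class extent: each instance plays the role of a record.
extent = [
    Specimen(1, "Salmo salar", 2500.0),
    FossilSpecimen(2, "Archaeopteryx", 500.0, age_myr=150.0),
]
for s in extent:
    print(s.describe())
```

Because `FossilSpecimen` is still a `Specimen`, code written against the base class (such as `mass_kg`) works unchanged on the specialised instances.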
By using these object-oriented features, more complex real-world objects can be represented in an object system. These are the advantages of an object-oriented database, but there are also some drawbacks.
Object-oriented databases are young and immature. They are at the stage that relational databases were at 10 years ago. There is still no query language as complete and widely supported as SQL is for relational databases, and many vendors have stopped at a halfway house by adding some object features to relational databases (object-relational databases).
Whilst the concepts of object databases are hard to understand, their usefulness in modelling real-world problems sets them apart from relational databases.
Many object systems hold biological or other scientific data. Academic research has become a global effort, with institutions cooperating in a way never seen before. If each institution holds some data, why not create a federation to share it? The federation might contain relational databases and/or object-oriented systems.
Aims
This report will concentrate on several key areas:
• Characteristics
• Architecture
• Processors
• Schema
• Coupling
• Views
As a federation implies two or more databases, the data must be distributed over the entities in the federation, and some entities may hold more data than others. In relational systems, the data could be partitioned horizontally (by rows) or vertically (by columns). In an object system, classes can also be fragmented: horizontal fragmentation divides a class's extent into subsets of its instances, while vertical fragmentation divides the class's attributes, so that each fragment holds some of the properties of every instance.
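A minimal sketch of the two kinds of fragmentation, assuming a class extent held in memory as a list of dictionaries (the attribute names and data are invented for illustration). Horizontal fragments keep whole instances; vertical fragments keep every instance but only some attributes, plus the key needed to reassemble the objects.

```python
# Hypothetical class extent: one dict per instance.
customers = [
    {"id": 1, "name": "Ada", "region": "Scotland", "balance": 120},
    {"id": 2, "name": "Ben", "region": "England", "balance": 75},
    {"id": 3, "name": "Cai", "region": "Scotland", "balance": 40},
]


def horizontal_fragment(extent, predicate):
    """A subset of the instances, each with all of its attributes."""
    return [obj for obj in extent if predicate(obj)]


def vertical_fragment(extent, attributes, key="id"):
    """Every instance, but only some attributes; the key is kept in each
    fragment so that whole objects can be rebuilt later."""
    return [{a: obj[a] for a in (key, *attributes)} for obj in extent]


def reassemble(frag_a, frag_b, key="id"):
    """Rebuild whole objects by joining two vertical fragments on the key."""
    by_key = {obj[key]: dict(obj) for obj in frag_a}
    for obj in frag_b:
        by_key[obj[key]].update(obj)
    return list(by_key.values())


scottish = horizontal_fragment(customers, lambda c: c["region"] == "Scotland")
contact = vertical_fragment(customers, ["name", "region"])
accounts = vertical_fragment(customers, ["balance"])
print([c["name"] for c in scottish])
print(reassemble(contact, accounts) == customers)
```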
Views are very important to federated database systems (FDBS). They allow data to be selected and projected from each entity into another layer of the schema, where it is combined into a larger view. All the applications and users of the federation connect to this layer of the schema to gather data.
Schemas describe the design of the whole system. There will be a schema describing each local database (relational or object), and any middleware that projects a local database into the federation will have to reference that schema for data types. Often there is also a global schema that describes the federation as a whole.
Characteristics
Autonomy
As each member of the federation is a local database management system (DBMS), its characteristics are fundamental to the systems built on top of it. A local DBMS will typically host several databases on one machine. These databases could hold anything from accounting information to taxonomy data. The DBMS is the software that manages the database. It provides features such as:
• Transaction Control
• Access Control
• Query Processing
Each local DBMS has autonomy: one database administrator controls its day-to-day running. In a federation, however, there has to be some distribution of control that defines how independently each local DBMS can operate. This governs features such as design, communication and execution in the federation.
Distribution
Due to the nature of a federated database, data will be spread over multiple databases. These databases may be on the same computer or on geographically distant systems. The only requirement of an FDBS is that the systems can talk to each other over a network.
There are several advantages to data distribution, including improved access times, availability and reliability. In a distributed database system the data is deliberately distributed, whereas in a federated system the data is generally distributed already, according to what each local node holds, although this can be altered.
Heterogeneity
A federation is likely to contain more than one combination of operating system and database, so there must be some kind of interoperability in the system. Each local DBMS will have different ways of representing data models, semantics and system operations.
When looking at data models, structures and query languages must be taken into account. Every programming language represents structures differently, and query languages differ between relational systems and object systems. There are even differences between SQL implementations on the same operating system (for example, Microsoft's T-SQL and Oracle's PL/SQL).
Semantics describe what data means. If two systems in a federation refer to a customer in different ways (one might use a customer number, the other a shortened version of the customer name), problems can arise. Although both refer to a customer and both uniquely identify the customer, they are in fact two different attributes of a customer. If attribute names or other vital information can be interpreted in different ways, there will be problems.
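One common way of resolving this kind of semantic heterogeneity is a correspondence table that maps each local identifier onto a federation-wide key. The sketch below assumes such a table already exists; `id_map`, the system names and all the data are hypothetical, and in practice building the table is the hard part.

```python
# System A identifies customers by number; system B by a shortened name.
system_a_orders = [{"cust_no": 1001, "total": 250}, {"cust_no": 1002, "total": 90}]
system_b_orders = [{"cust_short": "SMITHJ", "total": 40}, {"cust_short": "JONESK", "total": 15}]

# Hypothetical correspondence table agreed by the federation: every local
# identifier is mapped to one federation-wide customer key.
id_map = {
    ("A", 1001): "C-1", ("A", 1002): "C-2",
    ("B", "SMITHJ"): "C-1", ("B", "JONESK"): "C-3",
}


def to_federated(system, orders, local_key):
    """Rewrite each order so that it uses the common customer key."""
    return [{"customer": id_map[(system, o[local_key])], "total": o["total"]}
            for o in orders]


def totals_by_customer(*order_lists):
    totals = {}
    for orders in order_lists:
        for o in orders:
            totals[o["customer"]] = totals.get(o["customer"], 0) + o["total"]
    return totals


totals = totals_by_customer(
    to_federated("A", system_a_orders, "cust_no"),
    to_federated("B", system_b_orders, "cust_short"),
)
print(totals)
```

Once both systems speak in terms of the same key, orders for the same customer held in different databases can be combined.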
Using different operating systems and DBMSs also presents problems, as every database system handles core functionality differently. When several DBMSs are grouped into one logical system, basic tasks such as transaction handling and recovery can be tricky. In addition, not all features may be available to a database, depending on the operating system it runs on.
Characteristics of a FDBS
The FDBS is a collection of cooperating but autonomous database management systems, each of which participates in the federation to a varying degree. The federated database management system (FDBMS) is the software that gives applications access to the underlying DBMSs in a secure, controlled manner. There is no centralised control of an FDBS: operations either involve a single local DBMS or are submitted through the federation as a whole.
The following diagram shows how different types of local databases can be grouped
together in a federation.
[Diagram: a Federated Database System made up of System 1, System 2 to System N, whose DBMSs may be centralised, distributed, or themselves federated (FDBS).]
Architecture
The FDBS is made up of several components:
• Data
• Local DBMS
• Processors
• Schemas
The first two points are self-explanatory: a database is useless without data, and there must be software to manage requests to the database and perform tasks such as transaction handling.
The processors are application-independent programs that perform tasks such as data conversion between systems.
Schemas are the designs of each DBMS and of the FDBS. They contain details of each structure in a database, its attributes and its integrity constraints.
Processors
Processors are perhaps the most important extension to the local DBMS. These programs provide functions such as data transformation, data filtering, data construction and data access. Example diagrams are shown after each processor description.
Transforming Processor
The transforming processor translates commands from one language to another and transforms data from one format to another. It provides a degree of data independence, as it hides differences in query languages and data formats.
[Diagram: Schema A → Transforming Processor (command translation, schema translation) → Schema B]
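The two halves of a transforming processor can be sketched as follows, assuming a hypothetical name mapping between the federation's vocabulary and one relational component (`LOCAL_NAMES`, `CUST_TBL` and the column names are invented). Command translation turns a simple selection into parameterised SQL; data translation maps result rows back to federated names and normalises values.

```python
# Hypothetical mapping between the federation's names and one component's
# local table and column names.
LOCAL_NAMES = {"Customer": "CUST_TBL", "name": "CNAME", "city": "CTOWN"}
FEDERATED_NAMES = {local: fed for fed, local in LOCAL_NAMES.items()}


def translate_command(cls, attrs, where_attr, value):
    """Command translation: a simple selection on the federated schema becomes
    a parameterised SQL statement in the component's local vocabulary."""
    columns = ", ".join(LOCAL_NAMES[a] for a in attrs)
    sql = (f"SELECT {columns} FROM {LOCAL_NAMES[cls]} "
           f"WHERE {LOCAL_NAMES[where_attr]} = ?")
    return sql, (value,)


def translate_row(local_row):
    """Data translation: local column names are mapped back to federated
    attribute names, and fixed-width CHAR padding is stripped from strings."""
    return {
        FEDERATED_NAMES[col]: (val.strip() if isinstance(val, str) else val)
        for col, val in local_row.items()
    }


sql, params = translate_command("Customer", ["name"], "city", "Dundee")
print(sql, params)
print(translate_row({"CNAME": "Ada       ", "CTOWN": "Dundee"}))
```

Keeping the value as a query parameter rather than pasting it into the SQL text means the translated command cannot be altered by the data it carries.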
Filtering Processor
The filtering processor constrains the commands and data that can be passed from one processor to another. The syntax and semantics of each command and data item are checked for validity before being passed on, which reduces the risk of invalid commands or data corrupting another component. If data or commands need to be modified, they are passed to a transforming processor.
[Diagram: Subset of Schema A → Filtering Processor (control commands, control data) → Schema A]
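A filtering processor might be sketched as below, assuming commands arrive as simple dictionaries and that the permitted subset of Schema A is described by a hypothetical `EXPORT_SCHEMA`. The sketch covers only the syntactic check and access control; semantic integrity checks would be added in the same style.

```python
# Hypothetical export schema: the subset of Schema A offered to the
# federation, with the operations each class permits.
EXPORT_SCHEMA = {
    "Customer": {"attributes": {"id", "name", "city"}, "operations": {"read"}},
    "Order": {"attributes": {"id", "customer_id", "total"},
              "operations": {"read", "insert"}},
}


class Rejected(Exception):
    """Raised when a command fails a syntactic or access-control check."""


def filter_command(command):
    """Pass a valid command on unchanged; reject anything else."""
    # Syntactic check: the command must have all of its expected parts.
    missing = {"operation", "class", "attributes"}.difference(command)
    if missing:
        raise Rejected(f"malformed command, missing {sorted(missing)}")
    entry = EXPORT_SCHEMA.get(command["class"])
    if entry is None:
        raise Rejected(f"class {command['class']!r} is not exported")
    # Access control: the operation and every attribute must be exported.
    if command["operation"] not in entry["operations"]:
        raise Rejected(f"{command['operation']} not allowed on {command['class']}")
    hidden = set(command["attributes"]) - entry["attributes"]
    if hidden:
        raise Rejected(f"attributes not exported: {sorted(hidden)}")
    return command


ok = filter_command({"operation": "read", "class": "Customer", "attributes": ["name"]})
try:
    filter_command({"operation": "read", "class": "Customer", "attributes": ["salary"]})
except Rejected as err:
    print("rejected:", err)
```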
Construction Processor
The construction (constructing) processors enable data from many sources to be aggregated into one. They support location, distribution and replication transparency. Constructing processors also provide schema integration between local DBMSs, which entails negotiation, command optimisation and decomposition, and transaction management across the FDBS.
[Diagram: Schema A → Constructing Processor → Schema B and Schema C]
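The core of a constructing processor, decomposition followed by merging, can be sketched as follows. The allocation schema, the export schema names and the data are all hypothetical, and the export schemas are simulated in memory; duplicates caused by replication are removed using an assumed object key.

```python
# Hypothetical allocation (distribution) schema: the federated class
# "Customer" is horizontally distributed, with some replication, over the
# export schemas B and C.
ALLOCATION = {"Customer": ["B", "C"]}

# In-memory stand-ins for the data behind each export schema.
EXPORTS = {
    "B": {"Customer": [{"id": 1, "name": "Ada", "city": "Dundee"},
                       {"id": 2, "name": "Ben", "city": "Leeds"}]},
    "C": {"Customer": [{"id": 3, "name": "Cai", "city": "Dundee"},
                       {"id": 1, "name": "Ada", "city": "Dundee"}]},  # replica
}


def decompose(cls, predicate):
    """Split one command on the federated schema into one sub-command per
    export schema that holds part of the class."""
    return [(source, cls, predicate) for source in ALLOCATION[cls]]


def run_subcommand(source, cls, predicate):
    # In a real FDBS this would pass through filtering and accessing processors.
    return [row for row in EXPORTS[source][cls] if predicate(row)]


def construct(cls, predicate, key="id"):
    """Execute the sub-commands and merge their results into one answer,
    keeping a single copy of any replicated object."""
    merged = {}
    for sub in decompose(cls, predicate):
        for row in run_subcommand(*sub):
            merged.setdefault(row[key], row)
    return sorted(merged.values(), key=lambda r: r[key])


dundee = construct("Customer", lambda r: r["city"] == "Dundee")
print([r["name"] for r in dundee])
```

The user of the federated schema sees one answer and never needs to know that the class is split across B and C, which is the location and replication transparency described above.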
Accessing Processor
The accessing processors accept commands from other processors, execute them
against a local DBMS and return data to the other processors.
[Diagram: Commands → Accessing Processor → Database, with data returned from the database through the accessing processor]
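An accessing processor can be very thin. The sketch below uses an in-memory SQLite database as a stand-in for the local DBMS (an assumption made so the example runs anywhere) and treats a command as an SQL string plus parameters; the class name and table are hypothetical.

```python
import sqlite3


class AccessingProcessor:
    """Accepts (sql, params) commands from other processors, runs them against
    one local DBMS and returns the resulting rows as dictionaries."""

    def __init__(self, connection):
        self.conn = connection
        self.conn.row_factory = sqlite3.Row  # rows can be read by column name

    def execute(self, sql, params=()):
        with self.conn:  # commits on success, rolls back on an exception
            cursor = self.conn.execute(sql, params)
            return [dict(row) for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUST_TBL (CNAME TEXT, CTOWN TEXT)")
conn.executemany("INSERT INTO CUST_TBL VALUES (?, ?)",
                 [("Ada", "Dundee"), ("Ben", "Leeds")])

ap = AccessingProcessor(conn)
rows = ap.execute("SELECT CNAME FROM CUST_TBL WHERE CTOWN = ?", ("Dundee",))
print(rows)
```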
Schema
The three-level schema architecture is adequate for a centralised DBMS, but not for an FDBS. To support the distribution, heterogeneity and autonomy associated with an FDBS, the three-level schema architecture must be extended. Four-, five- and even seven-level schema architectures have been used, but the five-level schema architecture seems to be the happiest medium.
The following diagram, Fig. S1, shows the layout of the five-level schema architecture.
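[Fig. S1: the five-level schema architecture. From top to bottom: external schemas, federated schemas, export schemas, component schemas, local schemas, component DBSs. Each external schema is defined over a federated schema, each federated schema integrates several export schemas, each export schema is derived from a component schema, and each component schema is translated from the local schema of a component DBS.]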
The next diagram, Fig. S2, shows a system architecture for an FDBS, made up of processors and schemas.
[Fig. S2: from top to bottom, external schemas, filtering processors, federated schemas, constructing processors, export schemas, filtering processors, component schemas, transforming processors, local schemas, component DBSs.]
Schema Components
Local Schema
The local schema is the conceptual schema of a component DBS. It is expressed in
the native data model of the component DBMS and so different local schemas can be
expressed in different data models.
Component Schema
A component schema is made by translating a local schema into a data model called the canonical or common data model (CDM) of the FDBS. There are two reasons for defining component schemas in a common data model. First, every design feature is then described in the same way across all component schemas. Second, semantics that are missing from a local schema can be added to its component schema. In a tightly coupled FDBS, this facilitates the negotiation and integration tasks performed during development; in a loosely coupled FDBS, it facilitates negotiation and the specification of views and multidatabase queries.
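The first of these reasons can be illustrated with a small sketch of schema translation into a CDM. The local schemas, the ODMG-style type names and the deliberately coarse CDM type set are all hypothetical; a real CDM would also carry keys, relationships and constraints.

```python
# Hypothetical local schemas, each in its component's native data model.
relational_local = {  # table -> {column: SQL type}
    "CUST_TBL": {"CNAME": "VARCHAR(40)", "CUST_NO": "INTEGER", "BAL": "DECIMAL(10,2)"},
}
object_local = {  # class -> {attribute: ODMG-style type name}
    "Customer": {"name": "string", "number": "long", "balance": "double"},
}

# Hypothetical canonical data model with a deliberately coarse set of types.
SQL_TO_CDM = {"VARCHAR": "string", "CHAR": "string", "INTEGER": "integer", "DECIMAL": "number"}
ODMG_TO_CDM = {"string": "string", "long": "integer", "double": "number"}


def base_sql_type(sql_type):
    """Drop any length or precision, e.g. 'VARCHAR(40)' -> 'VARCHAR'."""
    return sql_type.split("(")[0].strip().upper()


def relational_to_component(schema):
    return {table: {col: SQL_TO_CDM[base_sql_type(t)] for col, t in cols.items()}
            for table, cols in schema.items()}


def object_to_component(schema):
    return {cls: {attr: ODMG_TO_CDM[t] for attr, t in attrs.items()}
            for cls, attrs in schema.items()}


print(relational_to_component(relational_local))
print(object_to_component(object_local))
```

After translation, both component schemas use the same type vocabulary, so a later integration step can compare `CUST_TBL` and `Customer` attribute by attribute.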
Export Schema
The export schema represents the subset of a component schema that is available to the FDBS, since not all of a component's data need be available to the federation. It may include access control information regarding its use by particular federation users. A filtering processor can provide the access control specified in an export schema by limiting the operations that can be submitted against the corresponding component schema. Export schemas and such filtering processors support the autonomy feature of an FDBS.
Federated Schema
A federated schema is an integration of multiple export schemas. It includes the data distribution information generated when the export schemas are integrated, although some systems store this information in a separate schema called a distribution or allocation schema. A constructing processor transforms commands on the federated schema into commands on one or more export schemas. Constructing processors and federated schemas support the distribution feature of the FDBS. There can be multiple federated schemas in an FDBS, one for each class of federation users. A class of federation users is a group of users and/or applications performing a related set of activities. In a corporate environment, for example, managers might be one class of federation users, and all employees and applications associated with the accounting department another.
Concepts similar to the federated schema are represented by terms such as import schema, global schema, global conceptual schema, unified schema and enterprise schema. Apart from import schema, however, these terms tend to be used only when the system has a single schema of this type.
External Schema
An external schema defines a schema for a user and/or application, or for a class of users/applications. External schemas are used for several reasons, described below.
Customisation
Federated schemas are sometimes difficult to change. An external schema can specify the subset of information in a federated schema that is relevant to its users, and it can be changed more readily to meet users’ needs than a federated schema normally can. The data model of an external schema may differ from that of the federated schema.
Integrity Constraints
The external schema allows you to put in place additional integrity constraints on the
system.
Access Control
Export schemas provide access control with respect to the data managed by the component databases; similarly, external schemas provide access control with respect to the data managed by the FDBS. A filtering processor analyses the commands on an external schema to make sure that they conform to the access control and integrity constraints of the federated schema. If an external schema uses a different data model from the federated schema, a transforming processor is also used to transform commands on the external schema into commands on the federated schema.
Seven Layer Model
While the most popular design model for database federations is the five-level model, there have been some extensions to it. For some recent healthcare projects, two extra layers were added to the design because of the extra levels of security required. One of these layers dealt with schema authorisation: its schemas were subsets of the federated schema that exposed only part of the data.
Schema Evolution
One of the problems found with federated databases is the cumulative rate of schema evolution. A client application cannot be tied to a specific integrated view of the federation, because one member changing its schema without ensuring backward compatibility is enough to break the client. One way for clients to retain schema independence is to request a copy of the schema from an object broker (OB) when necessary. The back-end schemas can then change without affecting the operation of the front-end views.
To support this dynamic schema exchange, the data sent between the client and the OB cannot be defined at compile time. Instead, both ends need a suite of generic object structures for building objects on the fly, including abstractions for primitive types, extensible arrays and extensible associative arrays. The schema itself is also captured using these structures, and validation tools are required to check an instance data structure for compliance with a schema.
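A validation tool of the kind described above might look like the sketch below. It assumes a hypothetical schema notation built only from generic structures (a string names a primitive type, a one-element list describes an extensible array, a dict describes an associative array with required keys) and does not model the object broker itself.

```python
# Hypothetical schema notation, built from generic structures only:
#   "int" / "float" / "str" / "bool"  -> a primitive type
#   [element_schema]                  -> an extensible array
#   {"key": schema, ...}              -> an associative array with required keys
PRIMITIVES = {"int": int, "float": (int, float), "str": str, "bool": bool}


def validate(instance, schema, path="$"):
    """Return a list of schema-compliance errors (empty if compliant)."""
    if isinstance(schema, str):
        # bool is a subclass of int in Python, so reject it for numeric types.
        if isinstance(instance, bool) and schema != "bool":
            return [f"{path}: expected {schema}, got bool"]
        if not isinstance(instance, PRIMITIVES[schema]):
            return [f"{path}: expected {schema}, got {type(instance).__name__}"]
        return []
    if isinstance(schema, list):
        if not isinstance(instance, list):
            return [f"{path}: expected array"]
        errors = []
        for i, item in enumerate(instance):
            errors += validate(item, schema[0], f"{path}[{i}]")
        return errors
    if isinstance(schema, dict):
        if not isinstance(instance, dict):
            return [f"{path}: expected associative array"]
        errors = []
        for key, sub in schema.items():
            if key not in instance:
                errors.append(f"{path}.{key}: missing")
            else:
                errors += validate(instance[key], sub, f"{path}.{key}")
        return errors
    raise ValueError(f"unsupported schema element at {path}")


# A schema as it might arrive at run time, expressed in the same structures.
specimen_schema = {"id": "int", "species": "str", "measurements": ["float"]}
good = {"id": 7, "species": "Salmo salar", "measurements": [1.5, 2]}
bad = {"id": "7", "measurements": [1.5, "x"]}
print(validate(good, specimen_schema))
print(validate(bad, specimen_schema))
```

Because the schema is ordinary data, a client can fetch a new version at run time and validate against it without being recompiled.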
A common OB architecture is CORBA, which is maintained by the Object Management Group (OMG). The ODMG standards come from a distinct body, the Object Data Management Group.
Coupling
Coupling describes the amount of autonomy that each local system has within a federation. There are two types of coupling, loose and tight, and each defines how much control the local administrator has over the schema exported from their database.
Loosely coupled FDBS
A loosely coupled system provides an interface for dealing with multiple component DBMSs directly. A loosely coupled approach may be better suited to integrating a large number of highly autonomous, read-only databases. Users have to manage the federated schemas themselves, so the FDBS can do very little to optimise queries, although users are free to write their own queries to achieve good performance. This implies that users must have a good understanding of the component DBMSs. A user of a loosely coupled FDBS has to be sophisticated enough to find the export schemas that can provide the required data and to define mappings between their federated schemas and those export schemas. Lack of adequate semantics in the component schemas can make this task particularly difficult.
This architecture has the following advantages:
• A user has more control over the relationships and mappings among the objects in the export schemas, and so can specify them precisely. This is desirable when the federation DBA is unable to specify mappings that integrate data in multiple databases in a way that is meaningful to each user.
• Multiple semantics can be supported, since different users can import or integrate export schemas differently and maintain different mappings from their federated schemas to the export schemas.
• Users can design schemas to reflect their own needs. This can be a significant advantage when the federation DBA cannot anticipate the needs of the federation users.
This architecture has the following disadvantages:
• The loosely coupled approach is not well suited to traditional business/corporate databases, mainly because of its lack of security control.
• Users need to be skilled to use it. This is inappropriate where users are typically naïve and would find it difficult to carry out negotiation and investigation themselves, or in businesses where location, distribution and replication transparency are desirable.
• It is not well suited to updates, as the FDBS may degrade data integrity. When a user of a loosely coupled FDBS creates a federated schema using a view definition process, view update transformations are often not determined. Users may not have complete information on the component DBMSs, and different users may apply different semantic interpretations to the data managed by the component DBMSs. Different users can therefore define different federated schemas over the same component DBMSs, and different transformations can be chosen for the same updates submitted on different federated schemas.
[Diagram: the main architectural differences between loosely coupled and tightly coupled federations.]
Tightly Coupled FDBS
A tightly coupled federation requires the construction of a global schema via which
queries can be posed. All semantic heterogeneities among the databases that are used
are resolved at the global schema level and are hidden from users querying this
schema.
This architecture has the following advantages:
• It provides location, replication and distribution transparency. This is achieved by developing a federated schema that integrates multiple export schemas.
• All export schemas are integrated into a single federated schema (also called an enterprise schema or global conceptual schema), giving a single point of control for all data sharing across the component DBMSs in the organisation.
• It allows use of the FDBS to be tailored to multiple classes of federation users with different data access requirements.
• It can support multiple semantics.
• Updates are easier to support in a tightly coupled FDBS, where DBAs carefully define the mappings, than in a loosely coupled FDBS, where users define them.
The transparencies are managed by the mappings between the federated schema and the export schemas, so a federation user can query the federated schema using a classic query language (such as SQL) with the illusion of accessing a single system. Using a single federated schema helps in defining uniform semantics for the data in the FDBS. With a single federated schema, it is also easier to enforce constraints that cross export schemas (and hence multiple databases) than when multiple federated schemas are allowed.
Problems similar to those described for loosely coupled FDBSs can occur in a tightly coupled FDBS with multiple federated schemas, but they can be resolved when each federated schema is created, through schema integration. A federation DBA creating a federated schema through schema integration can be expected to have more complete knowledge of the component DBMSs and of the other federated schemas.
This architecture has the following disadvantages:
• The federated schema can become too large and difficult to create and maintain. A single federated schema becomes large and complicated, as it has to meet the requirements of all users.
• It may become necessary to support external schemas for different federation users because of their large data requirements.
• When an FDBS allows updates, supporting multiple semantics could lead to inconsistencies. For this reason, federation DBAs have to be very careful in developing the federated schemas and their mappings to the export schemas.
Views
Given the way that federated databases work with autonomy and data abstraction
using schema, it is important to separate the data from the schema. A change in the
local schema of a database should not have a great effect on the operation of the
federation.
Views are one way of achieving this. They allow us to define virtual tables as the results of queries and to point applications at these tables.
A view can be implemented in two ways:
• Query rewriting: a query posed against the view is rewritten in terms of the underlying tables or classes that define the view.
• Materialisation: the result of the view's defining query is computed and stored, and is recomputed when needed.
Although query rewriting always works against the current data, it is difficult to implement and relatively inefficient. In contrast, materialisation is easy to implement and very efficient if the data in the underlying tables (for a relational database) or classes (for object systems) does not change too often. If the data changes frequently, the view will need to be rebuilt at shorter intervals.
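The trade-off can be made concrete with a small in-memory sketch, assuming base data held as a list of dictionaries and queries written as plain Python functions (all names are hypothetical). The virtual view rewrites a query on the view by composing it with the view definition, so it always reads current data; the materialised view answers from a cache that goes stale until it is refreshed.

```python
class VirtualView:
    """A view implemented by query rewriting: a query on the view is composed
    with the view's definition into one query over the base data."""

    def __init__(self, base, definition):
        self.base, self.definition = base, definition

    def rewrite(self, user_query):
        # The rewritten query refers only to the base data, never to the view.
        return lambda base: user_query(self.definition(base))

    def ask(self, user_query):
        return self.rewrite(user_query)(self.base)


class MaterialisedView:
    """A view implemented by materialisation: the definition is evaluated once
    and cached, so reads are cheap but stale until refresh() is called."""

    def __init__(self, base, definition):
        self.base, self.definition = base, definition
        self.refresh()

    def refresh(self):
        self._rows = self.definition(self.base)

    def ask(self, user_query):
        return user_query(self._rows)


def big_orders(rows):  # the view definition
    return [r for r in rows if r["total"] > 100]


def ids(rows):  # a query posed against the view
    return [r["id"] for r in rows]


orders = [{"id": 1, "total": 250}, {"id": 2, "total": 40}]
virtual = VirtualView(orders, big_orders)
materialised = MaterialisedView(orders, big_orders)
orders.append({"id": 3, "total": 500})  # the base data changes

print(virtual.ask(ids))       # reflects the new order
print(materialised.ask(ids))  # stale until refreshed
materialised.refresh()
print(materialised.ask(ids))
```

The cost of the virtual view is paid on every query, while the cost of the materialised view is paid on every refresh, which is why materialisation suits data that changes infrequently.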
As an FDBS can contain different types of database, such as relational and object systems, it is important to note the differences between the two. In an object system, a view is a subschema of virtual classes, whereas in a relational system it is a virtual table defined by the result of a query. Described below are some of the features of each kind of database system when implementing views.
OO Systems
An integration service imports views and combines them to form a Federated view,
where specific views are selected to comprise each federated schema.
There are four different types of views:
• Selection
• Projection
• Join
• Real
Selection
In a selection view, the classes in the schema are not modified in any way, but their population is filtered.
Projection
In a projection view, the instances of a class are presented as if a different schema had described them, with properties added or removed.
Join
In a join view, instances of different classes are combined. There is also a uni-join view (fragmentation).
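The three view types described above can be sketched over plain Python classes. The `Employee` and `Department` classes and their data are hypothetical, and the "views" here are computed collections rather than true virtual classes registered in a schema.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    name: str
    dept_id: int
    salary: int


@dataclass
class Department:
    dept_id: int
    title: str


employees = [Employee("Ada", 1, 40000), Employee("Ben", 2, 32000), Employee("Cai", 1, 28000)]
departments = [Department(1, "Research"), Department(2, "Accounts")]


def selection_view(extent, predicate):
    # The class is unchanged; only its population is filtered.
    return [obj for obj in extent if predicate(obj)]


def projection_view(extent, keep, derived=None):
    # Instances presented under a different "schema": some properties are
    # removed, and optional derived properties are added.
    derived = derived or {}
    return [{**{p: getattr(obj, p) for p in keep},
             **{name: fn(obj) for name, fn in derived.items()}}
            for obj in extent]


def join_view(left, right, attr):
    # Instances of two classes combined into one virtual instance.
    return [{**vars(l), **vars(r)} for l in left for r in right
            if getattr(l, attr) == getattr(r, attr)]


research = selection_view(employees, lambda e: e.dept_id == 1)
public = projection_view(employees, ["name"],
                         {"grade": lambda e: "senior" if e.salary > 35000 else "junior"})
staff = join_view(employees, departments, "dept_id")
print([e.name for e in research])
print(public)
print([(s["name"], s["title"]) for s in staff])
```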
Due to the evolving nature of the object database world, views have not been fully incorporated into any standard. The ODMG standard makes no reference to views, because the programming-language orientation of object systems does not provide all the functionality needed to describe and create views. In addition, there is no clear foundation for view definition as there is for relational systems.
Relational Database Systems
Within Relational Systems a view is a virtual table whose contents are defined by a
query. Like a real table, a view consists of a set of named columns and rows of data.
However, a view does not exist as a stored set of data values in a database. The rows
and columns of data come from tables referenced in the query defining the view and
are produced dynamically when the view is referenced.
A view acts as a filter on the underlying tables referenced in the view. The query that
defines the view can be from one or more tables or from other views in the current or
other databases.
Distributed queries can also be used to define views that use data from multiple heterogeneous sources. This is useful if you want to combine similarly structured data from different servers, each of which stores data for a different region of the organisation.
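The regional case can be imitated in SQLite, an assumption made so the sketch runs anywhere: two attached in-memory databases stand in for regional servers, and a view combines them with `UNION ALL`. SQLite only lets a view read from other attached databases when it is created as a `TEMP` view. The table, view and region names are hypothetical.

```python
import sqlite3

# One connection with two attached in-memory databases standing in for
# regional servers; a real federation would reach separate remote systems.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS north")
conn.execute("ATTACH DATABASE ':memory:' AS south")

regional_sales = {
    "north": [("bolts", 120), ("nuts", 30)],
    "south": [("bolts", 80)],
}
for region, rows in regional_sales.items():
    # Schema names cannot be bound as parameters; these come from a fixed dict.
    conn.execute(f"CREATE TABLE {region}.sales (item TEXT, amount INTEGER)")
    conn.executemany(f"INSERT INTO {region}.sales VALUES (?, ?)", rows)

# The view presents the similarly structured regional tables as one table.
conn.execute("""
    CREATE TEMP VIEW all_sales AS
        SELECT 'north' AS region, item, amount FROM north.sales
        UNION ALL
        SELECT 'south' AS region, item, amount FROM south.sales
""")

totals = conn.execute(
    "SELECT item, SUM(amount) FROM all_sales GROUP BY item ORDER BY item"
).fetchall()
print(totals)
```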
A view can be used to perform any or all of these functions:
• Restrict a user to specific rows in a table. For example, allow an employee to see only the rows recording his or her work in a labour-tracking table.
• Restrict a user to specific columns. For example, allow employees who do not work in payroll to see the name, office, work phone and department columns in an employee table, but not any columns with salary or personal information.
• Join columns from multiple tables so that they look like a single table.
• Aggregate information instead of supplying details. For example, present the sum of a column, or the maximum or minimum value in a column.
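The four uses listed above can be sketched with SQLite through Python's `sqlite3` module. The tables and data are hypothetical; since SQLite has no notion of the current database user, the row-restricting view hard-codes one employee's id where a multi-user DBMS would typically compare against the logged-in user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, office TEXT,
                           phone TEXT, dept_id INTEGER, salary INTEGER);
    CREATE TABLE work_log (emp_id INTEGER, hours REAL);
    INSERT INTO department VALUES (1, 'Payroll'), (2, 'Research');
    INSERT INTO employee VALUES (1, 'Ada', 'G12', 'x101', 2, 40000),
                                (2, 'Ben', 'G14', 'x102', 1, 32000);
    INSERT INTO work_log VALUES (1, 7.5), (2, 6.0), (1, 8.0);

    -- Restrict rows: only employee 1's own work records.
    CREATE VIEW my_work AS SELECT * FROM work_log WHERE emp_id = 1;

    -- Restrict columns: the salary column is not exposed.
    CREATE VIEW staff_directory AS
        SELECT name, office, phone, dept_id FROM employee;

    -- Join two tables so that they look like one.
    CREATE VIEW staff_with_dept AS
        SELECT e.name, d.dept_name FROM employee e JOIN department d USING (dept_id);

    -- Aggregate instead of supplying detail.
    CREATE VIEW hours_summary AS
        SELECT SUM(hours) AS total, MAX(hours) AS longest, MIN(hours) AS shortest
        FROM work_log;
""")

print(conn.execute("SELECT * FROM staff_with_dept ORDER BY name").fetchall())
print(conn.execute("SELECT * FROM hours_summary").fetchone())
```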
Conclusion
The use of database federation has become increasingly viable due to the explosive growth of the Internet and the need to share data. Unfortunately, because data is distributed, isolated and sometimes partially redundant, it is difficult for users to query. Given that a federation is a loosely coupled network of independent entities, each local database administrator maintains control over their own system and shares data and design with the rest of the federation. This knowledge is then aggregated into a larger schema or view that is used by applications or the end user.
Because of the need to support heterogeneity, there are still some problems in properly implementing a federation, owing to the relative immaturity of some database types: object databases are where relational databases were 10 years ago. More research needs to be done into the problems that prevent wide-scale deployment of the technology, such as views and query languages.
Since government organisations, academic institutions and businesses create and maintain extensive databases containing all kinds of information, ranging from natural-language text documents, statistical tables, financial data and multimedia objects to scientific and technical data, it is not surprising that businesses and organisations want their data to appear as a coherent whole rather than as ‘islands of data’. It is much more beneficial for such data to be accessed in a uniform way that is supported by existing development tools, environments and people.