Download An Architecture for Homogenizing Federated Databases

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Microsoft SQL Server wikipedia , lookup

IMDb wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Encyclopedia of World Problems and Human Potential wikipedia , lookup

Microsoft Access wikipedia , lookup

Serializability wikipedia , lookup

Commitment ordering wikipedia , lookup

Navitaire Inc v Easyjet Airline Co. and BulletProof Technologies, Inc. wikipedia , lookup

Oracle Database wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Global serializability wikipedia , lookup

Functional Database Model wikipedia , lookup

Ingres (database) wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Open Database Connectivity wikipedia , lookup

Relational model wikipedia , lookup

Healthcare Cost and Utilization Project wikipedia , lookup

Database wikipedia , lookup

Concurrency control wikipedia , lookup

ContactPoint wikipedia , lookup

Clusterpoint wikipedia , lookup

Database model wikipedia , lookup

Transcript
An Architecture for Homogenizing
Federated Databases
Kamalakar Karlapalem
Qing Li
Chung-Dak Shum
Technical Report HKUST-CS94-13
July 1994
1
An Architecture for Homogenizing Federated Databases
Kamalakar Karlapalem
Qing Li
Chung-Dak Shum
Department of Computer Science
Hong Kong University of Science and Technology
Clear Water Bay, Kowloon, Hong Kong
e-mail: {kamal, qing, shum}@cs.ust.hk
Abstract
Many large organizations have a number of heterogeneous localized databases
that are maintained and managed to support a set of local applications. One of the
main difficulties in supporting global applications over a number of localized
databases and migrating legacy ISs to modern computing environment is to cope
with the heterogeneities of these systems. We propose a novel flexible architecture
(HODFA) to dynamically connect these localized heterogeneous databases in
forming a homogenized federated database system and to support the process of
transforming a collection of heterogeneous information systems onto a homogeneous environment. We further develop an incremental methodology of homogenization in the context of our HODFA framework, which can facilitate different
degrees of homogenization in a stepwise manner.
1 Introduction and Motivation
In many large organizations a number of localized databases are maintained and managed to support a set
of applications. Typically, these localized databases are owned by the departments or sections of an organization and are autonomous. Moreover, these local databases are heterogeneous, that is, managed by different database management systems (DBMS) based on different models (like hierarchical, network,
relational, object oriented). With the organizational data spread over various parts of the organization there
is a need to provide a common platform over these localized databases that support global data processing
applications. These applications cater to maintaining consistency of local databases, generating consolidated global reports, supporting local and global on-line transaction processing and decision support systems.
2
Information systems (IS) came into existence long before the introduction of any DBMS. These ISs may
consist of a large collection of programs and data, usually written in COBOL, and use a legacy database
service, for example, IBM’s IMS. They are important assets and are critical for the day-to-day operation of
an organization. Today, these legacy information systems pose one of the most difficult information management problems for many large organizations. The cost of maintaining a legacy IS often takes up a major
portion of all the IS resources. Lack of documentation, inflexibility in the design, poor performance, inappropriate functionality all attribute to the high cost. The lack of understanding of a legacy IS also makes it
difficult for organizations to take full advantage of newer technologies, such as client-server architectures,
and current software, such as relational DBMS. The migration of legacy IS to very flexible modern computing environment is an important undertaking that we will address in this paper.
One of the main difficulties in supporting global applications over a number of localized databases and
migrating legacy ISs to modern computing environment is to cope with the heterogeneities of these systems. Each database or IS may be based on a different model and each system may provide its own set of
services for supporting its individual applications. These systems are not originally designed to facilitate
any cooperation and there is no general model for interoperability among such isolated software systems.
Should these systems exist under a homogeneous environment, global applications built upon a common
set of tools and services can be developed much more efficiently. Legacy ISs in a homogeneous environment will also be much simpler, easier to understand and more readily to be migrated to better computing
environment in future. In this paper, we explore the concept of homogenization: the process of transforming a collection of heterogeneous systems onto a homogeneous environment.
We propose a novel flexible architecture to dynamically connect these localized heterogeneous databases
in forming a homogenized federated database system to support the above mentioned applications. This
architecture is based on the concept of homogenization of federated databases by managing and maintaining a mirror copy of each of the localized databases (as a member database) in a single robust DBMS. By
interconnecting and homogenizing localized databases, we want to achieve the following specific goals:
(1) to provide the ability to streamline the replacement of the legacy localized databases in a graceful manner by a member database system, so that existing applications on the localized databases will not be
affected during the transition, and they get gradually migrated on to the homogenized federated database
system; (2) new global applications at different levels of abstraction and scale can be developed on top of
the homogenized federated databases, and provide a flexible and extensible system that facilitates efficient
3
global/distributed constraints derivation and enforcement; (3) provide interoperability among a set of heterogeneous databases in a practical and economic way, so that previously isolated heterogeneous localized
databases can be loosely coupled and become interoperable in a homogenized federation of databases in
which data/information can be integrated and shared as and when needed.
The rest of the paper is organized as follows. In Section 2 we present an overview of the proposed architecture (HODFA) by describing the various components and their inter-relationships and/or interactions, and
evaluate this architecture with respect to the above-mentioned objectives. In Section 3 we note that various
level of homogenization can be achieved. We elaborate on a practical incremental methodology of conducting the homogenization process within HODFA framework, and examine the issues of supporting
migration and development of applications. In Section 4, we compare our approach with others work. We
conclude this paper with a summary and several “open issues” in Section 5.
2 An Overview of the Architecture
Our approach for homogenization of disparate databases is supported by a versatile architecture shown in
Figure 1. This architecture, termed as HODFA (for HOmogenized Database Federation Architecture), consists of a hybrid of preexisting and new ISs. We shall first briefly introduce the various components of this
architecture by describing their functionality and interrelationships. We will then evaluate this architecture
with respect to our goals.
2.1 Components of HODFA
As shown in Figure 1, HODFA is composed of a set of localized databases managed by pre-existing local
database systems (LDS), a set of local applications accessing the local databases, a set of member databases that are “mirror copies” of localized databases, a set of applications (MASS) that manage the data
movement between localized and member databases, a coordinator DBMS with multilevel materialized
view support system, a case base (CB) and a system data/knowledge directory (SD). These are detailed
immediately below.
4
2.1.1 Localized databases
Localized databases are a set of preexisting databases each of which is managed by a local database management system. These databases support a set of localized applications for the individual department and
section use, and are totally autonomous and possibly heterogeneous (either in terms of hardware platform
Figure1. HODFA - HOmogenized Database Federation Architecture
A suite of Heterogeneous Database Applications
Ad-hoc Queries & Decision Support Applications
Local and Global
Member Database
Applications
LOCAL
APPLICATIONS
Multilevel Materialized View
Support System
COORDINATOR DATABASE MANAGEMENT SYSTEM
MASS2
MASS1
LDS
MASS
LDS
CB
LDS
SD
Directory &
Case base
Member
Databases
Views
Localized Databases
or data model and languages used). As indicated in section 1, an important goal of HODFA is to convert
these disparate databases into homogenized, federated databases in a graceful manner, so that they can be
used to support not only existing localized applications but also different kinds of (global) applications. To
this end, the following components are deemed to be appropriate and useful.
5
2.1.2 Member databases
Member databases are derived databases through what we call “mirroring processes” (see below). In particular, each member database is a mirror copy of a single localized database. The purpose of the mirroring
process is to convert the localized heterogeneous databases into a set of homogeneous databases which can
be efficiently managed by a single robust DBMS (namely, the coordinator DBMS). Depending on the
requirements of the global applications and the sensitivity of the data in localized databases, a member
database (in the coordinator DBMS) may be created through mirroring its corresponding localized database, either as a full copy of data, or partial copy of data, or a view of the data, or only just the schema definition with no data. Note that if all the member databases consist of only schema definitions of localized
databases and no data, then we have the classical federated database system architecture. If all the member
databases are replicated consistent full copies of localized databases, then we have homogenized the localized (heterogeneous) databases. In all other scenarios we have some degree of homogenization of the
localized databases. In general member databases can be populated by the mirroring application support
system as described below, and will be accessed by a set of applications. Some of these applications emulate the processing done by the localized applications accessing the corresponding localized database. This
way it is possible to replace an outdated (legacy) localized database by a consistent and fully replicated
mirror copy (namely, the corresponding member database).
2.1.3 Mirroring application support system (MASS)
In HODFA, the process of homogenizing localized heterogeneous databases into member databases is conducted by tailored (customized) Mirroring Application Support Scheme (MASS) components. Each
MASS component consists of a set of applications to extract data from a localized database and populate a
member database and vice versa. In case of full or partial homogenization (of localized databases), the
MASS components will also be responsible for maintaining the consistency between the localized databases and their corresponding member databases. There can be potentially many MASS modules moving
data among localized and member databases. Depending on the needs, the mirroring/homogenizing processes can be conducted in various ways, including (i) to extract the relevant data from localized database
into an intermediate medium (like tapes) and populate the member database from this intermediate
medium, (ii) to use the localized applications to update both the localized database and its corresponding
member database simultaneously, (iii) to employ a classical federated database management system to sup-
6
port maintenance of both localized and member databases, (iv) to populate the member databases from the
manual records.
2.1.4 Coordinator DBMS
Just as a distributed DBMS provides integration without centralization [Ozsu91]; a coordinator DBMS
provides for the conceptualizing of member databases as a centralized set of federated homogenized databases with a flexibility for integration. The coordinator DBMS is a single robust DBMS that supports,
manages and maintains the member databases. Moreover, as a DBMS the coordinator DBMS provides
capabilities like transaction management services, recovery, and access to multiple member databases
(interoperability). This is a major advantage for having a coordinator DBMS because lack of advanced
transaction management services and interoperability has impeded the development of full fledged heterogeneous database system based solution. Moreover, due to the local autonomy constraints, supporting
recovery in heterogeneous database systems is a very difficult problem in contrast against our approach
wherein the support for recovery is packaged along with the coordinator DBMS. Since the schemas for all
member databases are defined using the same data model (on which coordinator DBMS is based) the problem of schema integration reduces to resolving semantic conflicts between member database schemas.
The heterogeneous database system technology has not advanced so far as to provide a set of tools for
managing and designing databases, and providing utilities and languages for developing and maintaining
applications [Ozsu91]. Whereas a coordinator DBMS (like anyone of the top commercial relational
DBMSs) comes packaged with all such tools and utilities that support development of new applications
and databases. The coordinator DBMS provides the access to the member databases for the MASS and all
the applications including (emulated) localized and newly developed (global) applications. The coordinator DBMS is also responsible for supporting the maintenance of global consistency between the member
databases, and identification of the inconsistencies in the localized databases.
2.1.5 Local Applications
Each of the localized databases has a set of applications that access it. These local applications are processed at the sites of the localized databases. Therefore, it may not be possible to execute these applications
on the coordinator database system so as to access the corresponding member database. Hence these local
applications have to be migrated on to the coordinator database system. Migration of local applications on
7
to coordinator DBMS can be achieved by: (1) rewriting the code by changing just the database access
statements in the applications (this is possible if both localized DBMS and centralized DBMS are based on
same data model (say relational)), (2) designing, implementing and testing new applications from scratch,
and (3) emulating the local applications by a combination of 4GLs and user interface. Note that the migration can be done only after the member databases are designed, created, and populated. If the member databases are not full mirror copies (see Section 3 for no and partial homogenization) of the localized databases
then local applications cannot be migrated to the coordinator DBMS. In this case the local applications
may have to become distributed applications in order to maintain the data consistency.
2.1.6 Miscellaneous Components and Their Interactions
There are some other components used by HODFA to support miscellaneous functions, which also interact
with the above components. Specifically, there is a system data/knowledge directory (SD) used to maintain
consolidated information about the processing requirements of applications on the member databases, and
facilitate dynamic schema integration. This is done by the coordinator DBMS with the input from the database designer. A case base (CB) is used to store the access plans of frequently issued ad-hoc queries or
transactions. This can enable the coordinator DBMS to efficiently support these queries and transactions.
Also the member database design can be facilitated by using the access semantics for the database operations issued by the applications. In case the coordinator DBMS is a distributed database system (see
below), then the information in the system directory and case base could be used for distributed database
design to efficiently process the applications. This is especially useful for supporting distributed OLTP
applications.
Within the coordinator DBMS, a view mechanism is provided to support different kinds of applications
with different levels of abstraction. The coordinator DBMS supports the specification and maintenance of
materialized views, which enables the coordinator DBMS to efficiently execute decision support applications and ad-hoc queries. The support for materialized views and their use by the decision support applications has not been investigated by the current heterogeneous database systems. This is because of the
difficulty in supporting materialized views in heterogeneous database system environment. As HODFA
homogenizes the heterogeneous localized databases under a single DBMS it is feasible to support such
kinds of applications based on materialized views.
8
2.2 Evaluation of the Architecture
We shall now evaluate the proposed architecture with respect to our general objectives listed in Section 1,
namely, replacement of the legacy database systems based on this architecture, support for the new global
applications, and to facilitate interoperability. We also consider the feasibility aspects for the implementation of HODFA.
Legacy database systems replacement:
The way HODFA supports the mechanisms to homogenize localized heterogeneous databases into uniformed member databases brings up a novel feature of the system, namely, it is now possible for any of the
outdated/legacy databases to be upgraded (and ported) to more advanced data model (and hardware platform), and be able to eventually get replaced by its corresponding member database if it is fully homogenized. Obviously it will be more desirable (and for some applications extremely necessary) for the existing
localized applications to be able to continue function during the upgrade and replacement period. This calls
for a practical methodology (see Section 3) for conducting the homogenization of the localized databases
properly, which can enable the system to accommodate both existing localized applications and newly
developed global applications at all times.
Support for new global applications:
The architecture set up for HODFA is not only suitable for replacing legacy localized database systems by
member databases, it is also flexible and extensible in supporting different kinds of distributed/global
applications on top of it. In particular, with a multilevel materialized view support system to be incorporated by the coordinator DBMS (cf. Figure 1), HODFA allows a wide range of applications to be developed using a view mechanism. A materialized view support system can accommodate both localized and
global/distributed applications at different levels of abstraction. With the help of the case base (CB) and the
system data/knowledge directory (SD), it is possible for HODFA to derive global/distributed constraints by
employing some efficient machine learning techniques, and maintain such constraints through interactions
with individual member databases.
9
Facilitate interoperability:
The feasibility of devising the desired level of interoperability between the localized database system and
the coordinator DBMS depends on several aspects. Besides the necessary networking facilities (for interconnectivity), other issues need to be resolved include the resolution of system and semantic heterogeneity,
the derivation and/or integration of schemas and views, and of course, the successful implementation of the
system in terms of its key components and their interactions. On the first two issues, there have been “standard” algorithms and techniques resulted from the past two decades’ research [Batini86, Ceri86], which
are both efficient and robust, and are readily available so that they can be applied to our system. Once the
homogenization of disparate localized databases is completed, the problem of facilitating interoperability
becomes simple as all the member databases can now be managed by a single robust coordinator DBMS.
Feasibility for implementing HODFA:
As for the feasibility of implementing the system, the key components involved are obviously the coordinator DBMS and the MASS mechanism, with the latter mainly relying on schema conversion and program
translation techniques. For the coordinator DBMS, there is a choice among some of the robust and powerful DBMSs such as relational and/or object-oriented ones; further, there is also an orthogonal issue of
whether to use a centralized or distributed DBMS for the coordinator DBMS. This depends a lot on the
scope and the size of the applications to be supported. Fortunately, both centralized and distributed DBMSs
supporting either relational or object-oriented models are commercially available now, and most of these
systems can be run on both PCs and mainframes.
3 Fundamentals of Homogenization
Given a set of heterogeneous localized databases, there are many aspects involved in transforming them
into a homogeneous environment (the process of this transformation is known as homogenization).
Homogenization can be conducted at different levels and in different ways. In this section, we classify different ways of homogenization, and then focus on HODFA’s “incremental” approach of homogenization,
which upgrades the local databases (LDs) into member databases (MDs) through mirroring. We also elaborate on the potential impact of this approach on both existing and new applications.
10
3.1 A Classification of Homogenization
Intuitively, homogenization is a process that creates and maintains a mirror copy of a localized database as
a member database in a federated database system. The resultant database federation system can thus work
in a “homogeneous” environment, supporting different kinds of applications (both existing and new ones)
accessing the member databases. Since homogenization is a process, it can be accomplished at different
degrees. There can be zero-degree homogenization (no-homogenization), i.e. only localized database
schema is defined in the coordinator database, but the member database does not contain any data. At the
other end we can have full homogenization, wherein the member database is a replicated consistent copy
of the localized database. In all other cases we have some partial-degree of homogenization (these include
the case when member database schema is a view on top of localized database). A classification of various
degrees of homogenization are shown in Table 1.
Table 1: Nomenclature for degrees of homogenization
DEGREE OF HOMOGENIZATION
zero-degree
(schema-only)
partial-degree
(some replication)
full-degree
(full replication)
Homogenized
Federated
Schemas
SemiHomogenized
Databases
Homogeneous
Databases
Manner of Homogenization: A mechanism is needed by which the data gets populated into a member database from the corresponding localized database. This can be done in a stepwise manner, or at one-time.
Any degree of homogenization can be implemented by appropriately defining the (sub-)schema/view on
the localized database as the member database schema, and then possibly populating this member database. This is known as one-time homogenization, and any change in homogenization would require redefining the member database schema (and populating it in non-zero degree cases). Stepwise homogenization
consists of steps to gradually homogenize the localized databases, which again can range from no-homogenization to full homogenization. Clearly, stepwise homogenization takes longer time from the viewpoint
of homogenizing legacy database systems, but can be more flexible and desirable from the perspective of
11
supporting existing and new applications simultaneously. In this paper, our focus is on stepwise, partial or
full degree, homogenization methodology, which is the approach taken by HODFA as described below.
3.2 HODFA’s Incremental Homogenization Approach
As mentioned earlier, one of the main goals to homogenize local databases is to facilitate a “graceful” transition and (eventual) replacement of the legacy systems. It would be desirable if during the transition,
existing local applications will not be affected (or only to a minimum extent). In HODFA, an incremental
homogenization methodology is devised for this purpose, which facilitates the following features and
capabilities:
(1) Co-existence of Member and Localized Databases:
Generally speaking, it is almost impossible to have all the localized databases and their applications to be
transformed and ported at one time to member databases under the coordinated DBMS without affecting
the applications. It is therefore desirable for the transition to be taken place in a gradual, piecemeal fashion.
Thus there shall be populated localized and member databases being accessed by the applications, hence
co-existence of these localized and member databases becomes a paramount issue. Note that some data can
be common to both member databases and localized databases, whereas some data may only exist in localized database or member database. The data that is common to both member and localized databases needs
to be kept consistent. Also as there may be applications that need to access both localized and member
databases, interoperability among member and localized database must be supported (which is supported
by the MASS system). Another problem is “how does one specify the degree of homogenization?” or keep
track of amount of homogenization that has already taken place. There are two approaches: i) application
based: move each application and all the data that is relevant to it from localized database to member database, ii) database object based: move each object of database (like relation, entity) from the localized database to the member database. The homogenization is complete once all the applications or complete set of
database objects have been moved from localized to member databases.
(2) Flexibility of the Mirroring Process:
For homogenization to take place, it is the function of MASS -- a middle layer of software between the
localized databases (LDs) and member databases (MDs) -- to move the data from LDs to MDs. Ideally,
12
data movement is from localized to member databases only, but because of the fault-tolerant nature of the
HODFA it is required to provide data movement from the MDs to LDs as well. As HODFA aims to support
homogenization from zero-degree homogenization to full homogenization, MASS has to be a flexible system. The issues for zero-degree homogenization are totally different from issues for partial/full homogenization. In case of zero-degree homogenization the interoperability between the localized databases and
member databases is the main issue, whereas for partial/full homogenization the consistency between
localized databases and member databases (and among member databases) is the main issue. For the
former, there are standard techniques for supporting interoperability between heterogeneous databases
(like, RDA (remote data access) and OPEN SQL standards), therefore it is possible to support interoperability between (and facilitate access to) the localized databases. For the latter, as all the member databases
are managed by the same DBMS (viz., the coordinator DBMS) it is feasible to maintain the consistency
between member databases, and detect inconsistencies among localized databases.
(3) Relocation of Localized Applications:
A major problem with homogenization of localized databases is the support for the relocation of applications from localized database systems to the coordinator DBMS. This relocation of localized applications
is also needed to maintain the consistency between localized databases and member databases during and
after homogenization. There are many ways through which applications can be moved from localized databases systems to coordinator DBMS:
•
Rewrite the applications on top of the coordinator DBMS. If there are a large number of localized
applications that need to be moved then software filters can be developed to facilitate this movement. There are also filters available to transform code from one language to an other language
(like from PASCAL to ‘C’, or FORTRAN to ‘C’, etc.).
•
Employ software re-engineering techniques. There are software re-engineering techniques that
facilitate program conversion, evaluation of the database requirements of the applications, and
detection of inherent constraints encoded in the applications (see [Rugaber90]). These techniques
not only facilitate relocation of applications from LDS to coordinator DBMS, but also provide
inputs for member database design.
13
•
Use advanced application program generators to emulate the localized appl2ications. Most
DBMSs provide an application development environment based on CASE that facilitates fast
application development and testing. This application development environment can be utilized to
implement the localized applications on the coordinator DBMS. One major advantage of this
approach is that the consistency criteria (that is, the constraints that need to be satisfied by the
member databases) can be specified declaratively as part of the application. Thus any inconsistencies between the consistency criteria among member databases can be detected.
So far we have elaborated on HODFA’s incremental approach towards homogenization, and examined the
issues involved and features needed in supporting this approach. Such an approach allows more flexibility
and in practice is more cost-effective (at least from applications’ viewpoint). From implementation viewpoint, incremental homogenization should also be more feasible and economic, and it can facilitate backup
and recovery in cases of accidents and/or system failure. From application viewpoint, such an incremental
approach is also quite desirable on a number of aspects, as discussed below.
Migrated local
applications
New (distributed) applications
AS1’
Coordinated Database Management System
AS2’
ASn’
MD1
MD2
...
MDk
Migration Process
Mirroring Application Support Systems
Localized Applications
AS1
AS2
ASn
MASS1
MASS2
...
LD1
LD2
...
MASSn
...
LDn
Figure 2. Migrating Localized Applications along with Database Mirroring
14
3.3 Impact on the Applications.
Eventually, when all localized databases are homogenized to member databases which are managed by a
single robust DBMS (i.e., the coordinator DBMS), all applications also become operational on top of the
coordinated DBMS. There are two kinds of applications that can be identified here: those that correspond
to existing (localized) applications, and those that are newly developed (which can be potentially distributed applications). For the former, continuous, uninterrupted operations are guaranteed by the incremental
homogenization approach, in which the legacy systems are ported by MASS into homogenized member
databases in a piecemeal fashion, and applications running on the localized databases can thus be migrated
gradually in a similar manner. Figure 2 intuitively illustrates the processes. This feature can be particularly
important to those organizations which can not afford to stop their local applications even for a small duration of time.
As the localized databases within the same organization are integrated into a homogenized database federation, it also creates a great opportunity for the organization to develop a wide range of new applications
(both centralized and distributed ones) on top of the federation. Let us consider some specific types of
applications HODFA is targeted to support on top of the coordinated DBMS. As depicted in Figure 1, these
include applications like decision support systems (DSS) that mainly access the database federation
through the multilevel materialized views. (If there were no materialized views then these applications
would have to generate the consolidated view for each execution of the application thus slowing down the
applications tremendously). The next set of applications are the ad-hoc queries; these queries facilitate the
identification of similar data objects between member databases, and they accommodate dynamic schema
integration. The access plans for the ad-hoc queries can be stored in a case base and used to increase the
efficiency of the ad-hoc queries. The OLTP applications cater to the day to day access and update of the
member databases, which call for fast response time and updateable views. Finally we have a set of global
and local member database applications that emulate (part of) the localized database applications, or new
local member database applications, or new global database applications that access more than one member database. Some of these global member database applications maintain the consistency between member databases and identify inconsistencies in localized databases.
Besides an extended scope and high-level new applications that can be supported, our approach can also
facilitate a powerful feature called “semantic relativism” [McLeod80], meaning that different applications
15
can hold different views/interpretations on the data of the same (federated) system. For example, DSS type
of applications may wish to keep semantic relativism in interpreting the data and predicting changes, and
emulated local MD applications may need to exercise application autonomy during (and even after) the
transition from the homogenization. In HODFA, this will be made possible by its multilevel materialized
view mechanism. Further, the homogenized database federation can also maintain so-called “application
autonomy” [Sheth90] existed in the previously localized legacy systems; such local autonomy can be quite
important and essential to support for not only old applications but also many of the new applications.
4 Related Work
There has been a good deal of research conducted on “distributed” database systems since 70’s. The bulk
of these systems focused on physically distributed data but conceptually centralized representation of data,
thus there was normally a single global schema maintained, with central control exercised upon the global
applications (see, for example, [Rothnie77,Stonebraker77,Ceri84,Batini86]). Though the databases could
be distributed into different locations with possibly different data models and even different hardware platforms used, local autonomy was not possible (or possible only to a very limited extent). These characteristics also suggest that such traditional distributed DBMSs are best suited to traditional types of applications
that exhibit centralized control in top-down fashion, but they fall short to support those that are logically
decentralized in nature, with a set of databases existing already that are maintained by different departments/units in a somewhat autonomous/independent manner.
A more recent different type of architecture that was proposed to surmount the problems of traditional logically centralized databases is the federated database system architecture [Heimbigner85-1,Sheth90-1]. In
a federated database system, there is no global schema and no central coordinating authority so that local
autonomy and heterogeneity can be maintained by local databases that participate in the federation. Data
sharing is normally by explicit import/export of data among local databases in a negotiated manner. This
kind of systems are particularly suitable for distributed applications which are decentralized in nature, and
there is a need to access data/information from databases which are connected by a network. This decentralized nature of federated database systems offers many advantages, and is able to overcome some of the
most notable drawbacks of logically centralized databases. But it is difficult to exercise global control and
16
to conduct operations globally due to its strict adherence to local autonomy and heterogeneity for federated
databases.
Compared with these two extremes, the framework in which HODFA belongs to is somewhat in between,
as is evident from the following perspectives: (1) the HODFA architecture is both centralized and decentralized at logical level, since there are legacy localized databases which work independently from each
other, yet at the same time a single robust DBMS (viz., the coordinator DBMS) is used to manage all the
member databases; (2) while the system still respects the local autonomy inherited from the localized databases, the coordinator DBMS is able to exercise some global control over its member databases, and new
global applications can be developed on top of it; (3) there are other unique characteristics about the
homogenization process which is related to, but different from the issues such as schema integration and
interoperability which are central to distributed and/or federated database systems. Below we elaborate the
last point in more detail.
Homogenization vs. Schema Integration: Schema integration is a process of generating a global schema by
resolving the conflicts between a set of distributed database schemas [Batini86]. All global applications
access the data through the global schema, with the distributed database management system (DDBMS)
providing the necessary support for accessing/modifying the local (heterogeneous) databases. There is no
emphasis on homogenization of the disparate/heterogeneous distributed databases. Unlike the general
schema integration problem, HODFA does not impose any restrictions on the applications data accesses.
Even though schema integration does not entail homogenization, HODFA facilitates schema integration as
all the schemas are specified using the logical data model provided by the coordinator DBMS. In case of
schema integration, each of the heterogeneous local database schema may need to be transformed using a
common data model before the actual integration takes place. Since the schema integration does not imply
data integration, the support for features like multi-level materialized view system is generally not provided.
Homogenization vs. Interoperability: Interoperability is a function of a federated database system which
specifies the amount of cooperative database access that is possible between two participating database
systems [IMS91, Bukhres93]. Interoperability enables an application to access data from different heterogeneous database systems. Interoperability enables homogenization of federated databases, and homogenization in HODFA implies (to a great extent) interoperability. But interoperability is not the only aspect of
17
which homogenization can be supported. Interoperability is necessary in HODFA to maintain the consistency between member and localized databases; it also enables applications to access both localized databases and member databases.
Neither schema integration nor interoperability facilitate replacement of the localized databases by the
member databases. This is a major advantage in homogenizing localized databases as it facilitates gradual
replacement of the legacy databases. With the multi-level materialized view system being supported by the
coordinator DBMS, homogenization of localized databases facilitates simpler specification, creation, and
maintenance of the views in HODFA. [Brodie93] in their methodology for migrating legacy information
systems investigate the general framework for incremental migration, but do not consider the concept of
homogenization of heterogeneous localized databases. Since their methodology is very general, it is complicated both from architecture and methodology point of view, also their methodology cannot take advantage of the homogenized federated database system concept. Moreover, the maintenance of the consistency
in their approach will be much more complicated and their approach does not support the concept of materialized multilevel views to facilitate development of new global and local applications.
5 Concluding Remarks and Future Work
We have presented an architecture that facilitate dynamic evolution of localized heterogeneous database
into a homogenized database federation. The approach we are taking is based on the concept of homogenization -- a process of transforming localized databases into homogeneous member databases via mirroring.
An incremental methodology of homogenization is proposed in the context of our HODFA framework,
which can facilitate different degrees of homogenization in a stepwise manner. Some notable characteristics and possible features of this approach are:
•
ability to streamline the replacement of the legacy localized databases by the member database.
This is a novel feature of this approach, as the localized databases have been ported onto a coordinator DBMS and maintained as member database, we can at any time replace the localized database with the member database under the same DBMS. Also in case of failure to access either
localized database or the member database, the applications will have one more database to access
the data. This also provides fault tolerance, as the applications in case of failure to access one database (localized or member) can access the other database.
18
•
providing a robust coordinator database management system as a front-end to the heterogeneous
localized databases. This reduces the complexity of the interoperability issues between a set of heterogeneous databases.
•
providing the support for managing and maintaining a mirror copy of the localized database at logical and/or physical levels as member databases. The applications on the localized databases may
be ported on the coordinator DBMS platform. Some of the applications can be used to maintain the
consistency between the localized and member databases. This consistency could be achieved with
in a time bounded delay depending on the techniques used. Due to the changes taking place in the
organization, the localized databases can be dynamically added and/or deleted from the coordinator DBMS as member databases.
•
ability to dynamically integrate the member database schemes based on application semantics.
This approach does not enforce any upfront integration of all the member databases. But will facilitate need based integration of the member database schemas by connecting conceptually similar
data objects.
While the incremental approach is advocated by HODFA, we must point out that ultimately it is the users
and/or the database administrators who decide whether or not to adopt this methodology. Even if the incremental homogenization strategy is chosen, there may be different strategies to execute incremental homogenization. For example, they may use mixed approach (partly application based, partly database object
based) or an ad-hoc approach or one that is based on the external factors (like relative importance of application and/or databases to the organization). This partly reveals the experimental nature of this research
project. The HODFA system will be implemented by employing several DBMSs on a network of Sun4
workstations. Some of the implementation issues are:
•
development of incremental homogenization of heterogeneous databases.
•
maintaining consistency between localized and member databases.
•
dynamic schema integration to enable efficient processing of applications, and maintenance of
member databases.
•
monitoring and management of the homogenized federated database system.
19
•
defining multilevel materialized views to cater to the data requirements of decision support systems.
We are currently collaborating with a telecommunication company to migrate their legacy information systems to a homogenized environment using this methodology.
References
[Batini86]C. Batini, M. Lenzerni and S.B. Navathe. A comparative Analysis of Methodologies for
Database Schema Integration. ACM Computing Surveys, 18(4), pp 323-364, 1986.
[Brodie93]M. L. Brodie and M. Stonebraker. DARWIN: On the Incremental Migration of Legacy
Information Systems. TR-0222-10-92-165, GTE Laboratories, Inc. March 1993.
[Bukhres93]O.A. Bukhres, J. Chen, W. Du and A.K. Elmagarmid. InterBase: An Execution Environment for Heterogeneous Software Systems. IEEE Computer, 26(8), pp 57-69, 1993.
[Ceri84]S. Ceri and G. Pellagatti. Distributed Databases: Principles and Systems, McGraw-Hill,
New York, 1984.
[Heimbigner85]D. Heimbigner and D. McLeod. A federated database architecture ACM Trans. on
Office Information Systems, 3(3), pp 253-278, 1985.
[IMS91]Y. Kambayashi, M. Rusinkiewicz and A. Sheth. Proceedings of the 1st Int’l Workshop on
Interoperability in Multidatabase Database Systems, IEEE Computer Society, 1991.
[McLeod81]D. McLeod and J.M. Smith. Abstractions in Databases. In Proc. Workshop on Data
Abstraction, Databases and Conceptual Modeling, Pingree Park, CO, June 1980.
[Ozsu91] M. T. Ozsu and P. Valduriez. Principles of Distributed Database Systems. Prentice-Hall
publishers, 1991.
[Rothnie77]J.B. Rothnie and N. Goodman A survey of research and development in distributed database management. In Proceedings of 2nd VLDB Conference, 1977.
[Rugaber90] S. Rugaber, S. B. Ornburn and R. J. LeBlanc Jr. Recognizing design decisions in programs, IEEE Software vol. 7, no. 1 (jan 1990) pp 46-54.
[Sheth90]A. Sheth and J. Larson. Federated database systems for managing distributed, heterogeneous, and autonomous databases. ACM Computing Surveys, 22(3), pp 183-236, 1990.
[Stonebraker77]M.R. Stonebraker and E. Neuhold. A distributed database version of INGRES. In
Proceedings of Berkeley Workshop on Distributed Data Management and Computer Networks, pp.19-36, 1977.
20