Download An Architecture for Homogenizing Federated Databases

An Architecture for Homogenizing Federated Databases Kamalakar Karlapalem Qing Li Chung-Dak Shum Technical Report HKUST-CS94-13 July 1994 1 An Architecture for Homogenizing Federated Databases Kamalakar Karlapalem Qing Li Chung-Dak Shum Department of Computer Science Hong Kong University of Science and Technology Clear Water Bay, Kowloon, Hong Kong e-mail: {kamal, qing, shum}@cs.ust.hk Abstract Many large organizations have a number of heterogeneous localized databases that are maintained and managed to support a set of local applications. One of the main difficulties in supporting global applications over a number of localized databases and migrating legacy ISs to modern computing environment is to cope with the heterogeneities of these systems. We propose a novel flexible architecture (HODFA) to dynamically connect these localized heterogeneous databases in forming a homogenized federated database system and to support the process of transforming a collection of heterogeneous information systems onto a homogeneous environment. We further develop an incremental methodology of homogenization in the context of our HODFA framework, which can facilitate different degrees of homogenization in a stepwise manner. 1 Introduction and Motivation In many large organizations a number of localized databases are maintained and managed to support a set of applications. Typically, these localized databases are owned by the departments or sections of an organization and are autonomous. Moreover, these local databases are heterogeneous, that is, managed by different database management systems (DBMS) based on different models (like hierarchical, network, relational, object oriented). With the organizational data spread over various parts of the organization there is a need to provide a common platform over these localized databases that support global data processing applications. These applications cater to maintaining consistency of local databases, generating consolidated global reports, supporting local and global on-line transaction processing and decision support systems. 2 Information systems (IS) came into existence long before the introduction of any DBMS. These ISs may consist of a large collection of programs and data, usually written in COBOL, and use a legacy database service, for example, IBM’s IMS. They are important assets and are critical for the day-to-day operation of an organization. Today, these legacy information systems pose one of the most difficult information management problems for many large organizations. The cost of maintaining a legacy IS often takes up a major portion of all the IS resources. Lack of documentation, inflexibility in the design, poor performance, inappropriate functionality all attribute to the high cost. The lack of understanding of a legacy IS also makes it difficult for organizations to take full advantage of newer technologies, such as client-server architectures, and current software, such as relational DBMS. The migration of legacy IS to very flexible modern computing environment is an important undertaking that we will address in this paper. One of the main difficulties in supporting global applications over a number of localized databases and migrating legacy ISs to modern computing environment is to cope with the heterogeneities of these systems. Each database or IS may be based on a different model and each system may provide its own set of services for supporting its individual applications. These systems are not originally designed to facilitate any cooperation and there is no general model for interoperability among such isolated software systems. Should these systems exist under a homogeneous environment, global applications built upon a common set of tools and services can be developed much more efficiently. Legacy ISs in a homogeneous environment will also be much simpler, easier to understand and more readily to be migrated to better computing environment in future. In this paper, we explore the concept of homogenization: the process of transforming a collection of heterogeneous systems onto a homogeneous environment. We propose a novel flexible architecture to dynamically connect these localized heterogeneous databases in forming a homogenized federated database system to support the above mentioned applications. This architecture is based on the concept of homogenization of federated databases by managing and maintaining a mirror copy of each of the localized databases (as a member database) in a single robust DBMS. By interconnecting and homogenizing localized databases, we want to achieve the following specific goals: (1) to provide the ability to streamline the replacement of the legacy localized databases in a graceful manner by a member database system, so that existing applications on the localized databases will not be affected during the transition, and they get gradually migrated on to the homogenized federated database system; (2) new global applications at different levels of abstraction and scale can be developed on top of the homogenized federated databases, and provide a flexible and extensible system that facilitates efficient 3 global/distributed constraints derivation and enforcement; (3) provide interoperability among a set of heterogeneous databases in a practical and economic way, so that previously isolated heterogeneous localized databases can be loosely coupled and become interoperable in a homogenized federation of databases in which data/information can be integrated and shared as and when needed. The rest of the paper is organized as follows. In Section 2 we present an overview of the proposed architecture (HODFA) by describing the various components and their inter-relationships and/or interactions, and evaluate this architecture with respect to the above-mentioned objectives. In Section 3 we note that various level of homogenization can be achieved. We elaborate on a practical incremental methodology of conducting the homogenization process within HODFA framework, and examine the issues of supporting migration and development of applications. In Section 4, we compare our approach with others work. We conclude this paper with a summary and several “open issues” in Section 5. 2 An Overview of the Architecture Our approach for homogenization of disparate databases is supported by a versatile architecture shown in Figure 1. This architecture, termed as HODFA (for HOmogenized Database Federation Architecture), consists of a hybrid of preexisting and new ISs. We shall first briefly introduce the various components of this architecture by describing their functionality and interrelationships. We will then evaluate this architecture with respect to our goals. 2.1 Components of HODFA As shown in Figure 1, HODFA is composed of a set of localized databases managed by pre-existing local database systems (LDS), a set of local applications accessing the local databases, a set of member databases that are “mirror copies” of localized databases, a set of applications (MASS) that manage the data movement between localized and member databases, a coordinator DBMS with multilevel materialized view support system, a case base (CB) and a system data/knowledge directory (SD). These are detailed immediately below. 4 2.1.1 Localized databases Localized databases are a set of preexisting databases each of which is managed by a local database management system. These databases support a set of localized applications for the individual department and section use, and are totally autonomous and possibly heterogeneous (either in terms of hardware platform Figure1. HODFA - HOmogenized Database Federation Architecture A suite of Heterogeneous Database Applications Ad-hoc Queries & Decision Support Applications Local and Global Member Database Applications LOCAL APPLICATIONS Multilevel Materialized View Support System COORDINATOR DATABASE MANAGEMENT SYSTEM MASS2 MASS1 LDS MASS LDS CB LDS SD Directory & Case base Member Databases Views Localized Databases or data model and languages used). As indicated in section 1, an important goal of HODFA is to convert these disparate databases into homogenized, federated databases in a graceful manner, so that they can be used to support not only existing localized applications but also different kinds of (global) applications. To this end, the following components are deemed to be appropriate and useful. 5 2.1.2 Member databases Member databases are derived databases through what we call “mirroring processes” (see below). In particular, each member database is a mirror copy of a single localized database. The purpose of the mirroring process is to convert the localized heterogeneous databases into a set of homogeneous databases which can be efficiently managed by a single robust DBMS (namely, the coordinator DBMS). Depending on the requirements of the global applications and the sensitivity of the data in localized databases, a member database (in the coordinator DBMS) may be created through mirroring its corresponding localized database, either as a full copy of data, or partial copy of data, or a view of the data, or only just the schema definition with no data. Note that if all the member databases consist of only schema definitions of localized databases and no data, then we have the classical federated database system architecture. If all the member databases are replicated consistent full copies of localized databases, then we have homogenized the localized (heterogeneous) databases. In all other scenarios we have some degree of homogenization of the localized databases. In general member databases can be populated by the mirroring application support system as described below, and will be accessed by a set of applications. Some of these applications emulate the processing done by the localized applications accessing the corresponding localized database. This way it is possible to replace an outdated (legacy) localized database by a consistent and fully replicated mirror copy (namely, the corresponding member database). 2.1.3 Mirroring application support system (MASS) In HODFA, the process of homogenizing localized heterogeneous databases into member databases is conducted by tailored (customized) Mirroring Application Support Scheme (MASS) components. Each MASS component consists of a set of applications to extract data from a localized database and populate a member database and vice versa. In case of full or partial homogenization (of localized databases), the MASS components will also be responsible for maintaining the consistency between the localized databases and their corresponding member databases. There can be potentially many MASS modules moving data among localized and member databases. Depending on the needs, the mirroring/homogenizing processes can be conducted in various ways, including (i) to extract the relevant data from localized database into an intermediate medium (like tapes) and populate the member database from this intermediate medium, (ii) to use the localized applications to update both the localized database and its corresponding member database simultaneously, (iii) to employ a classical federated database management system to sup- 6 port maintenance of both localized and member databases, (iv) to populate the member databases from the manual records. 2.1.4 Coordinator DBMS Just as a distributed DBMS provides integration without centralization [Ozsu91]; a coordinator DBMS provides for the conceptualizing of member databases as a centralized set of federated homogenized databases with a flexibility for integration. The coordinator DBMS is a single robust DBMS that supports, manages and maintains the member databases. Moreover, as a DBMS the coordinator DBMS provides capabilities like transaction management services, recovery, and access to multiple member databases (interoperability). This is a major advantage for having a coordinator DBMS because lack of advanced transaction management services and interoperability has impeded the development of full fledged heterogeneous database system based solution. Moreover, due to the local autonomy constraints, supporting recovery in heterogeneous database systems is a very difficult problem in contrast against our approach wherein the support for recovery is packaged along with the coordinator DBMS. Since the schemas for all member databases are defined using the same data model (on which coordinator DBMS is based) the problem of schema integration reduces to resolving semantic conflicts between member database schemas. The heterogeneous database system technology has not advanced so far as to provide a set of tools for managing and designing databases, and providing utilities and languages for developing and maintaining applications [Ozsu91]. Whereas a coordinator DBMS (like anyone of the top commercial relational DBMSs) comes packaged with all such tools and utilities that support development of new applications and databases. The coordinator DBMS provides the access to the member databases for the MASS and all the applications including (emulated) localized and newly developed (global) applications. The coordinator DBMS is also responsible for supporting the maintenance of global consistency between the member databases, and identification of the inconsistencies in the localized databases. 2.1.5 Local Applications Each of the localized databases has a set of applications that access it. These local applications are processed at the sites of the localized databases. Therefore, it may not be possible to execute these applications on the coordinator database system so as to access the corresponding member database. Hence these local applications have to be migrated on to the coordinator database system. Migration of local applications on 7 to coordinator DBMS can be achieved by: (1) rewriting the code by changing just the database access statements in the applications (this is possible if both localized DBMS and centralized DBMS are based on same data model (say relational)), (2) designing, implementing and testing new applications from scratch, and (3) emulating the local applications by a combination of 4GLs and user interface. Note that the migration can be done only after the member databases are designed, created, and populated. If the member databases are not full mirror copies (see Section 3 for no and partial homogenization) of the localized databases then local applications cannot be migrated to the coordinator DBMS. In this case the local applications may have to become distributed applications in order to maintain the data consistency. 2.1.6 Miscellaneous Components and Their Interactions There are some other components used by HODFA to support miscellaneous functions, which also interact with the above components. Specifically, there is a system data/knowledge directory (SD) used to maintain consolidated information about the processing requirements of applications on the member databases, and facilitate dynamic schema integration. This is done by the coordinator DBMS with the input from the database designer. A case base (CB) is used to store the access plans of frequently issued ad-hoc queries or transactions. This can enable the coordinator DBMS to efficiently support these queries and transactions. Also the member database design can be facilitated by using the access semantics for the database operations issued by the applications. In case the coordinator DBMS is a distributed database system (see below), then the information in the system directory and case base could be used for distributed database design to efficiently process the applications. This is especially useful for supporting distributed OLTP applications. Within the coordinator DBMS, a view mechanism is provided to support different kinds of applications with different levels of abstraction. The coordinator DBMS supports the specification and maintenance of materialized views, which enables the coordinator DBMS to efficiently execute decision support applications and ad-hoc queries. The support for materialized views and their use by the decision support applications has not been investigated by the current heterogeneous database systems. This is because of the difficulty in supporting materialized views in heterogeneous database system environment. As HODFA homogenizes the heterogeneous localized databases under a single DBMS it is feasible to support such kinds of applications based on materialized views. 8 2.2 Evaluation of the Architecture We shall now evaluate the proposed architecture with respect to our general objectives listed in Section 1, namely, replacement of the legacy database systems based on this architecture, support for the new global applications, and to facilitate interoperability. We also consider the feasibility aspects for the implementation of HODFA. Legacy database systems replacement: The way HODFA supports the mechanisms to homogenize localized heterogeneous databases into uniformed member databases brings up a novel feature of the system, namely, it is now possible for any of the outdated/legacy databases to be upgraded (and ported) to more advanced data model (and hardware platform), and be able to eventually get replaced by its corresponding member database if it is fully homogenized. Obviously it will be more desirable (and for some applications extremely necessary) for the existing localized applications to be able to continue function during the upgrade and replacement period. This calls for a practical methodology (see Section 3) for conducting the homogenization of the localized databases properly, which can enable the system to accommodate both existing localized applications and newly developed global applications at all times. Support for new global applications: The architecture set up for HODFA is not only suitable for replacing legacy localized database systems by member databases, it is also flexible and extensible in supporting different kinds of distributed/global applications on top of it. In particular, with a multilevel materialized view support system to be incorporated by the coordinator DBMS (cf. Figure 1), HODFA allows a wide range of applications to be developed using a view mechanism. A materialized view support system can accommodate both localized and global/distributed applications at different levels of abstraction. With the help of the case base (CB) and the system data/knowledge directory (SD), it is possible for HODFA to derive global/distributed constraints by employing some efficient machine learning techniques, and maintain such constraints through interactions with individual member databases. 9 Facilitate interoperability: The feasibility of devising the desired level of interoperability between the localized database system and the coordinator DBMS depends on several aspects. Besides the necessary networking facilities (for interconnectivity), other issues need to be resolved include the resolution of system and semantic heterogeneity, the derivation and/or integration of schemas and views, and of course, the successful implementation of the system in terms of its key components and their interactions. On the first two issues, there have been “standard” algorithms and techniques resulted from the past two decades’ research [Batini86, Ceri86], which are both efficient and robust, and are readily available so that they can be applied to our system. Once the homogenization of disparate localized databases is completed, the problem of facilitating interoperability becomes simple as all the member databases can now be managed by a single robust coordinator DBMS. Feasibility for implementing HODFA: As for the feasibility of implementing the system, the key components involved are obviously the coordinator DBMS and the MASS mechanism, with the latter mainly relying on schema conversion and program translation techniques. For the coordinator DBMS, there is a choice among some of the robust and powerful DBMSs such as relational and/or object-oriented ones; further, there is also an orthogonal issue of whether to use a centralized or distributed DBMS for the coordinator DBMS. This depends a lot on the scope and the size of the applications to be supported. Fortunately, both centralized and distributed DBMSs supporting either relational or object-oriented models are commercially available now, and most of these systems can be run on both PCs and mainframes. 3 Fundamentals of Homogenization Given a set of heterogeneous localized databases, there are many aspects involved in transforming them into a homogeneous environment (the process of this transformation is known as homogenization). Homogenization can be conducted at different levels and in different ways. In this section, we classify different ways of homogenization, and then focus on HODFA’s “incremental” approach of homogenization, which upgrades the local databases (LDs) into member databases (MDs) through mirroring. We also elaborate on the potential impact of this approach on both existing and new applications. 10 3.1 A Classification of Homogenization Intuitively, homogenization is a process that creates and maintains a mirror copy of a localized database as a member database in a federated database system. The resultant database federation system can thus work in a “homogeneous” environment, supporting different kinds of applications (both existing and new ones) accessing the member databases. Since homogenization is a process, it can be accomplished at different degrees. There can be zero-degree homogenization (no-homogenization), i.e. only localized database schema is defined in the coordinator database, but the member database does not contain any data. At the other end we can have full homogenization, wherein the member database is a replicated consistent copy of the localized database. In all other cases we have some partial-degree of homogenization (these include the case when member database schema is a view on top of localized database). A classification of various degrees of homogenization are shown in Table 1. Table 1: Nomenclature for degrees of homogenization DEGREE OF HOMOGENIZATION zero-degree (schema-only) partial-degree (some replication) full-degree (full replication) Homogenized Federated Schemas SemiHomogenized Databases Homogeneous Databases Manner of Homogenization: A mechanism is needed by which the data gets populated into a member database from the corresponding localized database. This can be done in a stepwise manner, or at one-time. Any degree of homogenization can be implemented by appropriately defining the (sub-)schema/view on the localized database as the member database schema, and then possibly populating this member database. This is known as one-time homogenization, and any change in homogenization would require redefining the member database schema (and populating it in non-zero degree cases). Stepwise homogenization consists of steps to gradually homogenize the localized databases, which again can range from no-homogenization to full homogenization. Clearly, stepwise homogenization takes longer time from the viewpoint of homogenizing legacy database systems, but can be more flexible and desirable from the perspective of 11 supporting existing and new applications simultaneously. In this paper, our focus is on stepwise, partial or full degree, homogenization methodology, which is the approach taken by HODFA as described below. 3.2 HODFA’s Incremental Homogenization Approach As mentioned earlier, one of the main goals to homogenize local databases is to facilitate a “graceful” transition and (eventual) replacement of the legacy systems. It would be desirable if during the transition, existing local applications will not be affected (or only to a minimum extent). In HODFA, an incremental homogenization methodology is devised for this purpose, which facilitates the following features and capabilities: (1) Co-existence of Member and Localized Databases: Generally speaking, it is almost impossible to have all the localized databases and their applications to be transformed and ported at one time to member databases under the coordinated DBMS without affecting the applications. It is therefore desirable for the transition to be taken place in a gradual, piecemeal fashion. Thus there shall be populated localized and member databases being accessed by the applications, hence co-existence of these localized and member databases becomes a paramount issue. Note that some data can be common to both member databases and localized databases, whereas some data may only exist in localized database or member database. The data that is common to both member and localized databases needs to be kept consistent. Also as there may be applications that need to access both localized and member databases, interoperability among member and localized database must be supported (which is supported by the MASS system). Another problem is “how does one specify the degree of homogenization?” or keep track of amount of homogenization that has already taken place. There are two approaches: i) application based: move each application and all the data that is relevant to it from localized database to member database, ii) database object based: move each object of database (like relation, entity) from the localized database to the member database. The homogenization is complete once all the applications or complete set of database objects have been moved from localized to member databases. (2) Flexibility of the Mirroring Process: For homogenization to take place, it is the function of MASS -- a middle layer of software between the localized databases (LDs) and member databases (MDs) -- to move the data from LDs to MDs. Ideally, 12 data movement is from localized to member databases only, but because of the fault-tolerant nature of the HODFA it is required to provide data movement from the MDs to LDs as well. As HODFA aims to support homogenization from zero-degree homogenization to full homogenization, MASS has to be a flexible system. The issues for zero-degree homogenization are totally different from issues for partial/full homogenization. In case of zero-degree homogenization the interoperability between the localized databases and member databases is the main issue, whereas for partial/full homogenization the consistency between localized databases and member databases (and among member databases) is the main issue. For the former, there are standard techniques for supporting interoperability between heterogeneous databases (like, RDA (remote data access) and OPEN SQL standards), therefore it is possible to support interoperability between (and facilitate access to) the localized databases. For the latter, as all the member databases are managed by the same DBMS (viz., the coordinator DBMS) it is feasible to maintain the consistency between member databases, and detect inconsistencies among localized databases. (3) Relocation of Localized Applications: A major problem with homogenization of localized databases is the support for the relocation of applications from localized database systems to the coordinator DBMS. This relocation of localized applications is also needed to maintain the consistency between localized databases and member databases during and after homogenization. There are many ways through which applications can be moved from localized databases systems to coordinator DBMS: • Rewrite the applications on top of the coordinator DBMS. If there are a large number of localized applications that need to be moved then software filters can be developed to facilitate this movement. There are also filters available to transform code from one language to an other language (like from PASCAL to ‘C’, or FORTRAN to ‘C’, etc.). • Employ software re-engineering techniques. There are software re-engineering techniques that facilitate program conversion, evaluation of the database requirements of the applications, and detection of inherent constraints encoded in the applications (see [Rugaber90]). These techniques not only facilitate relocation of applications from LDS to coordinator DBMS, but also provide inputs for member database design. 13 • Use advanced application program generators to emulate the localized appl2ications. Most DBMSs provide an application development environment based on CASE that facilitates fast application development and testing. This application development environment can be utilized to implement the localized applications on the coordinator DBMS. One major advantage of this approach is that the consistency criteria (that is, the constraints that need to be satisfied by the member databases) can be specified declaratively as part of the application. Thus any inconsistencies between the consistency criteria among member databases can be detected. So far we have elaborated on HODFA’s incremental approach towards homogenization, and examined the issues involved and features needed in supporting this approach. Such an approach allows more flexibility and in practice is more cost-effective (at least from applications’ viewpoint). From implementation viewpoint, incremental homogenization should also be more feasible and economic, and it can facilitate backup and recovery in cases of accidents and/or system failure. From application viewpoint, such an incremental approach is also quite desirable on a number of aspects, as discussed below. Migrated local applications New (distributed) applications AS1’ Coordinated Database Management System AS2’ ASn’ MD1 MD2 ... MDk Migration Process Mirroring Application Support Systems Localized Applications AS1 AS2 ASn MASS1 MASS2 ... LD1 LD2 ... MASSn ... LDn Figure 2. Migrating Localized Applications along with Database Mirroring 14 3.3 Impact on the Applications. Eventually, when all localized databases are homogenized to member databases which are managed by a single robust DBMS (i.e., the coordinator DBMS), all applications also become operational on top of the coordinated DBMS. There are two kinds of applications that can be identified here: those that correspond to existing (localized) applications, and those that are newly developed (which can be potentially distributed applications). For the former, continuous, uninterrupted operations are guaranteed by the incremental homogenization approach, in which the legacy systems are ported by MASS into homogenized member databases in a piecemeal fashion, and applications running on the localized databases can thus be migrated gradually in a similar manner. Figure 2 intuitively illustrates the processes. This feature can be particularly important to those organizations which can not afford to stop their local applications even for a small duration of time. As the localized databases within the same organization are integrated into a homogenized database federation, it also creates a great opportunity for the organization to develop a wide range of new applications (both centralized and distributed ones) on top of the federation. Let us consider some specific types of applications HODFA is targeted to support on top of the coordinated DBMS. As depicted in Figure 1, these include applications like decision support systems (DSS) that mainly access the database federation through the multilevel materialized views. (If there were no materialized views then these applications would have to generate the consolidated view for each execution of the application thus slowing down the applications tremendously). The next set of applications are the ad-hoc queries; these queries facilitate the identification of similar data objects between member databases, and they accommodate dynamic schema integration. The access plans for the ad-hoc queries can be stored in a case base and used to increase the efficiency of the ad-hoc queries. The OLTP applications cater to the day to day access and update of the member databases, which call for fast response time and updateable views. Finally we have a set of global and local member database applications that emulate (part of) the localized database applications, or new local member database applications, or new global database applications that access more than one member database. Some of these global member database applications maintain the consistency between member databases and identify inconsistencies in localized databases. Besides an extended scope and high-level new applications that can be supported, our approach can also facilitate a powerful feature called “semantic relativism” [McLeod80], meaning that different applications 15 can hold different views/interpretations on the data of the same (federated) system. For example, DSS type of applications may wish to keep semantic relativism in interpreting the data and predicting changes, and emulated local MD applications may need to exercise application autonomy during (and even after) the transition from the homogenization. In HODFA, this will be made possible by its multilevel materialized view mechanism. Further, the homogenized database federation can also maintain so-called “application autonomy” [Sheth90] existed in the previously localized legacy systems; such local autonomy can be quite important and essential to support for not only old applications but also many of the new applications. 4 Related Work There has been a good deal of research conducted on “distributed” database systems since 70’s. The bulk of these systems focused on physically distributed data but conceptually centralized representation of data, thus there was normally a single global schema maintained, with central control exercised upon the global applications (see, for example, [Rothnie77,Stonebraker77,Ceri84,Batini86]). Though the databases could be distributed into different locations with possibly different data models and even different hardware platforms used, local autonomy was not possible (or possible only to a very limited extent). These characteristics also suggest that such traditional distributed DBMSs are best suited to traditional types of applications that exhibit centralized control in top-down fashion, but they fall short to support those that are logically decentralized in nature, with a set of databases existing already that are maintained by different departments/units in a somewhat autonomous/independent manner. A more recent different type of architecture that was proposed to surmount the problems of traditional logically centralized databases is the federated database system architecture [Heimbigner85-1,Sheth90-1]. In a federated database system, there is no global schema and no central coordinating authority so that local autonomy and heterogeneity can be maintained by local databases that participate in the federation. Data sharing is normally by explicit import/export of data among local databases in a negotiated manner. This kind of systems are particularly suitable for distributed applications which are decentralized in nature, and there is a need to access data/information from databases which are connected by a network. This decentralized nature of federated database systems offers many advantages, and is able to overcome some of the most notable drawbacks of logically centralized databases. But it is difficult to exercise global control and 16 to conduct operations globally due to its strict adherence to local autonomy and heterogeneity for federated databases. Compared with these two extremes, the framework in which HODFA belongs to is somewhat in between, as is evident from the following perspectives: (1) the HODFA architecture is both centralized and decentralized at logical level, since there are legacy localized databases which work independently from each other, yet at the same time a single robust DBMS (viz., the coordinator DBMS) is used to manage all the member databases; (2) while the system still respects the local autonomy inherited from the localized databases, the coordinator DBMS is able to exercise some global control over its member databases, and new global applications can be developed on top of it; (3) there are other unique characteristics about the homogenization process which is related to, but different from the issues such as schema integration and interoperability which are central to distributed and/or federated database systems. Below we elaborate the last point in more detail. Homogenization vs. Schema Integration: Schema integration is a process of generating a global schema by resolving the conflicts between a set of distributed database schemas [Batini86]. All global applications access the data through the global schema, with the distributed database management system (DDBMS) providing the necessary support for accessing/modifying the local (heterogeneous) databases. There is no emphasis on homogenization of the disparate/heterogeneous distributed databases. Unlike the general schema integration problem, HODFA does not impose any restrictions on the applications data accesses. Even though schema integration does not entail homogenization, HODFA facilitates schema integration as all the schemas are specified using the logical data model provided by the coordinator DBMS. In case of schema integration, each of the heterogeneous local database schema may need to be transformed using a common data model before the actual integration takes place. Since the schema integration does not imply data integration, the support for features like multi-level materialized view system is generally not provided. Homogenization vs. Interoperability: Interoperability is a function of a federated database system which specifies the amount of cooperative database access that is possible between two participating database systems [IMS91, Bukhres93]. Interoperability enables an application to access data from different heterogeneous database systems. Interoperability enables homogenization of federated databases, and homogenization in HODFA implies (to a great extent) interoperability. But interoperability is not the only aspect of 17 which homogenization can be supported. Interoperability is necessary in HODFA to maintain the consistency between member and localized databases; it also enables applications to access both localized databases and member databases. Neither schema integration nor interoperability facilitate replacement of the localized databases by the member databases. This is a major advantage in homogenizing localized databases as it facilitates gradual replacement of the legacy databases. With the multi-level materialized view system being supported by the coordinator DBMS, homogenization of localized databases facilitates simpler specification, creation, and maintenance of the views in HODFA. [Brodie93] in their methodology for migrating legacy information systems investigate the general framework for incremental migration, but do not consider the concept of homogenization of heterogeneous localized databases. Since their methodology is very general, it is complicated both from architecture and methodology point of view, also their methodology cannot take advantage of the homogenized federated database system concept. Moreover, the maintenance of the consistency in their approach will be much more complicated and their approach does not support the concept of materialized multilevel views to facilitate development of new global and local applications. 5 Concluding Remarks and Future Work We have presented an architecture that facilitate dynamic evolution of localized heterogeneous database into a homogenized database federation. The approach we are taking is based on the concept of homogenization -- a process of transforming localized databases into homogeneous member databases via mirroring. An incremental methodology of homogenization is proposed in the context of our HODFA framework, which can facilitate different degrees of homogenization in a stepwise manner. Some notable characteristics and possible features of this approach are: • ability to streamline the replacement of the legacy localized databases by the member database. This is a novel feature of this approach, as the localized databases have been ported onto a coordinator DBMS and maintained as member database, we can at any time replace the localized database with the member database under the same DBMS. Also in case of failure to access either localized database or the member database, the applications will have one more database to access the data. This also provides fault tolerance, as the applications in case of failure to access one database (localized or member) can access the other database. 18 • providing a robust coordinator database management system as a front-end to the heterogeneous localized databases. This reduces the complexity of the interoperability issues between a set of heterogeneous databases. • providing the support for managing and maintaining a mirror copy of the localized database at logical and/or physical levels as member databases. The applications on the localized databases may be ported on the coordinator DBMS platform. Some of the applications can be used to maintain the consistency between the localized and member databases. This consistency could be achieved with in a time bounded delay depending on the techniques used. Due to the changes taking place in the organization, the localized databases can be dynamically added and/or deleted from the coordinator DBMS as member databases. • ability to dynamically integrate the member database schemes based on application semantics. This approach does not enforce any upfront integration of all the member databases. But will facilitate need based integration of the member database schemas by connecting conceptually similar data objects. While the incremental approach is advocated by HODFA, we must point out that ultimately it is the users and/or the database administrators who decide whether or not to adopt this methodology. Even if the incremental homogenization strategy is chosen, there may be different strategies to execute incremental homogenization. For example, they may use mixed approach (partly application based, partly database object based) or an ad-hoc approach or one that is based on the external factors (like relative importance of application and/or databases to the organization). This partly reveals the experimental nature of this research project. The HODFA system will be implemented by employing several DBMSs on a network of Sun4 workstations. Some of the implementation issues are: • development of incremental homogenization of heterogeneous databases. • maintaining consistency between localized and member databases. • dynamic schema integration to enable efficient processing of applications, and maintenance of member databases. • monitoring and management of the homogenized federated database system. 19 • defining multilevel materialized views to cater to the data requirements of decision support systems. We are currently collaborating with a telecommunication company to migrate their legacy information systems to a homogenized environment using this methodology. References [Batini86]C. Batini, M. Lenzerni and S.B. Navathe. A comparative Analysis of Methodologies for Database Schema Integration. ACM Computing Surveys, 18(4), pp 323-364, 1986. [Brodie93]M. L. Brodie and M. Stonebraker. DARWIN: On the Incremental Migration of Legacy Information Systems. TR-0222-10-92-165, GTE Laboratories, Inc. March 1993. [Bukhres93]O.A. Bukhres, J. Chen, W. Du and A.K. Elmagarmid. InterBase: An Execution Environment for Heterogeneous Software Systems. IEEE Computer, 26(8), pp 57-69, 1993. [Ceri84]S. Ceri and G. Pellagatti. Distributed Databases: Principles and Systems, McGraw-Hill, New York, 1984. [Heimbigner85]D. Heimbigner and D. McLeod. A federated database architecture ACM Trans. on Office Information Systems, 3(3), pp 253-278, 1985. [IMS91]Y. Kambayashi, M. Rusinkiewicz and A. Sheth. Proceedings of the 1st Int’l Workshop on Interoperability in Multidatabase Database Systems, IEEE Computer Society, 1991. [McLeod81]D. McLeod and J.M. Smith. Abstractions in Databases. In Proc. Workshop on Data Abstraction, Databases and Conceptual Modeling, Pingree Park, CO, June 1980. [Ozsu91] M. T. Ozsu and P. Valduriez. Principles of Distributed Database Systems. Prentice-Hall publishers, 1991. [Rothnie77]J.B. Rothnie and N. Goodman A survey of research and development in distributed database management. In Proceedings of 2nd VLDB Conference, 1977. [Rugaber90] S. Rugaber, S. B. Ornburn and R. J. LeBlanc Jr. Recognizing design decisions in programs, IEEE Software vol. 7, no. 1 (jan 1990) pp 46-54. [Sheth90]A. Sheth and J. Larson. Federated database systems for managing distributed, heterogeneous, and autonomous databases. ACM Computing Surveys, 22(3), pp 183-236, 1990. [Stonebraker77]M.R. Stonebraker and E. Neuhold. A distributed database version of INGRES. In Proceedings of Berkeley Workshop on Distributed Data Management and Computer Networks, pp.19-36, 1977. 20

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download An Architecture for Homogenizing Federated Databases