Information Systems: Modelling Complexity with Categories

Four lectures given by Nick Rossiter at Universidad de Las Palmas de Gran Canaria, 15th-19th May 2000, under the Socrates-Erasmus Programme

Lectures
1. Interoperability in Information Systems
2. Introduction to Category Theory
3. Object Concepts as Categories
4. Handling Heterogeneity with Information Resource Dictionary System

Lecture 1: Interoperability in Information Systems
Nick Rossiter, Computing Science, Newcastle University, England
[email protected]
http://www.cs.ncl.ac.uk/people/b.n.rossiter/

Motivations
• Diversity of modelling techniques
• Distributed businesses may exercise local autonomy in platforms
• Data warehousing requires heterogeneous systems to be connected
• Data mining enables new rules to be derived from heterogeneous collections

Basic Definitions 1
• Distribution: information bases are stored on multiple computer systems interconnected by a communication medium.
• Homogeneous system: one that adheres to the same software at all sites.
• Heterogeneous system: one that does not adhere to the same software at all sites.

Basic Definitions 2
• Autonomy: the ability of a site to control its own activities with respect to one or more of:
  – design
  – communication
  – execution
  – association

Interoperability 1
• Interoperability: the ability to request and receive services between various systems and use their functionality.
• More than data exchange.
• Implies a close integration.

Interoperability 2
• Features:
  – exchange of messages and requests
  – use of each other's functionality
  – client-server abilities
  – distribution
  – operate multiple systems as single unit
  – communication despite incompatibilities
  – extensibility and evolution

Architectures for Interoperability 1
1. Global schema integration
Produces single new schema (C) for the different information systems with schemas (A, B).
[Diagram: schema C integrating schemas A and B]

Global Schema Integration
• Advantages
  – Transparent to end users -- appears as single information system
• Disadvantages
  – Difficult -- needs human understanding to perform integration
  – Local autonomy lost
  – Static -- does not evolve automatically

Architectures for Interoperability 2
2. Federated Database Systems
• Less tightly coupled schema (than in 1)
• Each service through an export schema specifies sharable objects
• Common data model
• Internal command language
• Decentralised control (local autonomy)
• Five-level architecture for federated system

Federated Databases: Loosely-coupled
• Created by users
[Diagram: V, AE, BE, A, B]
AE, BE are export schemas; V is a view; A, B are base schemas

Federated Databases: Tightly-coupled
• Created by administrators
• Global schema integration on all export schemas
• More formal than loosely-coupled
• Much effort to resolve semantic inconsistencies

Federated Database Systems: General Advantages
• Preserves local autonomy
• Not all data needs to be integrated
• Provides metadata structures for views (external and export schema, data dictionary)

Federated Database Systems: Disadvantages by Approach
• Tightly-coupled
  – similar to global schema integration
    1) complex, difficult to make changes dynamically
    2) much effort in resolving semantic inconsistencies
• Loosely-coupled
  – duplication by different users in building views
  – updating data defined in views can be difficult

Multidatabase Language Approach
• No attempt at schema integration
• Various schema in services provided can be heterogeneous, inconsistent and duplicate information in different ways.
• Language (e.g. MSQL) is used to integrate databases at run time.
• Relational data model used as Common Data Model

Multidatabase Language Approach - Diagram
[Diagram: MSQL, A, B]
A, B are schemas; MSQL is the runtime language

Multidatabase Language Approach - Advantages
• No preparatory work to understand semantics of schema
• Dynamic -- access latest versions
• Very skilled users can succeed in reaching their goals
• Interesting work on multidatabase dependencies

Example Multidatabase Language
• MSQL (Multidatabase SQL)
  – Biased towards relational model
  – Illustrates problems
• Consider 2 databases
  – Each on publications of a computing society
  – And query:
  – "What is the name, email, title for each publication of an author appearing in both of the societies' databases?"

MSQL - Schema
• Schema 1 (for AIIA):
  – Contacts (Person_ID, Name, Email, …)
  – Conference (Name, Type, …)
  – Attendees (ID, Conf_ID, Speaker, …)
  – Publ_Papers (P_ID, Title, Author_ID, …)
• Schema 2 (for IFIP):
  – Member_Socs (Soc_Name, …)
  – Conf (Conf_ID, …)
  – Publ_Papers (P_Ref, Title, Conf_Ref, …)
  – Authors (Name, Email, Paper_ID, …)
Underlined attributes are primary keys; attributes in italics are foreign keys.

MSQL for Query
    USE AIIA, IFIP
    SELECT Name, Email, Title
    FROM Authors, IFIP.Publ_Papers IFIP_Paper,
         Contacts, AIIA.Publ_Papers AIIA_Paper
    WHERE Authors.Name = Contacts.Name
      AND Contacts.Person_ID = AIIA_Paper.Author_ID
      AND Authors.Paper_ID = IFIP_Paper.P_Ref;
The USE statement declares the databases that make up the multidatabase. The FROM clause aliases the two Publ_Papers tables to distinguish tables with the same name. The query retrieves Name, Email and Title from both databases.

Potential Problems with MSQL
• Are domains on name comparable?
• Can use LET command to create equivalencies of names, but this does not solve domain mismatch.
• What if one schema is not relational?
  – Entity-Relationship model often used as neutral schema for translation and comparison of heterogeneous features

Multidatabase Language: Disadvantages in General
• Distribution is not transparent
• Users must resolve inconsistencies themselves
• Common language may restrict scope of heterogeneity (relational bias)
• Local autonomous system may change schema freely (so that existing queries fail)

Comparison of Approaches
• By coupling:
  – how tightly is the interoperable system connected to its underlying systems
• By adaptability:
  – the ability for the interoperable system to evolve in line with underlying schema
• By transparency:
  – the need for the end-user to understand the underlying schema

Comparison of Approaches
Approach                     Coupling   Adaptability   Transparency
Global Schema Integration    Tight      Low            High
Federated Databases          Medium     Medium         Medium
Multidatabase Languages      Low        High           Low

Summary
Trend:
• From Global Schema Integration, through Federated Database, to Multidatabase Language
• Towards lower coupling, higher adaptability, and lower transparency

Further Reading
• Elmagarmid, Ahmed; Rusinkiewicz, Marek; Sheth, Amit. Management of Heterogeneous and Autonomous Database Systems. Morgan Kaufmann, 1999.
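
Sketch: Emulating the MSQL Query

The MSQL query on the AIIA and IFIP schemas can be approximated in ordinary SQL to see both what it does and where it breaks. The following is a minimal sketch, not MSQL itself. It assumes Python's standard sqlite3 module and uses SQLite's ATTACH DATABASE as a stand-in for the USE statement. The sample rows are hypothetical, and only the attributes named in the query are created (the elided "…" attributes are left out). SQLite rejects the unqualified Name, Email and Title as ambiguous, so the sketch qualifies them and returns both titles.

```python
import sqlite3

# Hypothetical sample rows; none of this data comes from the lecture.
# Only the columns that the query touches are created.
SETUP_SQL = """
CREATE TABLE aiia.Contacts (Person_ID INTEGER, Name TEXT, Email TEXT);
CREATE TABLE aiia.Publ_Papers (P_ID INTEGER, Title TEXT, Author_ID INTEGER);
CREATE TABLE ifip.Publ_Papers (P_Ref INTEGER, Title TEXT, Conf_Ref INTEGER);
CREATE TABLE ifip.Authors (Name TEXT, Email TEXT, Paper_ID INTEGER);

INSERT INTO aiia.Contacts VALUES (1, 'Ada Example', 'ada@aiia.example'),
                                 (2, 'B. Sample', 'bea@aiia.example');
INSERT INTO aiia.Publ_Papers VALUES (10, 'Schema Mapping', 1),
                                    (11, 'View Updates', 2);
INSERT INTO ifip.Publ_Papers VALUES (100, 'Federated Queries', 7),
                                    (101, 'Export Schemas', 7);
INSERT INTO ifip.Authors VALUES ('Ada Example', 'ada@ifip.example', 100),
                                ('Bea Sample', 'bea@ifip.example', 101);
"""

# Same joins as the MSQL query, with every column qualified.
QUERY = """
SELECT Authors.Name, Authors.Email, IFIP_Paper.Title, AIIA_Paper.Title
FROM ifip.Authors AS Authors,
     ifip.Publ_Papers AS IFIP_Paper,
     aiia.Contacts AS Contacts,
     aiia.Publ_Papers AS AIIA_Paper
WHERE Authors.Name = Contacts.Name
  AND Contacts.Person_ID = AIIA_Paper.Author_ID
  AND Authors.Paper_ID = IFIP_Paper.P_Ref
ORDER BY Authors.Name
"""


def build_multidatabase() -> sqlite3.Connection:
    """Attach two separate in-memory databases, standing in for USE AIIA, IFIP."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS aiia")
    conn.execute("ATTACH DATABASE ':memory:' AS ifip")
    conn.executescript(SETUP_SQL)
    return conn


def shared_authors(conn: sqlite3.Connection) -> list:
    """Return (name, email, IFIP title, AIIA title) for authors found in both."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in shared_authors(build_multidatabase()):
        print(row)
```

With these sample rows only 'Ada Example' is returned. The second author appears in both databases, but as 'B. Sample' in one and 'Bea Sample' in the other, so the equality join on Name silently misses her. This is the domain-mismatch problem raised under "Potential Problems with MSQL": the query runs without error, and the user is left to notice and resolve the inconsistency.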