Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Oracle Database wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Concurrency control wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Functional Database Model wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Versant Object Database wikipedia , lookup
Relational model wikipedia , lookup
Automatic Data Ramon Lawrence University of Manitoba [email protected] base Integration Ken Barker University of Calgary [email protected] Summary Database integration allows systems within a company or on the WWW to interoperate. Current integration solutions require manual coding which is costly and time-consuming. Our integration architecture uses XML and a standard dictionary to automate integration. Database systems are integrated using 3 processes: capture, integration, and query. The architecture is implemented in a software package called Unity. Unity is capable of automatic database integration and querying using ODBC. Applications include data warehouse design, web site integration, and systems integration. Integration Architecture Client Client Multidatabase Layer • user’s view of integration 2) X-Spec Editor Integrated Context View X-Spec Editor Standard Dictionary Architecture Components: 1) Integrated Context View • stores schema & metadata • uses XML Integration Algorithm 3) Standard Dictionary • terms to express semantics 4) Integration Algorithm Query Processor and ODBC Manager 5) Query Processor Subtransactions X-Spec X-Spec Database Database Local Transactions • combines X-Specs into integrated context view • accepts query on view • determines data source mappings and joins • executes queries and formats results Integration Processes The integration architecture consists of three separate processes: Capture process: independently extracts database schema information and metadata into a XML document called a X-Spec. Integration process: combines X-Specs into a structurally-neutral hierarchy of database concepts called an integrated context view. Query process: allows the user to formulate queries on the integrated view that are mapped by the query processor to structural queries (SQL) and the results are integrated and formatted. Architecture Components The architecture consists of four components: A standard dictionary (SD) to capture data semantics SD terms are used to build semantic names describing semantics of schema elements. X-Specs for storing data semantics. relational database info. stored and transmitted using XML stores semantic names to describe schema elements Integration Algorithm Identical concepts in different databases are identified by similar semantic names. Produces an integrated view of all database concepts. Query Processor allows the user to formulate queries on the view translates from semantic names in integrated view to SQL queries and integrates and formats results involves determining correct field and table mappings and discovery of join conditions and join paths Architecture Discussion The architecture automatically integrates relational schemas into a multidatabase Desirable properties: individual mappings - information sources integrated one-at-a-time and independently integrated view constructed for query transparency user queries system by semantics instead of structure handles schema conflicts - including semantic, structural, and naming conflicts automated integration - integrated view constructed efficiently and automatically no wrapper or mediator software is required transparent querying - users issue semantic queries which are translated to SQL by the query processor no changes are performed on underlying databases Conclusions Automatic database integration is possible by using a standard term dictionary and defining semantic names for schema elements. Integration of data sources has applications to the WWW and construction of data warehouses. Users are able to transparently query integrated systems by concept instead of structure. Future Work: Test the system in large industrial projects. Improve the Unity implementation and develop an integration component for a web browser. Increase the efficiency of the query processor. Allow distributed updates and global updates on all databases. Background & Motivation Why is Integration Required? There are many integration environments: operational systems within an organization system integration during company merger data warehouses, Intranets, and the WWW Users require information from many data sources which often do not work together. Companies require a global view of their entire operations which may be present in numerous operational databases for different departments and distributed geographically. Business-to-Business (B-to-B) and Business-toConsumer (B-to-C) E-commerce demand integration of web databases with production systems for efficiency and productivity. What is Integration? Database integration requires conflict resolution during schema integration. A schema is a description of the data organization and format in a database. Schema integration combines database schemas into a unified view of the data. Constructing an integrated view is difficult because databases will store different types of data, in varying formats, with different meanings, and reference it using different names. Subsequently, the construction of the integrated view must handle the different mechanisms for storing data (structural conflicts), for referencing data (naming conflicts), and for attributing meaning to the data (semantic conflicts). Data is integrated at both the schema level (the description of the data) and the data level (individual data instances). What is the Current Solution? SQL and multidatabase query languages: querying Research do is difficult for numerous databases systems: not use standardization so cannot be automated Industrial standards: XML, BizTalk, E-commerce portals apply to limited domains/industries require standard structures and database changes Current techniques for database integration have some of these problems: require integrator to understand all databases integration process is manual do not hide system complexity from the user force changes on the existing database systems Unity Software Unity Overview Unity is a software package that implements the integration architecture with a GUI. Developed using Microsoft Visual C++ 6 and Microsoft Foundation Classes (MFC). Unity allows the user to: construct and modify standard dictionaries build X-Specs to describe data sources integrate X-Specs into an integrated view transparently query integrated systems using ODBC and automatically generate SQL transactions A screen-shot of Unity editing a standard dictionary is below. Unity is available for demonstration and distribution. References Publications: Unity - A Database Integration Tool, R. Lawrence and K. Barker, TRLabs Emerging Technology Bulletin, January 2000. Multidatabase Querying by Context, R. Lawrence and K. Barker, DataSem2000, pages 127-136, Oct. 2000. Integrating Relational Database Schemas using a Standardized Dictionary, To appear in SAC’2001 - ACM Symposium on Applied Computing, March, 2001. Sponsors: NSERC, Further TRLabs Information: http://www.cs.umanitoba.ca/~umlawren/