MDBS Schema Integration: The Relational Integration Model
Researchers: Ramon Lawrence, Ken Barker
University of Manitoba
TRLabs - Winnipeg
Outline
• Introduction
• The MDBS architecture and the integration problem
• A schema integration taxonomy
• Previous work
• The RIM architecture and the RIM model
• Future work and conclusions
Database Terminology
• database system - a database and a system to manage the data
• transaction - an atomic sequence of operations applied to the database
• global transaction - a transaction spanning more than one database
• schema integration - the process of combining local schemas into a global, integrated schema
• multidatabase system (MDBS) - a collection of autonomous, local databases participating in a global database system to share data
MDBS Architecture
[Figure: global transactions enter the GTM, which sends subtransactions to one GTS per LDBS; each LDBS also continues to receive local transactions.]
Global Transaction Manager (GTM)
• processes global transactions
• ensures information in all LDBSs is consistent
• submits subtransactions to the GTSs for each LDBS
Global Transaction Servers (GTSs)
• one for each LDBS
• converts subtransactions from the GTM into a form usable by the LDBS and vice versa
Local Database Systems (LDBSs)
• databases combined into the MDBS
• unchanged, as they still process local transactions
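The three layers above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the dictionary form of a subtransaction, and the key-value LDBS are all hypothetical, and the consistency enforcement that a real GTM performs is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class LocalDatabase:
    """Stand-in for an autonomous LDBS; here just a named key-value store."""
    name: str
    rows: dict = field(default_factory=dict)

    def execute(self, native_op: tuple) -> object:
        # Local and global work both arrive through this same entry point,
        # so the LDBS itself is unchanged by joining the MDBS.
        action, key, *value = native_op
        if action == "put":
            self.rows[key] = value[0]
            return "ok"
        return self.rows.get(key)


@dataclass
class GlobalTransactionServer:
    """One GTS per LDBS: converts GTM subtransactions to the local form."""
    ldbs: LocalDatabase

    def run(self, subtransaction: dict) -> object:
        native_op = (subtransaction["op"], subtransaction["key"],
                     *subtransaction.get("args", ()))
        return self.ldbs.execute(native_op)


class GlobalTransactionManager:
    """Routes each subtransaction of a global transaction to its GTS."""

    def __init__(self, servers: dict):
        self.servers = servers  # LDBS name -> GTS

    def submit(self, global_transaction: list) -> list:
        # No commit protocol here: consistency enforcement is omitted.
        return [self.servers[sub["ldbs"]].run(sub) for sub in global_transaction]


claims = LocalDatabase("claims")
billing = LocalDatabase("billing")
gtm = GlobalTransactionManager({
    "claims": GlobalTransactionServer(claims),
    "billing": GlobalTransactionServer(billing),
})
results = gtm.submit([
    {"ldbs": "claims", "op": "put", "key": "c1", "args": (100,)},
    {"ldbs": "billing", "op": "put", "key": "c1", "args": (100,)},
    {"ldbs": "claims", "op": "get", "key": "c1"},
])
```

The point of the layering is that only the GTS knows the local operation format, so an LDBS can join or leave without changes to the GTM.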
The Integration Problem
• Integrating diverse data sources is an important issue as organizations interconnect their operations and demand more from their database systems
• Integration is a hard problem because structural and semantic conflicts exist
• Two levels of integration:
  ◦ schema integration
  ◦ data integration
Schema Integration
• Schema integration is the process of combining database schemas into a coherent global view
• Integration problems include:
  ◦ different data models
  ◦ incompatible concept representations
  ◦ different user or view perspectives
  ◦ structural conflicts within a model
  ◦ naming conflicts (homonym, synonym)
A Schema Integration Taxonomy
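[Figure: three-axis taxonomy of schema integration methods. Axes: Automation Level (manual, semi-automatic, automatic (static), automatic (dynamic)), Conflicts Resolved, and Transparency. Other axis labels in the figure: none, naming, structural, semantic, behavioral, interschema, both, all.]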
Previous Work
• semantic models:
  ◦ Batini (86), canonical models, SDM, DAPLEX
• schema re-engineering:
  ◦ model mapping tools, schema transformations
• metadata systems:
  ◦ rule-based systems
• object-oriented methods:
  ◦ use as a canonical model, schema transformations
• application-level integration:
  ◦ language systems, MSQL, IDL, higher order views
Previous Work (cont.)
• Interdatabase dependencies:
  ◦ Sheth - relaxed consistency, integration rules
• AI techniques:
  ◦ Pegasus (spheres of knowledge), knowledge packets, Carnot project (Cyc knowledge base)
• Lexical semantics:
  ◦ Summary Schemas Model (Bright et al.) - user interface for imprecise queries
• Industrial systems:
  ◦ Interbase
RIM: Objective
• The objective of the RIM model is to provide a system for automatically integrating diverse relational schemas into a multidatabase
• Desirable properties:
  ◦ individual mappings - information sources integrated one-at-a-time and independently
  ◦ global view constructed for query transparency
  ◦ handles schema conflicts - including semantic, structural, and naming conflicts
  ◦ automated global integration - global view constructed efficiently and automatically
RIM: The Idea
• The idea behind the RIM model is that most (and probably all) schema conflicts can be resolved if we:
  ◦ eliminate all naming conflicts
  ◦ define a language capable of determining schema equivalence and performing transformations
• With these two properties, schema conflicts can be resolved automatically at the global level
RIM: The Plan
• The first task is eliminating naming conflicts:
  ◦ use a global thesaurus/dictionary like the Summary Schemas Model (SSM)
  ◦ map local schema names into global counterparts
  ◦ identical concepts can be identified by global name
• The integration language must be defined:
  ◦ RIM specifications - records capturing semantics of each LDBS in a machine-processable form
  ◦ global names captured in RIM specs. to identify concepts stored in LDBS
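As a rough illustration of how global names expose the two kinds of naming conflict, the sketch below groups hand-assigned mappings in both directions. The system names, the local column `amt_net`, and the `ASSIGNMENTS` table are hypothetical; only `net_amount`, `claim_date`, and the global names come from the slides.

```python
from collections import defaultdict

# Hypothetical DBA-assigned mappings: (system, local name) -> global name.
ASSIGNMENTS = {
    ("system_a", "net_amount"): "[Claim] Net Amount",
    ("system_b", "amt_net"): "[Claim] Net Amount",
    ("system_a", "claim_date"): "[Claim] Claim date (received by system)",
    ("system_b", "claim_date"): "[Claim] Claim date (submitted by claimant)",
}


def naming_conflicts(assignments):
    """Return (synonyms, homonyms) exposed by the global names."""
    locals_by_global = defaultdict(set)
    globals_by_local = defaultdict(set)
    for (_system, local_name), global_name in assignments.items():
        locals_by_global[global_name].add(local_name)
        globals_by_local[local_name].add(global_name)
    # Synonyms: one concept stored under several local names.
    synonyms = {g: sorted(n) for g, n in locals_by_global.items() if len(n) > 1}
    # Homonyms: one local name standing for several concepts.
    homonyms = {n: sorted(g) for n, g in globals_by_local.items() if len(g) > 1}
    return synonyms, homonyms


synonyms, homonyms = naming_conflicts(ASSIGNMENTS)
```

Once every local name carries a global name, neither case needs special handling: synonyms collapse onto one global name and homonyms separate into different ones.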
RIM: The Plan (cont.)
• Integrate RIM specifications:
  ◦ To query the MDBS, the client downloads and integrates only RIM specs. of LDBSs accessed
  ◦ Global view is constructed from RIM specs. by automatically combining them at client site using global names and semantic metadata they contain
  ◦ Use of global names allows system to determine identical concepts even though structural representations may be different
  ◦ Semantic information captured using metadata
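A minimal sketch of client-side integration, assuming each RIM spec. can be reduced to rows of (table, column, context term, concept name). The spec contents, LDBS names, and the nested-dictionary shape of the global view are all assumptions made for illustration; the slides do not fix a format.

```python
from collections import defaultdict

# Hypothetical RIM specs as they might sit on a server, keyed by LDBS name.
# Each entry: (table, column, context term, concept name).
AVAILABLE_SPECS = {
    "ldbs_a": [("claims", "net_amount", "Claim", "Net Amount")],
    "ldbs_b": [("clm", "amt_net", "Claim", "Net Amount"),
               ("clm", "adjuster", "Claim", "Adjuster")],
    "ldbs_c": [("staff", "name", "Employee", "Name")],
}


def build_global_view(accessed):
    """Merge only the specs of the LDBSs this client will access."""
    view = defaultdict(lambda: defaultdict(list))
    for ldbs in accessed:
        for table, column, context, concept in AVAILABLE_SPECS[ldbs]:
            # Same context and concept means same global concept,
            # whatever the local table and column are called.
            view[context][concept].append((ldbs, table, column))
    return {ctx: dict(concepts) for ctx, concepts in view.items()}


global_view = build_global_view(["ldbs_a", "ldbs_b"])
```

Because `ldbs_c` is never accessed, its spec is never merged, so each client's global view stays as small as its queries require.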
RIM: The Plan (cont.)
• Querying the MDBS:
  ◦ queries are posed to the MDBS through the global view (GV) at each client
  ◦ translation from the GV back to the original RIM spec. for each LDBS is performed
  ◦ the translated queries are sent to each LDBS, which transforms the query (specified using RIM) into a query for the LDBS
  ◦ results are returned to the client, which integrates them based on its GV
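The query path can be sketched as follows. This is a simplification, not the RIM query language: it collapses the two translation steps (GV to RIM spec., RIM spec. to local query) into one, emits plain SQL strings, and assumes one table per LDBS with every requested global name present in each. `GLOBAL_VIEW`, the LDBS names, and the global name `[Claim] Id` are hypothetical.

```python
# Hypothetical global view: global name -> {LDBS: (table, column)}.
GLOBAL_VIEW = {
    "[Claim] Id": {"ldbs_a": ("claims", "id"), "ldbs_b": ("clm", "clm_no")},
    "[Claim] Net Amount": {"ldbs_a": ("claims", "net_amount"),
                           "ldbs_b": ("clm", "amt_net")},
}


def translate(global_names):
    """Rewrite a projection over global names into one SQL string per LDBS."""
    per_ldbs = {}
    for name in global_names:
        for ldbs, (table, column) in GLOBAL_VIEW[name].items():
            per_ldbs.setdefault(ldbs, (table, []))[1].append(column)
    return {ldbs: f"SELECT {', '.join(cols)} FROM {table}"
            for ldbs, (table, cols) in per_ldbs.items()}


def integrate(global_names, results):
    """Relabel each LDBS's rows with global names and concatenate them."""
    merged = []
    for ldbs, rows in results.items():
        for row in rows:
            merged.append({"source": ldbs, **dict(zip(global_names, row))})
    return merged


wanted = ["[Claim] Id", "[Claim] Net Amount"]
queries = translate(wanted)
# The rows below stand in for what each LDBS would return.
answer = integrate(wanted, {"ldbs_a": [(1, 250.0)], "ldbs_b": [(7, 90.0)]})
```

The client only ever names concepts by global name; local table and column names appear solely in the translated queries.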
RIM: Architecture
[Figure: each RDBS exports a RIM spec.; each client runs RIM Integration over the specs. it needs to construct its own Global View.]
RIM Specifications:
• constructed at each RDBS
• local concepts mapped to global names
• schema can be automatically extracted
RIM Integration:
• uses needed RIM specs.
• constructs global view
• resolves conflicts by:
  ◦ identifying concepts using global names
  ◦ transforming concepts into a form consistent with the global view
RIM: Using Global Names
• Global names attempt to capture semantics of data and its structure
• Research has found that a single dictionary term is insufficient to capture all semantics of a given data item
• Current proposed global name term:
  ◦ [context term] [concept name] ([adjective phrases])
  ◦ [adjective phrase] = [adjective] [preposition] ([context term] or [concept name])
RIM: Using Global Names (cont.)
• Here are a few examples of using global names:
  ◦ the database stores damage claim information
• Example 1:
  ◦ attribute of claim is called net_amount in system
  ◦ GN: [Claim] Net Amount
• Example 2:
  ◦ attribute of claim is called claim_date in system
  ◦ GN1: [Claim] Claim date (received by system)
  ◦ GN2: [Claim] Claim date (received by company)
  ◦ GN3: [Claim] Claim date (submitted by claimant)
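The global names in these examples follow a regular shape, so they can be split mechanically. The parser below is a sketch, assuming the context term is written in square brackets and at most one parenthesized group follows the concept name, as in the examples; the `GlobalName` type and function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class GlobalName(NamedTuple):
    context: str
    concept: str
    adjective_phrase: Optional[str]


# Shape assumed from the examples: [context term] concept name (adjective phrase)
_PATTERN = re.compile(
    r"\[(?P<context>[^\]]+)\]\s+(?P<concept>[^()]+?)\s*(?:\((?P<adj>[^)]+)\))?"
)


def parse_global_name(text: str) -> GlobalName:
    match = _PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a global name: {text!r}")
    return GlobalName(match["context"], match["concept"], match["adj"])


def same_concept(a: str, b: str) -> bool:
    """Two names denote the same concept only if all three parts agree."""
    return parse_global_name(a) == parse_global_name(b)


gn = parse_global_name("[Claim] Net Amount")
gn1 = parse_global_name("[Claim] Claim date (received by system)")
```

Under this comparison, GN1, GN2, and GN3 share a context term and concept name but remain distinct concepts, which is the purpose of the adjective phrase.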
RIM: The Global Dictionary
• To match concepts across systems, a global dictionary is required. Global names are taken from this dictionary.
• Dictionary currently chosen is WordNet, developed at Princeton:
  ◦ complete on-line dictionary with a browser interface
  ◦ provides multiple definitions per term
  ◦ has built-in hypernym and synonym searching and referencing features
• Future work involves determining how to add locally defined terms into the dictionary if required
RIM: Basic Concepts
• There are 3 basic modeling constructs in RIM:
  ◦ entity - a concept whose existence does not depend on any other entities
  ◦ relationship - a combination of two or more entities which does not exist without them
  ◦ attribute - a characteristic of an entity or a relationship
• All entities and attributes should be identifiable by a global name from the dictionary.
RIM: RIM Specifications
• A RIM specification consists of two parts:
  ◦ table headers - table-level information for each relation in the database
  ◦ table schemas - information at the attribute level of a database relation
• Most of the information can be automatically extracted; however, the DBA must assign global names to local concepts manually
RIM: The Table Header
• The table header provides table-level information for each relation and has fields:
  ◦ name - unique table name (local)
  ◦ record size and count
  ◦ foreign key list and foreign key access list
  ◦ record insert/delete/update mechanisms
  ◦ record name - semantic name for a table record
  ◦ record type - entity, relationship instance, ...
  ◦ record grouping - why are records in the table?
  ◦ record distinction/duplicates - primary key
  ◦ table comment
RIM: The Table Schema
• The table schema contains attribute-level information. Some fields include:
  ◦ field name - database system name
  ◦ semantic name - global name
  ◦ field use: attribute, key, categorization, summation, date/time, foreign key, logical, numeric, reference
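One way to hold a table header and table schema in machine-processable form is a pair of record types. The sketch below covers only a subset of the listed fields, and every class and attribute name is an assumption; the slides do not define a concrete encoding. A full RIM specification would carry one such `TableSpec` per relation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class FieldUse(Enum):
    ATTRIBUTE = "attribute"
    KEY = "key"
    CATEGORIZATION = "categorization"
    SUMMATION = "summation"
    DATE_TIME = "date/time"
    FOREIGN_KEY = "foreign key"
    LOGICAL = "logical"
    NUMERIC = "numeric"
    REFERENCE = "reference"


@dataclass
class TableField:
    field_name: str               # name used by the database system
    semantic_name: Optional[str]  # global name; None until the DBA assigns one
    field_use: FieldUse


@dataclass
class TableHeader:
    name: str                     # unique local table name
    record_name: str              # semantic name for one record
    record_type: str              # e.g. "entity" or "relationship instance"
    primary_key: List[str]
    foreign_keys: List[str] = field(default_factory=list)
    record_count: int = 0
    comment: str = ""


@dataclass
class TableSpec:
    header: TableHeader
    schema: List[TableField]

    def unnamed_fields(self) -> List[str]:
        """Fields still waiting for a manually assigned global name."""
        return [f.field_name for f in self.schema if f.semantic_name is None]


claims = TableSpec(
    header=TableHeader("claims", "Claim", "entity", ["id"], record_count=2),
    schema=[
        TableField("id", None, FieldUse.KEY),
        TableField("net_amount", "[Claim] Net Amount", FieldUse.NUMERIC),
    ],
)
```

Here `unnamed_fields()` reflects the split between what can be extracted automatically (names, keys, counts) and what the DBA still has to supply (global names).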
RIM: Semantic Conflicts
• There are 6 basic semantic conflicts in RIM:
  ◦ attribute-entity conflict
  ◦ attribute-relationship conflict
  ◦ entity-relationship conflict
  ◦ entity-entity conflict (not studied)
  ◦ attribute-attribute conflict (not studied)
  ◦ relationship-relationship conflict (not studied)
• There are some basic ideas on how to automatically resolve the first 3 conflicts.
• Conflict resolution is an area of future work.
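The slides do not define the conflicts formally. Under the assumption that an "X-Y conflict" means the same global concept is modeled as construct X in one LDBS and construct Y in another, the first three types can at least be detected mechanically, as in this sketch. The `USAGE` data and LDBS names are hypothetical, and resolution itself is not attempted.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical inputs: (LDBS, global term, construct used for it there).
USAGE = [
    ("ldbs_a", "Adjuster", "attribute"),    # a column of the claims table
    ("ldbs_b", "Adjuster", "entity"),       # its own table
    ("ldbs_a", "Payment", "relationship"),
    ("ldbs_b", "Payment", "entity"),
    ("ldbs_a", "Claim", "entity"),
    ("ldbs_b", "Claim", "entity"),
]


def classify_conflicts(usage):
    """Name the conflict type for each global term modeled in different ways."""
    constructs = defaultdict(set)
    for _ldbs, term, construct in usage:
        constructs[term].add(construct)
    conflicts = {}
    for term, kinds in constructs.items():
        # Sorting yields the slide's naming order, e.g. "attribute-entity".
        pairs = ["-".join(pair) + " conflict"
                 for pair in combinations(sorted(kinds), 2)]
        if pairs:
            conflicts[term] = pairs
    return conflicts


conflicts = classify_conflicts(USAGE)
```

Same-construct conflicts (entity-entity and so on) produce no entry here, which matches their "not studied" status above.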
Conclusions
• Current integration methodologies are insufficient because they rely on manual intervention and do not resolve all types of conflicts
• The RIM model may be able to integrate diverse relational schemas using a global dictionary, a systematic method for capturing data semantics, and automated procedures for performing client run-time integration
Future Work
• Determining how the RIM specifications can be constructed and what information can be automatically extracted
• Deciding the format for the global dictionary
• Studying conflict resolution procedures and testing methodology on simple integration problems