Structural and Semantic Heterogeneity
in Database Schema Integration
SIXTH Conference of Department of Computing
Wednesday 4 May 2005
David George
Presentation Content

• Why is Integration necessary?
• Evolution of Integration Approaches
• Barriers to Integration - Structural and Semantic Heterogeneity
• New opportunity - Ontology and the Semantic Web
Why Database Integration?
Drivers for Data Integration

• Global organisations with distributed data.
• Organisations having legacy and new databases.
• Organisational change, e.g. business re-engineering and acquisitions.
• Autonomous departments with disconnected systems requiring interoperability, e.g. Financial Services.
• Business Intelligence requiring:
  - decision-support systems.
  - customer analysis and marketing strategies.
  - data mining.
Schema Integration
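Local DB schemas: Schema 1, Schema 2, … Schema n
Global schema integration: the local schemas are integrated into a Global Schema
Global:Local Schema Mapping: links the Global Schema to each local schema
Queries: query input is posed against the Global Schema; query output is returned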
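One way to picture the Global:Local Schema Mapping is as a lookup table that rewrites a query phrased in global terms into one query per local schema. The sketch below is only an illustration: the attribute names (`customer_name`, `cust_nm`, `town`) and the in-memory rows standing in for local databases are assumptions, not part of the presentation.

```python
# Hypothetical global-to-local attribute mapping; every name is illustrative.
GLOBAL_TO_LOCAL = {
    "schema_1": {"customer_name": "cust_nm", "city": "town"},
    "schema_2": {"customer_name": "name", "city": "city"},
}

# In-memory stand-ins for the local databases.
LOCAL_DATA = {
    "schema_1": [{"cust_nm": "Ada", "town": "Leeds"}],
    "schema_2": [{"name": "Bob", "city": "York"}, {"name": "Cy", "city": "Leeds"}],
}


def query_global(attributes, where=None):
    """Answer a query phrased in global attribute names.

    `where` is an optional (global_attribute, value) equality filter.
    """
    results = []
    for schema, mapping in GLOBAL_TO_LOCAL.items():
        for row in LOCAL_DATA[schema]:
            # Translate the filter to the local attribute name.
            if where is not None and row[mapping[where[0]]] != where[1]:
                continue
            # Translate each requested attribute, reporting it under its global name.
            results.append({attr: row[mapping[attr]] for attr in attributes})
    return results


leeds_customers = query_global(["customer_name"], where=("city", "Leeds"))
```

The caller never sees `cust_nm` or `town`; only the mapping table knows the local names.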
Evolution in Integration Approaches
Knowledge (Global Domain Agreements)
• Sources: Digital media; Visual/Spatial/Temporal Data [Kiosk/Geographic/Flights/Forecasting]
• Focus: Semantics, Domain-specific
• Approaches: Multiple ontologies, Inter-ontological, Information Brokering
Information
• Sources: Structured, Semi-structured, Text repositories
• Focus: Syntax of data type, format & Schema constructs
• Approaches: Virtual Integration, Single Ontologies, Federated IS (inc Mediators)
Data (System, Local Task Schemas)
• Sources: Structured DBs, Files
• Focus: Systems & Communications
• Approaches: Schema Integration, Common Data Models, Federated DBS
Timeline markers: 1985, 1995
Federated DBMS approach
External Schemas: 1.1, 1.2, 2.1
Common Data Model
Federated Schemas: 1, 2
Export Schemas: 1.1, 2.1, 2.2
Component Schemas: 1, 2
Local Schemas: 1, 2
Component DBS: 1, 2, etc
Application: Integration of business databases
FDBMS schema architecture
External Schemas: 1.1, 1.2, 2.1, 2.2
Common Data Model
Federated Schemas: 1, 2
Export Schemas: 1.1, 2.1, 2.2, 3.1
Component Schemas: 1, 2, 3
Local Schemas: 1, 2, 3
Component DBS: 1, 2, 3
Mediator/Wrapper
(Virtual integration)
Data Sources: Web Source and O/RDB, each with its own Local Schema, reached over the Network/Internet
Wrappers: one Wrapper per data source
Mediator (Integration System): holds the Mediated Schema and performs Query Translation of the User Query into Query 1 and Query 2
Application: Integrated access to heterogeneous data
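The mediator/wrapper arrangement can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any particular mediator system: the class names, the source field names (`addr`, `cost`, `price_k`) and the sample listings are all hypothetical.

```python
class WebSourceWrapper:
    """Wraps a hypothetical web source that lists houses with 'cost' in pounds."""

    def __init__(self, listings):
        self._listings = listings

    def query(self, max_price):
        # Query 1: expressed in the source's own vocabulary ('addr', 'cost').
        return [{"address": item["addr"], "price": item["cost"]}
                for item in self._listings if item["cost"] <= max_price]


class RelationalWrapper:
    """Wraps a hypothetical O/RDB table that stores 'price_k' in thousands of pounds."""

    def __init__(self, rows):
        self._rows = rows

    def query(self, max_price):
        # Query 2: the price limit is rescaled to the source's unit.
        limit_k = max_price / 1000
        return [{"address": row["address"], "price": row["price_k"] * 1000}
                for row in self._rows if row["price_k"] <= limit_k]


class Mediator:
    """Holds the mediated schema (address, price) and fans a user query out to wrappers."""

    def __init__(self, wrappers):
        self._wrappers = wrappers

    def houses_under(self, max_price):
        merged = []
        for wrapper in self._wrappers:
            merged.extend(wrapper.query(max_price))
        return sorted(merged, key=lambda house: house["price"])


mediator = Mediator([
    WebSourceWrapper([{"addr": "1 High St", "cost": 295000},
                      {"addr": "9 Park Rd", "cost": 350000}]),
    RelationalWrapper([{"address": "4 Mill Lane", "price_k": 280}]),
])
cheap = mediator.houses_under(300000)
```

Each wrapper hides its source's naming and units, so the mediator only ever handles rows in the mediated schema.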
Information Brokering
Search Query:
“Find detached houses for sale under £300k with 2 bathrooms, 3 bedrooms,
a local school rated in the upper quartile of govt. league tables, in a district with
below-average crime rate and a socio-economically diverse population?”
Multiple Worlds
Information Mediation
Property Sales
Crime Statistics
School Rankings
Demographics
Barriers to Integration - Structural &
Semantic Heterogeneity
Recipe for Heterogeneity and Conflict
Conceptualisations of the real world are influenced by the designer's view of the Concept and Context to be modelled.
Diagram: three linked worlds, the Real World, the Conceptual World and the Database World (representation), with arrows labelled "Conceptualisation by", "Denotation of", "Representation by" and "Interpretation of".
Schema Type Conflicts
Name
Publisher
Address
Title
Pub-Book
Book
Name
Book-Topic
Title
Code
Topics
Title
Publication
Pub-Keyword
Keyword
Code
Research Area
Publisher
Taxonomy of Schema Conflicts
Entity Definition Conflicts
• Naming conflicts (Synonyms and Homonyms)
• DB Identifier conflicts, e.g. ID# vs. Name
• Schema isomorphism at attribute level (e.g. mapping of Telephone vs. HomeTel + WorkTel)
• Missing Attributes
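Several of these entity-level conflicts can be reconciled by a record-level mapping. The sketch below handles synonyms, the Telephone vs. HomeTel + WorkTel isomorphism, and missing attributes; the synonym table and the sample records are assumptions made for illustration.

```python
# Hypothetical synonym table: local attribute name -> agreed global name.
SYNONYMS = {"Surname": "LastName", "FamilyName": "LastName", "Tel": "Telephone"}

# Local attributes that all feed the single global Telephone attribute.
PHONE_PARTS = ("Telephone", "HomeTel", "WorkTel")


def to_global(record):
    """Map one local record to a global form with a list-valued Telephone."""
    out = {}
    phones = []
    for name, value in record.items():
        global_name = SYNONYMS.get(name, name)
        if global_name in PHONE_PARTS:
            # Schema isomorphism: one or two local attributes, one global attribute.
            if value is not None:
                phones.append(value)
        else:
            out[global_name] = value
    # A missing phone attribute becomes an empty list rather than an error.
    out["Telephone"] = phones
    return out


a = to_global({"Surname": "Smith", "Tel": "0123"})
b = to_global({"FamilyName": "Smith", "HomeTel": "0123", "WorkTel": None})
```

Homonyms are not covered here: telling apart two attributes that share a name but differ in meaning needs context that a lookup table does not hold.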
Domain Definition Conflicts
• Naming conflicts
• Data Representation (Integer vs. String)
• Data Dimensions (volume, weight, price, number)
• Dimension Measures (based on above)
• Data Scaling (£K, £M)
• Data Precision (1-100 vs. A-E)
Data Value Conflicts
• Attribute Integrity Constraints (cardinality, uniqueness, nulls)
• Known Inconsistency (has errors, presence/absence)
• Temporal Inconsistency (last update)
• Acceptable Inconsistency (within a range)
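Scaling and precision conflicts are usually resolved with small conversion functions. The sketch below assumes three monetary scales and equal 20-point grade bands; the band boundaries are an assumption, since the slide gives only the two ranges (1-100 and A-E).

```python
def to_pounds(amount, scale):
    """Normalise a monetary value recorded in pounds, £K or £M to pounds."""
    factors = {"£": 1, "£K": 1_000, "£M": 1_000_000}
    return amount * factors[scale]


def score_to_grade(score):
    """Map a 1-100 score to an A-E grade using assumed 20-point bands."""
    if not 1 <= score <= 100:
        raise ValueError("score must be between 1 and 100")
    bands = [(81, "A"), (61, "B"), (41, "C"), (21, "D"), (1, "E")]
    for lower_bound, grade in bands:
        if score >= lower_bound:
            return grade
```

The precision mapping runs in one direction only: a grade of "B" cannot be turned back into the original score.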
Incoherence in Cardinality
The same Inv:Order relationship modelled with three different cardinalities:
Invoice 1 : 1 Order
Invoice 1 : m Order
Invoice n : m Order
Abstraction and Schematic Conflicts
Abstraction Level Conflicts
• Generalisation/Specialisation
• Aggregation/Decomposition
Schematic Discrepancies
• Data Value to Attribute
• Attribute to Entity
• Data Value to Entity
Generalisation/Specialisation Conflicts
Schema 1
Schema 2
Student
Student
S_Type
(ID#,Name,Type,Course)
U-graduate
(ID#,Name,Course)
Graduate
(ID#,Name,Course)
i.e. U-graduate in Schema 2 is represented at a more general level in Schema 1
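The two student schemas can be mapped onto each other as long as the Type attribute in Schema 1 records which specialised class a row came from. A minimal sketch, assuming the Type codes "UG" and "PG" (the slide does not give the actual values):

```python
def generalise(undergraduates, graduates):
    """Merge Schema 2's two specialised classes into Schema 1's single Student class.

    Rows in Schema 2 are (ID#, Name, Course); rows in Schema 1 are
    (ID#, Name, Type, Course). The Type codes are assumed.
    """
    students = []
    for type_code, rows in (("UG", undergraduates), ("PG", graduates)):
        for student_id, name, course in rows:
            students.append((student_id, name, type_code, course))
    return students


def specialise(students):
    """Split Schema 1's Student rows back into the two Schema 2 classes."""
    undergraduates = [(i, n, c) for i, n, t, c in students if t == "UG"]
    graduates = [(i, n, c) for i, n, t, c in students if t == "PG"]
    return undergraduates, graduates


ug = [(1, "Ann", "Computing")]
pg = [(2, "Raj", "Databases")]
merged = generalise(ug, pg)
```

Without the Type attribute the general schema would lose the distinction, and `specialise` could not be written.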
Specialisation Classification Conflicts
Employee
Gender
Criteria inconsistency
Role
Person
<30
30-60
Adult
Sex
>60
<25
25-55
>55
Customer
Characterisation inconsistency
Senior
Service
Person
Degrees inconsistency
Customer
Child
Employee
Teen
Parent G-Parent
Aggregation Conflicts
An aggregation used in Schema 1 is represented by a set of entities in Schema 2.
NB: the mapping exists in only one direction.
Schema 1: Convoy (ID#, Av_Weight, Location)
Schema 2: Ship (ID#, Weight, Location, Captain)
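The one-directional nature of the Convoy/Ship mapping shows up as soon as it is written down. In the sketch below the ship rows are invented, and taking the convoy's Location from the first ship is an assumption (it presumes all ships in a convoy share a location).

```python
from statistics import mean


def ships_to_convoy(convoy_id, ships):
    """Derive a Schema 1 Convoy row from Schema 2 Ship rows.

    Each ship is (ID#, Weight, Location, Captain); the result is
    (ID#, Av_Weight, Location).
    """
    weights = [weight for _ship_id, weight, _location, _captain in ships]
    location = ships[0][2]  # assumed common to every ship in the convoy
    return (convoy_id, mean(weights), location)


ships = [
    ("S1", 1200, "Atlantic", "Hale"),
    ("S2", 1800, "Atlantic", "Ito"),
]
convoy = ships_to_convoy("C1", ships)
```

No inverse function exists: individual weights and captains cannot be recovered from Av_Weight alone.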
Aggregation Conflicts (contd)
Component class of collection: Employee(department) vs. Employee(division(department))
Aggregation Specialisation: CarType(carMake, carDesign) vs. FamilyType(carMake, saloonSize)
Aggregation Composition: Person(address, tel) vs. Person(street, city, county, tel)
Schematic Discrepancies
Data:Attribute:Entity conflicts
DB1: Stock (Date, StockCode, ClosePrice), where the stock item is a data value
DB2: Stock (Date, StockItem1, StockItem2, … StockItemn), where each stock item is an attribute holding the ClosePrice
DB3: StockItem1 (Date, ClosePrice) … StockItemn (Date, ClosePrice), where each stock item is an entity
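The three stock databases hold the same facts, so one representation can be restructured into the others. A sketch with invented stock codes and prices, using dictionaries in place of real tables:

```python
# DB1: stock item is a data value (one row per date and stock code).
db1 = [
    ("2005-05-03", "ABC", 10.5),
    ("2005-05-03", "XYZ", 7.25),
    ("2005-05-04", "ABC", 10.75),
]


def db1_to_db2(rows):
    """DB2: stock item is an attribute (one row per date, one column per stock)."""
    by_date = {}
    for date, stock_code, close_price in rows:
        by_date.setdefault(date, {})[stock_code] = close_price
    return by_date


def db1_to_db3(rows):
    """DB3: stock item is an entity (one relation per stock)."""
    relations = {}
    for date, stock_code, close_price in rows:
        relations.setdefault(stock_code, []).append((date, close_price))
    return relations


db2 = db1_to_db2(db1)
db3 = db1_to_db3(db1)
```

A query such as "which stocks closed above 10?" ranges over values in DB1 but over column names in DB2 and relation names in DB3, which is why these discrepancies are hard to bridge in a plain query language.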
So where next?
Global
Ontology & knowledge
Domain
Enterprise
Application
Schema
Local
data
database
Information Brokering
New Solutions - Ontologies and the
Semantic Web
Ontologies in Computing

Formal vocabulary of a “universe of discourse”.

Ontologies define:
  - concepts and their attributes
  - relationships between concepts
  - constraints on those relationships
“An Ontology is a formal, explicit specification of a shared
conceptualization”
(Gruber, 1993 & Borst, 1997)
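The three ingredients listed above (concepts with attributes, relationships, constraints) can be sketched with two small classes. This is an illustration rather than an ontology language: the `Concept` and `Relationship` classes, the `publishedBy` relationship and the simplified parent links are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str
    attributes: list = field(default_factory=list)
    parent: "Concept | None" = None

    def is_a(self, other):
        """True if this concept is `other` or specialises it."""
        node = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


@dataclass
class Relationship:
    name: str
    domain_concept: Concept  # concept the relationship starts from
    range_concept: Concept   # concept the relationship points to

    def allows(self, subject, obj):
        """Constraint check: both ends must fall under the declared concepts."""
        return subject.is_a(self.domain_concept) and obj.is_a(self.range_concept)


# Simplified, assumed hierarchy for illustration only.
agent = Concept("Agent", ["name"])
publisher = Concept("Publisher", parent=agent)
document = Concept("Document", ["title"])
book = Concept("Book", parent=document)
published_by = Relationship("publishedBy", domain_concept=document, range_concept=agent)
```

Here `published_by.allows(book, publisher)` holds, while the reversed pair is rejected by the domain and range constraint.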
Bibliographic Data Ontology (extract)
Biblio-Thing
Agent
Document
Person
Author
Organization
Book
Miscellaneous-Publication
Publisher
University
Proceedings
Edited-Book
Thesis
Periodical-Publication
Cartographic-Map
Doctoral-Thesis
Journal
Technical-Manual
Computer-Program
Newspaper
Magazine
Master-Thesis
http://www.ksl.stanford.edu/knowledge-sharing/ontologies/html/
Types of “ontologies”
DE BRUIJN, J. (2003) Using Ontologies - Enabling Knowledge Sharing and Reuse on the Semantic Web [online]. Innsbruck, Austria, DERI – Digital Enterprise Research Institute. Available from: http://www.deri.ie/publications/techpapers/documents/DERI-TR-2003-10-29.pdf. [Accessed 15 February 2005].
• Value restrictions: values of properties are restricted (e.g. by a datatype).
• General logic constraints: values may be constrained by using values from other properties.
• First-order logic constraints: very expressive constraints between relationships such as disjoint classes, inverse relationships, part-whole relationships.
Semantic Web
DE BRUIJN, J. (2003) Using Ontologies - Enabling Knowledge Sharing and Reuse on the Semantic Web [online].
Innsbruck, Austria, DERI – Digital Enterprise Research Institute. Available from:
http://www.deri.ie/publications/techpapers/documents/DERI-TR-2003-10-29.pdf. [Accessed 15 February 2005].
Semantic Web Tower
OWL: Clients in S1 same-As Customers in S2
OWL Ontology Language
RDFS: person X is a LivingPerson
RDF: person X is named “Bill”.
“The Semantic Web is an extension of the current web in which information is
given well-defined meaning, better enabling computers and people to work in
cooperation”
Tim Berners-Lee et al., 2001
RDF Example
Object, Attribute, Value Triple:
Predicate, subject, object
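The statements on the Semantic Web Tower slide can be written as triples and queried by pattern. The sketch below uses plain Python tuples in the common (subject, predicate, object) order rather than an RDF library, and the identifiers are illustrative strings, not real URIs.

```python
# Statements held as plain (subject, predicate, object) tuples.
triples = {
    ("personX", "name", "Bill"),               # RDF: a fact about a resource
    ("personX", "type", "LivingPerson"),       # RDFS: class membership
    ("S1:Clients", "sameAs", "S2:Customers"),  # OWL: equivalence across schemas
}


def match(store, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return {
        (s, p, o)
        for s, p, o in store
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    }


def equivalents(store, term):
    """Terms linked to `term` by sameAs, read in either direction."""
    forward = {o for _s, _p, o in match(store, subject=term, predicate="sameAs")}
    backward = {s for s, _p, _o in match(store, predicate="sameAs", obj=term)}
    return forward | backward
```

For schema integration the interesting call is `equivalents(triples, "S2:Customers")`, which finds the matching term in the other schema without either schema being changed.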
End of Presentation
Semantic Data Model
Knowledge
Interoperability
Information
Data
Evolution in Interoperability
Understanding comprehensive
metadata and ontology approaches
Digital media
Visual/Spatio-Temporal Modelling
Scientific/Engineering
Key focus on: Semantics
& more domain-specific
Structured,
Semi-structured (HTML etc)
Text repositories
Global
Domain
Multi-modal sys
Understanding use of metadata
& schematic heterogeneities
Key focus on:
Syntax – data types/format
Structure – schema constructs
O-O sys
Structured DBs, Files
System
Key focus:
Systems & Comms.
Local
Schema
E-R sys
Common Data Models
Schema translation &
Integration
MDBMS / Federated DBS
Schematic & metadata
relationships, Wrappers,
Single Ontologies
Multiple ontologies,
Inter-ontological,
Metadata standards
Fed. Inf. Systems / Mediators
Mediator / Information Brokering
1985
1995
Architectures