TU/e
eindhoven university of technology
Technologie van Informatiesystemen (TIS): Technology of Information Systems
Lecture 3
/faculty of mathematics and informatics
Contents
• Introduction, 30/11
• Web engineering & Web information systems, 7/12
• Data transformation & Data integration, 14/12
• ERP, Smulders (Deloitte), 21/12 + 11/1
• Flower, Berens (Pallas Athena), 25/1 + 1/2
• Biztalk, van den Boom (Microsoft), 15+22/2
Data Transformation
Data Integration
Philippe Thiran
Computer Science Department
Technische Universiteit Eindhoven
The Netherlands
Data Transformation & Integration
• Agenda
– Problem Statement
• Existing database systems
• Heterogeneity, distribution, autonomy
– Data Transformation
• Schema conversion
• Query conversion: Wrapper
– Data Integration
• Schema integration
• Query processing: Multidatabase and Federation
Problem Statement
Existing database systems
Heterogeneity, distribution, autonomy
Problem Statement
Existing Database Systems
• Existing Database Systems
– Data are recorded in existing database systems
– Existing database systems are:
• Mission critical (essential to the organization's business)
• To be operational at all times
• Inflexible
– Typically, existing database systems are:
• Very large (millions of lines of code)
• Old (often more than 10 years old)
• Written in old programming languages such as COBOL, PL/1, SQL!
• Built around an old DBMS
Problem Statement
Existing Database Systems
• Existing Database Systems
– Data are recorded in existing database systems
– These systems answer old requirements; new needs have appeared:
• New functions and services
• New user requirements
• New technology (Web)
• Communication among them?
Problem Statement
Existing Database Systems
• Existing Systems: New Services
– How to deal with existing database systems?
• Abandon the existing systems: migration to a new system
• Keep and modify the existing systems
• Keep the existing systems and wrap them: autonomy
• Existing Systems: Communication
– How to integrate existing database systems?
Problem Statement
Data Integration
• Data Integration Problems
– Integrating database systems is very hard and costly
– Three main dimensions of the problem:
• Distribution
• Autonomy
• Heterogeneity
[Figure: three axes (Distribution, Autonomy, Heterogeneity), with the centralized DBMS at the origin and distributed databases along the Distribution axis.]
Problem Statement
Data Integration
• Autonomy
– Autonomy refers to the distribution of control
– Four dimensions of autonomy:
• Design: each system has its own data models and its own transaction management technique
• Communication: a system has no knowledge of the existence of other systems, nor of how to communicate with them
• Execution: each system executes independently of the other systems
• Association: each system decides how much of its data and processing capabilities it will share with the other systems
Problem Statement
Data Integration
• Heterogeneity
– Heterogeneity may exist at three basic levels:
• DBMS level. Data is managed by a variety of DBMSs based on different data models and data languages
– Data models: relational model, hierarchical model and file model
– Data languages: SQL, DL/1, COBOL programs
• Platform level. Different hardware, different network protocols
• Semantic level. Different designer viewpoints in modelling the same objects of the application domain. Incompatible design specifications lead to different naming, types or integrity constraints
Data Integration
Generic Integration Architecture
• Schema Hierarchy (from top to bottom)
– Integrated Schema (common model): homogenizes and unions import schemas
– Import Schemas 1, 2, 3 (common model): view on export schema available for non-local access
– Export Schemas 1, 2, 3 (common model): unify data models
– Database Schemas 1, 2, 3 (local models)
– Data sources: DB1 (Relational DBMS), DB2 (OO DBMS), File System
Data Integration
Generic Integration Architecture
• Schema Hierarchy
– Data and Schema Integration: upper levels of the hierarchy (import schemas and integrated schema)
– Data and Schema Transformation: lower levels of the hierarchy (database schemas in the local models and export schemas in the common model)
Data Transformation
Schema Conversion
Query Conversion: Wrapper
Data Transformation
Schema Conversion
• Introduction
– Schema conversion
– Query/Data conversion
[Figure: for each data source i (i = 1, 2), Database Schema i (local data model) is converted into Export Schema i (common data model); queries (Query i, Query i') and data (Data i, Data i') are converted between the two levels.]
Data Transformation
Schema Conversion
• Schema Conversion
– Schema transformation
• Transformation of a schema expressed in a data model (Ms) into an equivalent schema expressed in another data model (Mt)
• Examples
– ER model → Relational model (lecture ISO)
– Relational model → XML Schema (see later)
• Schema transformation operators
• Schema conversion consists in applying the relevant transformations on the relevant constructs of the schema expressed in Ms in such a way that the final result complies with Mt
Data Transformation
Schema Conversion
• Schema Conversion
– Schema transformation
• A (schema) transformation is basically an operator by which a source data structure C is replaced with a target structure C'.
• Example of a semantics-preserving transformation: transforming a relationship type into an attribute
• RT-FK: transforming a binary relationship type into a foreign key.
[Figure: before, entity types A (A1) and B (B1, B2; id: B1) are linked by relationship type R, with cardinality 1-1 on the side of A and 0-N on the side of B; after, A (A1, B1; ref: B1) and B (B1, B2; id: B1).]
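The RT-FK operator can be sketched in a few lines of Python. This is only an illustration under assumptions: the classes `EntityType` and `RelationshipType` and the function `rt_fk` are hypothetical names, not part of the lecture, and cardinalities are encoded as strings such as "1-1" and "0-N".

```python
from dataclasses import dataclass, field


@dataclass
class EntityType:
    name: str
    attributes: list
    identifier: list
    foreign_keys: dict = field(default_factory=dict)  # attribute -> referenced entity type


@dataclass
class RelationshipType:
    name: str
    roles: dict  # entity type name -> cardinality, e.g. {"A": "1-1", "B": "0-N"}


def rt_fk(rel, entities):
    """Replace a binary relationship type by a foreign key (RT-FK)."""
    if len(rel.roles) != 2:
        raise ValueError("RT-FK applies to binary relationship types only")
    # The entity type whose role has maximum cardinality 1 receives the foreign key.
    holders = [name for name, card in rel.roles.items() if card.endswith("-1")]
    if not holders:
        raise ValueError("no role with maximum cardinality 1; apply RT-ET first")
    holder = entities[holders[0]]
    target = entities[next(n for n in rel.roles if n != holder.name)]
    # Copy the identifier of the target into the holder as reference attributes.
    for attr in target.identifier:
        holder.attributes.append(attr)
        holder.foreign_keys[attr] = target.name
    return holder


# The example of the slide: A (1-1) -- R -- (0-N) B, with B identified by B1.
entities = {
    "A": EntityType("A", ["A1"], []),
    "B": EntityType("B", ["B1", "B2"], ["B1"]),
}
r = RelationshipType("R", {"A": "1-1", "B": "0-N"})
a_after = rt_fk(r, entities)
```

After the call, `a_after` holds the attributes A1 and B1, with B1 recorded as a reference to B, which matches the right-hand side of the figure.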
Data Transformation
Schema Conversion
• Schema Conversion
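– 2 main schema transformations for ER model → Relational model
• RT-ET: transforming a relationship type into an entity type. Inverse: ET-RT
[Figure: before, A (A1) and B1 are linked by relationship type R with cardinalities 0-N and 0-N; after, R is an entity type (id: rA.A, rB.B1) linked to A by rA and to B1 by rB, with cardinality 1-1 on the side of R and 0-N on the other side.]
• RT-FK: transforming a binary relationship type into a foreign key. Inverse: FK-RT
[Figure: before, A (A1) and B (B1, B2; id: B1) are linked by R, with cardinality 1-1 on the side of A and 0-N on the side of B; after, A (A1, B1; ref: B1) and B (B1, B2; id: B1).]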
Data Transformation
Schema Conversion
• Schema Conversion
– Exercise: from ER model → Relational model
• CUSTOMER (Code, Description; id: Code)
• ORDER (Code; id: Code)
• STOCK (Code, Name, Level; id: Code)
• place: CUSTOMER (0-N), ORDER (1-1)
• purchase (Tot): CUSTOMER (0-N), STOCK (0-N)
• details (Order-qty): ORDER (0-N), STOCK (0-N)
Data Transformation
Schema Conversion
• Schema Conversion
– Exercise: from ER model → Relational model (after RT-ET on purchase and details)
• CUSTOMER (Code, Description; id: Code)
• ORDER (Code; id: Code)
• STOCK (Code, Name, Level; id: Code)
• place: CUSTOMER (0-N), ORDER (1-1)
• purchase (Tot; id: pur_CUS.CUSTOMER, pur_STO.STOCK)
– pur_CUS: purchase (1-1), CUSTOMER (0-N)
– pur_STO: purchase (1-1), STOCK (0-N)
• details (Order-qty; id: det_ORD.ORDER, det_STO.STOCK)
– det_ORD: details (1-1), ORDER (0-N)
– det_STO: details (1-1), STOCK (0-N)
Data Transformation
Schema Conversion
• Schema Conversion
– Exercise: from ER model → Relational model (resulting relational schema)
• CUSTOMER (Code, Description; id: Code)
• ORDER (Code, Cus_Code; id: Code; ref: Cus_Code)
• STOCK (Code, Name, Level; id: Code)
• purchase (P_C_Code, Code, Tot; id: P_C_Code, Code; ref: Code; ref: P_C_Code)
• details (D_O_Code, Code, Order-qty; id: D_O_Code, Code; ref: Code; ref: D_O_Code)
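The relational schema of the exercise can be written down as SQL DDL and checked with SQLite from Python. This is a sketch under assumptions: the column types are invented for illustration, and the targets of the foreign keys (Cus_Code and P_C_Code to CUSTOMER, D_O_Code to ORDER, Code to STOCK) are inferred from the column names of the slide.

```python
import sqlite3

# Column types are assumptions; the slide only gives names, ids and refs.
DDL = """
CREATE TABLE CUSTOMER (
    Code        TEXT PRIMARY KEY,
    Description TEXT
);
CREATE TABLE "ORDER" (               -- ORDER is an SQL keyword, hence the quotes
    Code     TEXT PRIMARY KEY,
    Cus_Code TEXT NOT NULL REFERENCES CUSTOMER(Code)
);
CREATE TABLE STOCK (
    Code  TEXT PRIMARY KEY,
    Name  TEXT,
    Level INTEGER
);
CREATE TABLE purchase (
    P_C_Code TEXT NOT NULL REFERENCES CUSTOMER(Code),
    Code     TEXT NOT NULL REFERENCES STOCK(Code),
    Tot      REAL,
    PRIMARY KEY (P_C_Code, Code)
);
CREATE TABLE details (
    D_O_Code    TEXT NOT NULL REFERENCES "ORDER"(Code),
    Code        TEXT NOT NULL REFERENCES STOCK(Code),
    "Order-qty" INTEGER,
    PRIMARY KEY (D_O_Code, Code)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# List the tables that were created and the foreign keys of "details".
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
details_fks = conn.execute("PRAGMA foreign_key_list(details)").fetchall()
```

The 1-1 role of ORDER in place becomes the mandatory column Cus_Code, while the two many-to-many relationship types become tables of their own with a composite primary key.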
Data Transformation
Wrappers
• Definition
– A wrapper controls a (legacy) data source
– Basically, a wrapper is a software component that offers a homogeneous query interface based on a common data model (XML for the Web)
– It converts data and queries from the common data model to a local data model
⇒ It offers an adequate way of solving the DBMS heterogeneity that appears when one wants to integrate existing and heterogeneous data systems
[Figure: the wrapper sits between the export schema (common data model, common query language) and the database schema of the data source (local data model).]
Data Transformation
Wrappers
• Definition (ctd)
– A data wrapper is basically defined as a converter of data and queries
– That is, a wrapper:
• Offers an export schema in the common data model
• Accepts queries against the export schema
• Translates them into queries understandable by the data system
• Transforms the results of the local queries into a format understood by the application
[Figure: queries and data flow through the wrapper, between the export schema (common data model, common query language) and the database schema of the data source (local data model, local query language).]
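The four responsibilities listed above can be illustrated with a small two-way wrapper over a relational source. This is a minimal sketch, not a reference design: the class `RelationalWrapper`, its methods and the Customer table are hypothetical, the "common query language" is reduced to a single equality selection, and SQLite stands in for the legacy data source.

```python
import sqlite3
import xml.etree.ElementTree as ET


class RelationalWrapper:
    """Two-way wrapper sketch: XML-style interface over one relational table."""

    def __init__(self, conn, table, columns):
        self.conn = conn
        self.table = table
        self.columns = columns

    def export_schema(self):
        # Export schema in the common data model: one element per column.
        return {self.table.lower(): [c.lower() for c in self.columns]}

    def query(self, element, value):
        # Accept a query against the export schema: rows whose <element> equals value.
        column = next(c for c in self.columns if c.lower() == element)
        # Translate it into a local (SQL) query; the value is passed as a parameter.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table} WHERE {column} = ?"
        rows = self.conn.execute(sql, (value,)).fetchall()
        # Transform the local result into the common data format (XML).
        root = ET.Element(self.table.lower())
        for row in rows:
            row_el = ET.SubElement(root, "row")
            for col, val in zip(self.columns, row):
                ET.SubElement(row_el, col.lower()).text = str(val)
        return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Id INTEGER, Name TEXT)")
conn.executemany("INSERT INTO Customer VALUES (?, ?)", [(1, "Philips"), (2, "Unilever")])

wrapper = RelationalWrapper(conn, "Customer", ["Id", "Name"])
result = wrapper.query("name", "Philips")
```

The application only sees element names and XML; the table name, the SQL dialect and the row format stay hidden behind the wrapper.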
Data Transformation
Wrappers
• Categories of Wrappers
– There exists no standard approach to build wrappers
– Functionality
• One-way: only transformation of data (e.g., for data warehouses)
• Two-way: transformation of requests and data
– Development
• Hard-wired wrappers, for specific data sources
• Semi-automated generation: wrapper development tools
• Automatically generated wrappers
– Availability
• Standalone programs (data conversion, data migration)
• Components of a federation (see later)
• Database interface for foreign data
Data Transformation
Wrappers
• Wrappers and the Web
– Wrapper interface
• Data format: XML
• Common data model: XML DTD and Schema
• Common query language: XPath, XQuery, none
– Wrapper mapping
• Generally between relational data and XML
• Two translation types
– Automated
– Defined by the user
• XML- or SQL-oriented query language
Data Transformation
Wrappers
• XML Views of Relational Databases
– Automated translation
Order (Id, Custname, Custnum)
10, Philips, 7734
9, Unilever, 7725
Item (Oid, Desc, Cost)
10, Ship, 24000
10, Generator, 8000
Payement (Oid, Due, Amt)
10, 1/10/01, 20000
9, 6/10/01, 12000
<db>
<order>
<row><id>10</id><custname>Philips</custname><custnum>7734</custnum></row>
<row><id>9</id><custname>Unilever</custname><custnum>7725</custnum></row>
</order>
<item>
<row><oid>10</oid><desc>Ship</desc><cost>24000</cost></row>
<row><oid>10</oid><desc>Generator</desc><cost>8000</cost></row>
</item>
<payement>
similar to <order> and <item>
</payement>
</db>
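The automated translation is mechanical enough to sketch in Python: one element per table, one row element per tuple, one child element per column. The function name `database_to_xml` is hypothetical and SQLite is used only as a stand-in relational source; the table and column names follow the slide.

```python
import sqlite3
import xml.etree.ElementTree as ET


def database_to_xml(conn):
    """Automated translation: one element per table, one <row> per tuple."""
    root = ET.Element("db")
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY rowid")]
    for table in tables:
        table_el = ET.SubElement(root, table.lower())
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        columns = [d[0].lower() for d in cursor.description]
        for row in cursor:
            row_el = ET.SubElement(table_el, "row")
            for col, val in zip(columns, row):
                ET.SubElement(row_el, col).text = str(val)
    return root


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Order" (Id INTEGER, Custname TEXT, Custnum INTEGER);
INSERT INTO "Order" VALUES (10, 'Philips', 7734), (9, 'Unilever', 7725);
CREATE TABLE Item (Oid INTEGER, "Desc" TEXT, Cost INTEGER);
INSERT INTO Item VALUES (10, 'Ship', 24000), (10, 'Generator', 8000);
""")

xml_text = ET.tostring(database_to_xml(conn), encoding="unicode")
```

No knowledge of the application domain is needed, which is why this translation can be fully automated; the price is a flat document that mirrors the tables and ignores the foreign keys.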
Data Transformation
Wrappers
• XML Views of Relational Databases
– User-defined Translation
Order (Id, Custname, Custnum)
10, Philips, 7734
9, Unilever, 7725
Item (Oid, Desc, Cost)
10, Ship, 24000
10, Generator, 8000
Payement (Oid, Due, Amt)
10, 1/10/01, 20000
9, 6/10/01, 12000
<order id="10">
<custname> Philips </custname>
<items>
<item description="Ship">
<cost> 24000 </cost>
</item>
<item description="Generator">
<cost> 8000 </cost>
</item>
</items>
<payements>
<payement due="1/10/01">
<amount> 20000 </amount>
</payement>
</payements>
</order>
<order id="9">
…
</order>
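A user-defined translation follows the foreign keys instead of the tables: the items and payments of an order are nested inside the order element. The sketch below builds this view by hand for the sample data of the slide; the function `order_view` and the in-memory tuples are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# Sample rows of the three relations, as (Id/Oid, ..., ...) tuples.
orders = [(10, "Philips", 7734), (9, "Unilever", 7725)]
items = [(10, "Ship", 24000), (10, "Generator", 8000)]
payments = [(10, "1/10/01", 20000), (9, "6/10/01", 12000)]


def order_view(order_id):
    """Build the nested <order> element for one order."""
    oid, custname, _custnum = next(o for o in orders if o[0] == order_id)
    order_el = ET.Element("order", id=str(oid))
    ET.SubElement(order_el, "custname").text = custname
    # Items are joined on Oid and nested under <items>.
    items_el = ET.SubElement(order_el, "items")
    for _, desc, cost in (i for i in items if i[0] == oid):
        item_el = ET.SubElement(items_el, "item", description=desc)
        ET.SubElement(item_el, "cost").text = str(cost)
    # Payments are joined on Oid and nested under <payements>.
    payments_el = ET.SubElement(order_el, "payements")
    for _, due, amt in (p for p in payments if p[0] == oid):
        pay_el = ET.SubElement(payments_el, "payement", due=due)
        ET.SubElement(pay_el, "amount").text = str(amt)
    return order_el


order_10 = order_view(10)
```

The choices made here (which columns become attributes, which relation is nested in which, and that Custnum is dropped) are exactly what the user has to specify in a user-defined mapping.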
Data Transformation
Wrappers
• XML Views of Relational Databases
– Exercises
• What is the XML document of this relational database?
Order (OrderID, Customer, Date, Total[0-1]; id: OrderID)
Detail (OrderID, Reference, Quantity, Amount; id: OrderID, Reference; ref: OrderID; ref: Reference)
Product (Reference, Label[0-1], UnitPrice, Supplier; id: Reference)
Data Transformation
Wrappers
• XML Views of Relational Databases
– Exercises
• What is the XML document of this relational database?
<!ELEMENT Catalog (Order*, Product*)>
<!ELEMENT Order (Customer, Date, Total?, Detail+)>
<!ATTLIST Order
OrderID ID #REQUIRED>
<!ELEMENT Customer ANY>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT Total (#PCDATA)>
<!ELEMENT Detail (Quantity, Amount)>
<!ATTLIST Detail
Product IDREF #REQUIRED>
<!ELEMENT Quantity (#PCDATA)>
<!ELEMENT Amount (#PCDATA)>
<!ELEMENT Product (Supplier+)>
<!ATTLIST Product
Reference ID #REQUIRED
Label CDATA #IMPLIED
UnitPrice CDATA #REQUIRED>
<!ELEMENT Supplier ANY>
Data Transformation
Wrappers
• XML Views of Existing Relational Databases
– Mapping definition
• SQL-oriented query language
for $b in
SQL("select * from Order where Custname='" + $x + "'")
return <order> {$b/Id}
<Custname>{$x}</Custname></order>
[Figure: the XML view Order (Id, Custname) is mapped onto the relational table Order (Id, Custname, Custnum), with rows (10, Philips, 7734) and (9, Unilever, 7725).]
Data Transformation
Wrappers
• XML Views of Existing Relational Databases
– XML View definition
• Bottom-up (from the relational schema)
• Top-Down (from a given XML schema)
– Mappings between XML views and relational
schemas
• Automated (algorithm)
• Manual (defined by the user)
Data Transformation
Wrappers
• XML Views of Existing Relational Databases
– Examples
Product Name
SQL-written
Mapping
XML-written
Mapping
XML Schema
Query over views
Xperanto
no
yes
XML Schema
yes (XQuery)
update
Microsoft’s
SQL Server
yes (FOR
XML clause)
no
XDR Schema
yes (XPath)
DB2 (IBM)
no
yes (subset
XQuery)
yes (XQuery)
no
Oracle9i
yes
no
SilkRoute
no
(AT&T)
yes
no
XML Schema
yes (XQuery)
update
Data Integration
Generic Integration Architecture
Schema Integration
Query Processing: multidatabase and
federation
Data Integration
Generic Integration Architecture
Schema Integration
Data Integration
Generic Integration Architecture
• Component Architecture
– Applications 1, 2, 3 access the integrated schema through a common DDL/DML
– Mediator (above the import and export schemas)
• Offers an abstract integrated view of sources
• Reconciles independent data structures to yield a unique, coherent view of the data
– Wrappers (one per source, between the export schema and the database schema, using the local DDL/DML)
• Controls a local data source
• Offers a homogeneous query interface based on a common data model
– Data sources: DBMS 1 (DB1), DBMS 2 (DB2), DBMS 3 (DB3)
Data Integration
Generic Integration Architecture
• Aspects to Consider for Integration
– General Issues
• Bottom-up vs. top-down engineering
– From existing schema to integrated or vice-versa
– Schema integration vs. schema matching
• Virtual vs. materialized integration
• Read-only vs. read-write access
• Transparency
– Language, schema, location
– Data Model related issues
• Types of sources
– Structured, semi-structured, unstructured
• Common data model of integrated system
• Tight vs. loose integration
– Use of a global schema
• Query model
Data Integration
Schema Integration
• Methodology
– Bottom-up process
– Four main steps
• Preparing the local schemas
• Detecting what is common between the components of
local schemas
– Correspondence (what is common)
• Solving the conflicts
– Conflict (what is incompatible)
• Integrating the different schemas according to the correspondences and conflicts detected in the previous steps
Data Integration
Schema Integration
• Concept of Correspondence
– Two complementary views of correspondence:
• Structural correspondence (schema level: concepts)
• Instance correspondence (instance level: data)
– Structural correspondence
• Five types of structural correspondence:
– Identity
– Independence
– Complementarity
– Subtyping
– Common supertype
Data Integration
Schema Integration
• Concept of Correspondence
– Instance correspondence
• Four types of instance correspondence:
– Disjointness: the instance classes are disjoint
– Inclusion: the instances of one class are included in the other class
– Equivalence: the classes contain the same instances
– Overlapping: the classes share some instances but not all
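The four types of instance correspondence are plain set relations, which the following Python sketch makes explicit. The function name `instance_correspondence` and the sample identifiers are hypothetical; instances are assumed to be comparable through a shared key.

```python
def instance_correspondence(a, b):
    """Classify the correspondence between two instance classes given as collections of keys."""
    a, b = set(a), set(b)
    if a == b:
        return "equivalence"
    if a.isdisjoint(b):
        return "disjointness"
    if a < b or b < a:  # strict subset in one direction
        return "inclusion"
    return "overlapping"


site1_customers = {"C1", "C2", "C3"}
site2_customers = {"C2", "C3", "C4"}
kind = instance_correspondence(site1_customers, site2_customers)
```

In practice the hard part is the assumption itself: deciding that two instances at different sites denote the same real-world object requires a reliable common key.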
Data Integration
Schema Integration
• Concept of Conflict
– Conflicts occur in four possible ways: syntactic (naming conflicts), structural, semantic or instance
– Syntactic conflicts (resolution: use of an ontology)
• Synonyms. Two identical objects (entities, attributes,
relationships) that have different names are synonyms
• Homonyms. Two different objects that have identical names are
homonyms
– Structural conflicts (resolution: mapping function or
transformation)
• Domain. Two identical objects have different domains (differences in dimension, units and scales)
• Structure. The same concept is represented by different data structures (e.g., different attributes)
Data Integration
Schema Integration
• Concept of Conflict
– Structural conflict
• In the left-hand schema, Address is a compound attribute, whereas in the right-hand one, Address is represented by an entity type
• Resolution: transformation
[Figure: Site 1: CUSTOMER (CUSTID, NAME, ADDRESS (STREET, ZIP CODE, CITY); id: CUSTID). Site 2: CUSTOMER (CUSTID, NAME; id: CUSTID) linked by relationship type lives (1-1, 1-1) to ADDRESS (STREET, ZIP CODE, CITY).]
Data Integration
Schema Integration
• Concept of Conflict
– Semantic conflicts
• A semantic conflict appears when there is a contradiction between two representations A and B of the same application domain concept, or between two integrity constraints (resolution?)
• Example
– In the left-hand schema, Customer is identified by CustId, whereas in the right-hand one, it is identified by Name
[Figure: Site 1: CUSTOMER (CUSTID, NAME, ADDRESS (STREET, ZIP CODE, CITY); id: CUSTID). Site 2: CUSTOMER (CUSTID, NAME, ADDRESS (STREET, ZIP CODE, CITY); id: NAME).]
Data Integration
Schema Integration
• Concept of Conflict
– Instance conflicts
• Instance conflicts are specific to existing data
• Modelling constructs A and B that are recognized as
corresponding can cover sets with different scopes
• Examples
– ZIP codes of addresses can be written like “NL-5600 MB” or
“56oo MB” or “5600”
– Different ZIP codes can be recorded for the same address
(encoding errors)
– Resolution: Data transforming… cleaning?
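As an illustration of the data transformation step, the ZIP code variants of the example can be brought to one canonical form with a small normalization function. This is a sketch under assumptions: the function `normalize_zip` and the accepted pattern (optional NL- prefix, four digits, optional two letters) are invented here, and values that do not fit are returned as None for manual cleaning rather than guessed.

```python
import re

# Optional "NL-" prefix, four digits, optional two-letter suffix.
ZIP_PATTERN = re.compile(r"^(?:NL-)?\s*(\d{4})\s*([A-Z]{2})?$")


def normalize_zip(raw):
    """Return (digits, letters) or None when the value needs manual cleaning."""
    match = ZIP_PATTERN.match(raw.strip().upper())
    if match is None:
        return None
    digits, letters = match.groups()
    return digits, letters


normalized = [normalize_zip(z) for z in ["NL-5600 MB", "5600 MB", "5600", "56oo MB"]]
```

The first two values map to the same pair, the third lacks the letters, and the fourth is rejected. Format differences can thus be transformed automatically, whereas encoding errors call for cleaning.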
Data Integration
Query Processing: multidatabase and
federation
Data Integration
Integration Architecture
• Three Classical Architectures
– Multidatabases
• No integrated schema
• Integrated access to different relational DBMS
– Federated Databases
• Integrated schema
• Integrated access to different DBMS
• Integrated access to different data sources (on the Web)
– Data Warehouses
• Materialized integrated data sources
• Not covered here
Data Integration
Query Processing
• Classical Architecture: Multidatabase
– Enable transparent access to multiple (relational) databases
• Hides distribution, different SQL variants
• Processes queries and updates against multiple databases (two-phase commit)
• Does not provide any type of global schema (does not hide the different database schemas)
• Example: IBM DataJoiner
[Figure: DataJoiner connects through Sybase Open Client and Oracle SQL*Net, over a TCP/IP network, to a Sybase server and an Oracle server.]
Data Integration
Query Processing
• Classical Architecture: Multidatabase
– Multidatabase schema
• Sybase.Publications (PNR, Title, Author, Journal; id: PNR)
• Sybase.Authors (ANR, Title, Name, Affiliation; id: ANR)
• Oracle.Papers (Number, Title, Writer, Published; id: Number)
• Oracle.Writer (FirstName, LastName, NRofPublications; id: FirstName, LastName)
– Source 1 (Sybase)
• Publications (PNR, Title, Author, Journal; id: PNR)
• Authors (ANR, Title, Name, Affiliation; id: ANR)
– Source 2 (Oracle)
• Papers (Number, Title, Writer, Published; id: Number)
• Writer (FirstName, LastName, NRofPublications; id: FirstName, LastName)
Data Integration
Query Processing
• Classical Architecture: Multidatabase
– Query processing
• Query against the multidatabase schema:
SELECT p2.title
FROM Sybase.PUBLICATIONS p1, Oracle.PAPERS p2
WHERE p1.title = p2.title
• Subquery sent to Source 1 (Sybase):
SELECT title
FROM PUBLICATIONS
• Subquery sent to Source 2 (Oracle):
SELECT title
FROM PAPERS
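The decomposition shown above can be imitated in Python with two independent SQLite databases standing in for the Sybase and Oracle sources. This is only a sketch: the helper names `make_source` and `multidatabase_join` and the sample titles are invented, and a real multidatabase system would also optimize where the join is executed.

```python
import sqlite3


def make_source(table, titles):
    """Create an independent in-memory database holding one table of titles."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} (title TEXT)")
    conn.executemany(f"INSERT INTO {table} (title) VALUES (?)", [(t,) for t in titles])
    return conn


# Two autonomous sources (stand-ins for the Sybase and Oracle servers).
sybase = make_source("PUBLICATIONS", ["Wrappers", "Mediators", "Schema Integration"])
oracle = make_source("PAPERS", ["Mediators", "Data Cleaning", "Wrappers"])


def multidatabase_join(source1, source2):
    """Send one subquery to each source, then join the results on title."""
    titles1 = {row[0] for row in source1.execute("SELECT title FROM PUBLICATIONS")}
    titles2 = [row[0] for row in source2.execute("SELECT title FROM PAPERS")]
    # Join condition of the global query: p1.title = p2.title.
    return sorted(t for t in titles2 if t in titles1)


common_titles = multidatabase_join(sybase, oracle)
```

Note that the caller still has to know that PUBLICATIONS and PAPERS describe the same kind of object: the multidatabase layer hides distribution, not the schemas.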
Data Integration
Query Processing
• Classical Architecture: Multidatabase
• Main properties
• Transparency
– Low level of transparency provided to the user
(The user is responsible for finding the relevant information,
understanding each database schema, detecting and resolving
the semantic conflicts, and finally, building the required view of
the data in the sources)
• Autonomy
– Not intrusive against the autonomy of the data sources
– Suitable when component systems are strongly autonomous
• Methodology
– Simplicity since there is no schema integration
• Maintenance and evolution
– No integrated schema maintenance
Data Integration
Query Processing
• Classical Architecture: Federation
– Integrated schema(s) and unique interface
• Hides the semantic and location heterogeneity
• Wrapper/Mediator hierarchy
– Wrapper
» Controls a local data source
» Offers a homogeneous query interface based on a common data model
– Mediator
» Offers an abstract integrated view of several sources
» Reconciles independent data structures to yield a unique, coherent view of the data
– Research projects
• Tsimmis (Stanford)
• Garlic (IBM)
• Oasis (Dublin University)
Data Integration
Query Processing
• Classical Architecture: Federation
– Typical example
• Mediator: views, integrated schema, import schemas
• Wrapper 1 (provides export schema) over an Oracle SQL DBMS
Export schema:
<complexType name="Book">
<element name="title" type="string"/>
<element name="authors" type="string"/>
<element name="pages" type="string"/>
</complexType>
Database schema:
Publication (PNR, Title, Authors, Journal, Pages; id: PNR)
Authors (ANR, Title, FirstName, Surname, Affiliation; id: ANR)
• Wrapper 2 (provides export schema) over an XML DBMS
Export schema:
<complexType name="Book">
<element name="title" type="string"/>
<element name="author" type="string"/>
</complexType>
Database schema:
<!ELEMENT Book (title, author)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
Data Integration
Query Processing
• Classical Architecture: Federation
– Typical example
• Integrated schema (views)
<complexType name="Book">
<element name="title" type="string"/>
<element name="author" type="string"/>
</complexType>
• Import schema DB1
<complexType name="Book">
<element name="title" type="string"/>
<element name="authors" type="string"/>
</complexType>
• Import schema DB2
<complexType name="Book">
<element name="title" type="string"/>
<element name="author" type="string"/>
</complexType>
• Export schema of the first source
<complexType name="Book">
<element name="title" type="string"/>
<element name="authors" type="string"/>
<element name="pages" type="string"/>
</complexType>
• Export schema of the second source
<complexType name="Book">
<element name="title" type="string"/>
<element name="author" type="string"/>
</complexType>
Data Integration
Query Processing: Federation
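• The application submits query Q and receives result A
Q = FOR $b IN //Book
RETURN $b/author
• The mediator decomposes Q into one query per source
Q1 = FOR $b IN //Book
RETURN $b/authors
Q2 = FOR $b IN //book
RETURN $b/author
• Wrapper 1 (Oracle SQL DBMS) translates Q1 into the local query Q1' and returns A1
Q1' = SELECT a.name
FROM AUTHORS a
A1 = {<authors> … </authors>}
• Wrapper 2 (XML DBMS) translates Q2 into the local query Q2' and returns A2
Q2' = //book/author
A2 = {<author> … </author>}
• The mediator converts A1 into A1' and merges the answers
A1' = {<author> … </author>}
A = A1' ∪ A2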
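The query flow of this slide (decompose Q, let each wrapper translate its part, rename and union the answers) can be sketched in Python. Everything below is illustrative: the classes `SqlWrapper`, `XmlWrapper` and `Mediator`, the sample data and the restriction to the single query "all authors" are assumptions, and SQLite replaces the Oracle source.

```python
import sqlite3
import xml.etree.ElementTree as ET


class SqlWrapper:
    """Wrapper 1: export schema Book(title, authors) over a relational source."""

    def __init__(self, conn):
        self.conn = conn

    def authors(self):
        # Q1 is translated into the local query Q1'.
        rows = self.conn.execute("SELECT a.name FROM AUTHORS a")
        return [("authors", name) for (name,) in rows]


class XmlWrapper:
    """Wrapper 2: export schema Book(title, author) over an XML source."""

    def __init__(self, document):
        self.root = ET.fromstring(document)

    def authors(self):
        # Q2 is translated into the local path query Q2' = //book/author.
        return [("author", el.text) for el in self.root.findall(".//book/author")]


class Mediator:
    def __init__(self, wrappers):
        self.wrappers = wrappers

    def authors(self):
        # Rename source-specific tags to the integrated tag (A1 -> A1'), then union.
        answer = set()
        for wrapper in self.wrappers:
            answer |= {("author", value) for _tag, value in wrapper.authors()}
        return sorted(value for _tag, value in answer)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AUTHORS (name TEXT)")
conn.executemany("INSERT INTO AUTHORS VALUES (?)", [("Codd",), ("Date",)])
document = ("<library><book><title>T</title><author>Date</author></book>"
            "<book><title>U</title><author>Chen</author></book></library>")

mediator = Mediator([SqlWrapper(conn), XmlWrapper(document)])
result = mediator.authors()
```

Because the union is computed on sets, an author present in both sources appears once in the answer; whether two equal names really denote the same author is an instance-level question that this sketch does not address.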
Data Integration
Query Processing
• Classical Architecture: Federation
– Main properties
• Transparency
– High level of transparency provided to the user. The user is not aware of the distribution and the heterogeneity of the integrated data sources
• Autonomy
– Each local data source has control over its sharable information
• Methodology
– Problems of defining an integrated schema
– Web as Loosely Coupled Federation
• Many different, widely distributed information systems
• Heterogeneity
– Structurally homogeneous: XML
– Semantically heterogeneous: no explicit schemas (ontology?)
• Autonomy
– Runtime autonomy: pages change on average every 4 weeks, dangling links
• Distribution
– Replication (proxies) and caching frequently used