Course on Information Architectures (Corso di Architetture della Informazione)
Academic Year 2009 – 2010
Carlo Batini
5.1.1 Before Data Integration
Enterprise Application Integration (EAI) &
Enterprise Information Integration (EII) &
Extract Transform Load (ETL) / Data Warehouse (DW)
1
Remark
• The symbol shown indicates that the topic is part of the Information Representation (Rappresentazione della Informazione) syllabus.
• The references, acronyms, and synonyms used can be found at the end of the presentation.
2
Summary
• Enterprise Application Integration (EAI)
• Enterprise Information Integration (EII)
• Extract Transform Load (ETL) / Data Warehouse (DW)
• Enterprise Information Quality (EIQ)
These are different technologies that pursue the same goal: the integration of software applications, services, and information (databases and other sources) managed by an organization or a set of organizations. This integration is inherently lost through the legacy style of application and database development, and through the different points of view adopted by the different players using the information system.
3
There is big money to be spent
on integration ……
4
Some details
5
1. Definitions and classifications
& a short history
6
1.1 The beginnings
7
Short history - 1
A typical Fortune 1000 company has a myriad of mission-critical data systems on which the enterprise depends. Such systems include ERP systems, sales tracking systems, HR systems, etc.
Over the last twenty years, the conventional wisdom was to implement such applications using a client-server computing model, as shown in Figure 1. Here, the DBMS (or other storage technology) ran at the server level, accessed by a collection of applications that ran on the client desktop.
Client-server computing was enabled by the emergence of desktop PCs, which provided a client computing platform, and was pushed by many software and hardware companies that were selling alternatives to IBM mainframes.
Figure 1: the client-server architecture (Client tier, Server tier)
8
Short history - 2
Recently, several factors have rendered client-server computing completely obsolete, but we will focus only on the web in this discussion. Basically, it forces every enterprise to move to the architecture of Figure 2. The client level from the previous figure moves inside a web browser, where at most a portion of the application from the previous figure can run.
Sometimes an ultra-thin client is run, whereby no code exists at the client level. Other enterprises utilize Java, JavaScript, or other technology to run some code on the client desktop. In either case, the remainder of the application must be executed in a new middleware tier.
Figure 2: the three-tier architecture (Client, Middleware, Server)
9
Middleware products – Application servers
Application servers have been in existence for more than
twenty-five years.
Historically, they were called TP monitors, the most popular being the IBM Customer Information Control System (CICS).
10
Middleware products – Application servers
TP monitors came into existence to allow multiple client users to run the same application efficiently on a mainframe computer. Specifically, they provided application activation. To perform efficiently, they also provided multithreading of applications (that were written to allow threading) and connection multiplexing.
The latter feature lowered the number of connections that had to be maintained between the middleware layer and the server level, typically increasing efficiency.
To provide scalability, many application servers support automatic load sharing over multiple hardware machines. Lastly, most provide a security module including authentication, single-site login, and access control.
In summary, an application server is capable of providing code
activation and related services.
11
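To make the connection-multiplexing idea above concrete, here is a minimal Python sketch (not tied to any real TP monitor or application server; all names are made up) in which many client threads share a small, fixed set of server connections.

import queue
import threading

class ConnectionPool:
    # Minimal sketch of connection multiplexing: many client threads share a few connections.
    def __init__(self, open_connection, size=4):
        # open_connection is any zero-argument factory, e.g. a database driver's connect()
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(open_connection())

    def run(self, work):
        conn = self._pool.get()          # block until one of the shared connections is free
        try:
            return work(conn)            # execute the client's request on that connection
        finally:
            self._pool.put(conn)         # return the connection so other clients can reuse it

# Usage: 100 client threads are served through only 4 shared "connections".
pool = ConnectionPool(open_connection=lambda: object(), size=4)
threads = [threading.Thread(target=pool.run, args=(lambda c: None,)) for _ in range(100)]
for t in threads: t.start()
for t in threads: t.join()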
Evolution from Application Servers
[Diagram] Application servers evolved along two paths:
• Application driven → Enterprise Application Integration (EAI)
• Data driven → Extract Transform Load (ETL) / Data Warehouse (DW) and Enterprise Information Integration (EII)
12
1.2 Evolution toward
Enterprise Application Integration (EAI)
13
Motivation of evolution
toward EAI - example
A typical large enterprise has more than 5000 major application
systems. Such systems are always logically interconnected. For
example, when a new employee is hired, he must be inserted into the
payroll systems, added to the pension plan, added to the medical
insurance system, etc. When a customer changes his telephone
options, a change must be made to the billing system as well as to
the provisioning system.
Whenever two companies merge, there are two sets of information
systems to contend with, one from each of the partners. It is
desirable, but rarely possible, to kill one of the two overlapping
systems, at least not immediately. Again, multiple interacting
application systems are part of the resulting enterprise architecture.
14
Core functionalities of EAI middleware - 1
To service the needs of interacting applications, a class of products called enterprise application integration (EAI) systems arose. The core functionality is to reliably
deliver a message from one application to a second one,
which needs to be alerted concerning an action taken by
the first one.
Such a messaging service is a core functionality of EAI
systems.
However, the two communicating systems, which were written independently, never agree on the syntax and semantics of application constructs. A transaction to the
net market is never the same as the one expected by
the buyer's procurement system or the seller's order
processing system. Hence, there is a need for message
transformation between the originator and the recipient.
15
Core functionalities of EAI middleware - 2
Lastly, the two communicating applications are never written using the same technology. A home-brew order processing system must communicate with a net market based on Ariba. A customer relationship system
written using Siebel must communicate with an ERP system from SAP.
In this heterogeneous world, a collection of adapters is required. A
source adapter transforms the source message to a common form
while a target adapter changes the common form to that required by
the recipient.
The basic functionality provided by EAI systems is message delivery
and transformation.
16
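As a rough illustration of the adapter idea just described, the following Python sketch (purely hypothetical message formats and field names, not the API of any real EAI product) shows a source adapter mapping an application-specific message into a common form and a target adapter mapping the common form into what the recipient expects.

# Minimal sketch of EAI message transformation; all formats and field names are invented.

def source_adapter(crm_msg: dict) -> dict:
    # Map a CRM-specific "new customer" message into the assumed common form.
    return {
        "event": "customer.created",
        "id": crm_msg["CustNo"],
        "name": crm_msg["FullName"],
    }

def target_adapter(common_msg: dict) -> dict:
    # Map the common form into the format the receiving ERP system expects.
    return {
        "KUNNR": common_msg["id"],       # hypothetical ERP field names
        "NAME1": common_msg["name"],
    }

def deliver(crm_msg: dict) -> dict:
    # Core EAI functionality in miniature: transform at the source,
    # route through the middleware, transform again for the recipient.
    return target_adapter(source_adapter(crm_msg))

print(deliver({"CustNo": "4711", "FullName": "Mario Rossi"}))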
Typical architecture of an EAI
[Diagram: user queries/updates pass through the Middleware layer, which connects Application 1..n, each running over its own Source 1..n]
17
A more recent example of
middleware: Publish/Subscribe
It realizes a many-to-many, anonymous interaction.
The participants are divided into two groups:
– publishers: producers, message senders
– subscribers: consumers, interested in receiving specific types of messages
The communication is performed through a central unit (event service) that receives the messages from publishers and forwards them to the suitable subscribers.
Useful for multiple asynchronous updates of autonomous but overlapping databases.
18
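A minimal Python sketch of the event-service idea above (class names, topics, and messages are invented for illustration):

from collections import defaultdict

class EventService:
    # Minimal sketch of the central unit in a publish/subscribe system:
    # publishers and subscribers never interact with each other directly.
    def __init__(self):
        self._subscribers = defaultdict(list)   # topic -> list of subscriber callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self._subscribers[topic]:
            callback(message)                    # forward the message to every interested subscriber

# Usage: two autonomous, overlapping databases subscribe to customer updates.
bus = EventService()
bus.subscribe("customer.updated", lambda m: print("billing system applies", m))
bus.subscribe("customer.updated", lambda m: print("provisioning system applies", m))
bus.publish("customer.updated", {"id": 42, "phone_option": "international"})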
Typical architecture of a P&S system
[Diagram: a user update of object O reaches the middleware (step 1), which notifies the subscribing applications (Application 1..n) so that each of them propagates the update to its own source (steps 2–6)]
19
From EAI to workflow processes and business
process languages and platforms
Workflow systems have also been in existence for many years.
The early systems were oriented toward procurement, and
the focus was on process flow. A typical purchase order must
be signed by the manager of the originating employee. If it is
large enough, it must also be signed by a division manager. If
it entails computer equipment, it must be signed by the CIO.
A typical large enterprise has tens or hundreds of such rules.
The process of obtaining approval for a purchase order
entails moving through a collection of approval steps, whose
sequencing is governed by business rules.
Of course, a workflow system must also have an execution
environment. This run time environment must be capable of
executing applications, as well as supporting connectivity
between one application and the subsequent one in the
process flow. Hence workflow diagrams are typically compiled
into some sort of middleware framework, and executed in the
middle tier of the Figure 2 architecture.
20
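To make the business-rule idea concrete, here is a hypothetical Python sketch of purchase-order approval rules of the kind described above (the thresholds and categories are invented):

def approval_steps(order: dict) -> list:
    # Return the sequence of approvals a purchase order must pass through,
    # derived from simple business rules.
    steps = ["originating employee's manager"]          # always required
    if order["amount"] > 10_000:
        steps.append("division manager")                # large orders need a second signature
    if order["category"] == "computer equipment":
        steps.append("CIO")                             # IT purchases also need the CIO
    return steps

print(approval_steps({"amount": 25_000, "category": "computer equipment"}))
# -> manager, then division manager, then CIO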
1.3 The evolution of
ETL and EII solutions
21
Problems in traditional DB architectures - 1
There are many different types of heterogeneity among the databases that have to be used together:
1. Different platforms → technological heterogeneity
2. Different data models at the participating DBMSs → model heterogeneity
3. Different query languages → language heterogeneity
4. Different data schemas and different conceptual representations in DBs previously developed at the participating DBMSs → schema (or semantic) heterogeneity
5. Errors in data, which result in different values for the same information → instance heterogeneity
22
Problems in traditional DB Architectures - 2
Dependencies exist:
– among databases (databases assume the existence of
certain data in other databases),
– databases and applications (applications assume certain
properties to hold in the database),
– among applications (applications assume that other
applications are performing certain operations)
Changes in the data and in the applications become difficult to manage. Since the "knowledge" about dependencies is distributed in many places, and since the same functionality is implemented in many places, each change becomes a complex task to perform.
23
New types of transparency
• DQ frameworks: DQ errors / instance heterogeneity transparency
• Mediator based: semantic (schema) transparency
• Federated databases: model transparency
• Multidatabases: relational dialect transparency
• Traditional transparencies in DBMSs: physical, logical, and distribution transparency (fragmentation + allocation)
24
The data integration problem
• Combining data coming from different
data sources, providing the user with a
unified vision of data
• Detecting correspondences between
similar concepts that come from different
sources, and solving conflicts
26
Data architecture
• The data architecture is the allocation of the entities (conceptual, intensional level) and of the data (extensional level) across the multiplicity of DBMS platforms.
27
Data architecture in a centralized DB
• The data integration problem arises even in the simplest situation: a single centralized DB.
• Here, in order to avoid inconsistencies and duplications, and to favour efficiency, a three-step methodology is suggested: conceptual, logical, and physical design.
[Diagram: access queries and update transactions run against a single DBMS with one logical schema over one database]
28
Distributed DBMS architecture
• Distributed architectures address the efficiency issue; in dealing with heterogeneities they are quite similar to the centralized case, since they still have a single global schema.
• New: design decisions on vertical and horizontal fragmentation, allocation, and replication.
[Diagram: access queries and update transactions are posed against the global schema; the DBMSs at the different sites map it to local schemas over their own databases (Database 1..n), connected through a network]
29
In organizations the data architecture is built up over the years
[Diagram: databases (Data Base 1..5) introduced in different years (2000, 2001, 2003, 2005, 2007), each with its own DBMS and logical schema]
30
Evolution of the data
architecture
• In organizations the data architecture is built up over the years.
• This leads to a potential explosion of all the types of heterogeneity seen before.
[Diagram: the same collection of databases (Data Base 2..5), each with its own DBMS and logical schema, introduced in 2001, 2003, 2005, and 2007]
31
General approaches to data integration
• Consolidation: creation of a new centralized
database in which all databases are integrated
• Data exchange: creation of gateways between
system pairs
– Only appropriate when there are just two systems; no support for queries spanning multiple systems
– The number of gateways potentially grows as n²
• Multidatabase/Federated/Data
integration/DataWarehouse solutions: creation
of a global schema
32
Consolidation
Process:
• Reverse engineering of logical schemas
into conceptual schemas
• Integration of the local conceptual
schemas into a global schema, with
resolution of schema level heterogeneities
• Integration of data, using techniques such as record linkage to resolve instance-level heterogeneities (a minimal sketch follows below)
33
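A minimal, purely illustrative Python sketch of record linkage (the normalization key, sample data, and similarity threshold are invented; real consolidation projects use far more sophisticated matching):

import difflib

def normalize(record: dict) -> str:
    # Crude normalization key: lowercase name plus city.
    return f"{record['name'].lower().strip()}|{record['city'].lower().strip()}"

def record_linkage(db_a: list, db_b: list, threshold: float = 0.9) -> list:
    # Return pairs of records from the two databases that likely describe
    # the same real-world entity, based on fuzzy string similarity.
    matches = []
    for a in db_a:
        for b in db_b:
            score = difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()
            if score >= threshold:
                matches.append((a, b, round(score, 2)))
    return matches

# Usage: the same client appears with slightly different spellings in two sources.
crm = [{"name": "Mario Rossi", "city": "Milano"}]
billing = [{"name": "Rossi Mario", "city": "Milano"}, {"name": "Anna Bianchi", "city": "Roma"}]
print(record_linkage(crm, billing, threshold=0.6))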
Virtual vs materialized integration
Besides the different approaches of data driven
integration, a distinction is made in how the
information needed is retrieved. Independently
of which approach is used to define the mapping
between the local sources and the global
schema, a
1. virtual or
2. materialized integration approach
determines how the information is accessed.
34
Virtual Data Integration
•A system designer builds a Mediated Schema over which a
user poses queries. The source databases are interfaced
with wrappers if needed. Data reside at the data sources.
Wrappers are format translators.
35
Data Warehouse: Materialized
Data Integration
• Data warehouse [Inmon (1997)]: "Subject-oriented, integrated, time-variant (temporal), non-volatile collection of summary and detailed data, used to support the strategic decision-making process for the enterprise."
• The information from the source databases is extracted, transformed, and then loaded into the data warehouse. ETL = Extraction, Transformation, Loading.
36
Relevant Types of
Integrated Database Systems
1. Use a materialized database (data are merged into a new database) → Extract Transform Load (ETL) systems and Data Warehouses: materialized integrated data sources
2. Use a virtual, non-materialized database (data remain at the sources) → Enterprise Information Integration (EII) (or Data Integration) systems
Several solutions are described in the following →
37
Virtual integration
• The virtual integration approach leaves the requested information in the local sources. The virtual approach will therefore always return a fresh answer to the query. The query posed against the global schema is reformulated into the formats of the local information systems, and the information retrieved needs to be combined to answer the query.
38
1.4 Data Warehouses as
materialized data integration systems
39
ETL - 1
A large enterprise has more than 5000 major application
systems that hold data critical to the efficient
functioning of the enterprise. There is an obvious need
for data integration, so that a business analyst can get a
more complete picture of the enterprise and how to
optimize it.
The conventional wisdom to perform this task is to buy a
giant machine located in a safe place (say under Mount
Washington). Periodically, desired data is "scraped"
from each operational data system and copied to Mount
Washington. Of course, the various operational systems
never have a common notion of enterprise objects, such
as a purchase order.
Hence, there is a requirement to transform source data to
a common format before loading data onto the Mount
Washington machine.
40
ETL - 2
Large common machines came to be known as
warehouses, and the software to access, scrape,
transform, and load data into warehouses,
become known as extract, transform, and load
(ETL) systems. In a dynamic environment, one
must perform ETL periodically (say once a day
or once a week), thereby building up a history of
the enterprise.
The main purpose of a data warehouse is to allow
systematic or ad-hoc data mining. The
requirement for data integration has driven the
ETL market, and products are available from
Informatica, Ascential, and the major data base
vendors.
41
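As a rough illustration of the extract/transform/load steps, here is a hedged Python sketch (the source formats, currency conversion, and warehouse table are all invented; real ETL products are far richer):

# Minimal ETL sketch (hypothetical source formats and warehouse schema, not a specific product).

import sqlite3

def extract():
    # Pretend to scrape two operational systems that use different conventions.
    orders_eu = [{"po": "A-17", "amount_eur": 1200.0}]
    orders_us = [{"order_id": "9921", "amount_usd": 800.0}]
    return orders_eu, orders_us

def transform(orders_eu, orders_us, usd_to_eur=0.9):
    # Map both sources to a common notion of "purchase order" (amounts in EUR).
    common = [{"po_id": o["po"], "amount_eur": o["amount_eur"]} for o in orders_eu]
    common += [{"po_id": o["order_id"], "amount_eur": o["amount_usd"] * usd_to_eur}
               for o in orders_us]
    return common

def load(rows):
    # Load the unified records into the warehouse table.
    dw = sqlite3.connect(":memory:")
    dw.execute("CREATE TABLE purchase_orders (po_id TEXT, amount_eur REAL)")
    dw.executemany("INSERT INTO purchase_orders VALUES (:po_id, :amount_eur)", rows)
    return dw

dw = load(transform(*extract()))
print(dw.execute("SELECT COUNT(*), SUM(amount_eur) FROM purchase_orders").fetchone())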
Data warehouses - 1
Data warehouses are used to globally analyse large
datasets that are composed from the data originating in
many On Line Transaction Processing databases (e.g. the
transaction data from all stores of a retailer all over the
country) in order to derive strategic business decisions.
Since updates play no role here, and efficient processing
of large datasets is the critical issue, the data is not
directly accessed in the component databases, but
materialized in the integrated database, the central
data warehouse.
From this warehouse, application-specific views are typically derived by materializing them in so-called data marts, to which the different data analysis tools are then applied.
42
Typical architecture of a data warehouse
[Diagram: user queries are posed against the global schema of the warehouse, which is populated by Extract/Transform/Load steps from each source (Source 1..n), each with its own local schema (Local schema 1..n)]
43
Data warehouses - 2
The canonical data model for the global schema in data warehouses is typically a specialized form of the relational data model, the so-called multidimensional data model, where all data is kept in one central table with different attributes that represent the different dimensions. It differs from the relational model insofar as it provides much more powerful operations for aggregation over the multidimensional data (similarly to an Excel spreadsheet).
Important problems in data warehouses include the development of wrappers, which are used to
1. map from different data models and schemas into the data warehouse schema,
2. perform so-called data cleansing, i.e. remove errors from the original data.
44
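A small Python sketch of the multidimensional idea: a tiny fact table with a revenue measure and two dimensions, rolled up along either dimension (all data and names are invented):

from collections import defaultdict

# Tiny fact table: each row has dimension values (client_region, business_area)
# and a measure (revenue).
facts = [
    {"client_region": "North", "business_area": "Retail",  "revenue": 120.0},
    {"client_region": "North", "business_area": "Finance", "revenue":  80.0},
    {"client_region": "South", "business_area": "Retail",  "revenue":  50.0},
]

def roll_up(facts, dimension):
    # Aggregate the "revenue" measure along one dimension (a simple OLAP-style roll-up).
    totals = defaultdict(float)
    for row in facts:
        totals[row[dimension]] += row["revenue"]
    return dict(totals)

print(roll_up(facts, "client_region"))   # {'North': 200.0, 'South': 50.0}
print(roll_up(facts, "business_area"))   # {'Retail': 170.0, 'Finance': 80.0}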
Example of multidimensional schema
[Diagram: facts and measures organized along two dimensions: Client (SSN, Last Name, Cell Phone, City of residence, Region) and Business (Business area)]
45
Example of a DW application
[Diagram: an Italian fiscal example; the Client dimension carries CF (tax code), surname ('Cognome'), and wealth ('Patrimonio'); the facts concern wealth and standard of living ('Patrimonio e Tenore di vita'), with measures taken from income tax returns ('Dichiarazione redditi')]
46
Data Warehouses - 3
Data warehouses are a specialized form of federated database system in which the integrated database is materialized:
– perform efficient data analysis
– On-line analytical processing (OLAP)
– Data mining
They typically use limited forms of database schemas (e.g. relational star schemas) and perform efficient processing of OLAP and data mining queries.
47
ETL evolution
• In the past fifteen years, data warehousing and its associated ETL (Extract, Transform, Load) technologies have delivered great business value to enterprises looking to integrate and analyze large volumes of disparate customer and product data.
• The data warehouse has successfully evolved
from monthly dumps of operational data lightly
cleansed and transformed by batch programs, to
sophisticated metadata-driven systems that
move large volumes of data through staging
areas to operational data stores to data
warehouses to data marts.
48
1.5 Evolution of
Virtual Data Integration Systems
49
EII - 1
The conventional wisdom is to use data warehousing and
ETL products to perform data integration. However,
there is a serious flaw in one aspect of this wisdom.
Suppose one wants to integrate current (operational) data
rather than historical information. Consider, for
example, an e-commerce web site which wishes to sell
hotel rooms over the web. The actual inventory of
available hotel rooms exists in 100 or so information
systems. After all, Hilton, Hyatt and Marriott all run
their own reservation systems.
Applying ETL and warehousing to this problem will create a copy of the hotel availability data, which is quickly out of date. If a web site sells a hotel room based on this data, it has no way of guaranteeing delivery of the room, because somebody else may have sold the room in the meantime.
50
EII - 2
The only way to guarantee correct data integration in this
environment is to fetch the data at the time it is needed
from the originating system.
Data integration in dynamic environments requires
integration on demand, not integration in advance.
Dynamic data integration has created a market for data
federation systems. These products construct a
composite view of disparate data systems and allow a
user to run SQL commands, including retrieves and
updates to this composite view. Then, they translate
each SQL command on this composite view to a
collection of local SQL commands that "solves" the
user's request.
51
Historical Relevant Types
of EII Solutions
a. Multidatabases
– Integrated access to different relational DBMS
b. Federated Databases
– Integrated access to different DBMS with different logical
models
c. Mediator Systems (or Data Integration systems)
– Integrated access to different data sources (on the web)
adopting different schemas and different meanings for concepts
• Mediator based: semantic transparency, and also value (instance) heterogeneity transparency
• Federated databases: model transparency
• Multidatabases: relational dialect transparency
• Traditional transparencies in DBMSs: physical, logical, and distribution transparency (fragmentation + allocation)
52
1.5.1 Multidatabases
Relational dialects transparency
53
Multidatabases
• Enable transparent access to multiple
(relational) databases (relational database
middleware)
• Hide distribution and the different database language variants
• Process queries and updates against multiple
databases (2-phase commit)
• Do not hide the different database schemas
54
Example (2004)
55
Query execution in multidatabases
When a DML statement, such as a query, is submitted, the
multiDBMS has to decompose it into queries against
the two component databases. This is a nontrivial task,
because it has to
1. determine first which tables are from which database,
2. which predicates apply to only a single database and
3. which predicates apply to tuples from both databases.
The latter ones can only be evaluated within the
multidatabase, whereas the former ones can be
evaluated within the component databases.
56
Query execution: example
• This is illustrated in the example: the queries against the component databases do not contain the join predicate.
The join itself needs to be evaluated at the
multidatabase level. One of the main challenges in
developing multidatabases is thus to find good strategies
of how to decompose and evaluate queries against
multiple databases in an efficient manner.
57
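A hedged Python sketch of the decomposition just described, with sqlite3 standing in for two independent component DBMSs (tables and data are invented):

import sqlite3

db1 = sqlite3.connect(":memory:")          # component database 1: customers
db1.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db1.execute("INSERT INTO customers VALUES (1, 'Rossi'), (2, 'Bianchi')")

db2 = sqlite3.connect(":memory:")          # component database 2: orders
db2.execute("CREATE TABLE orders (cust_id INTEGER, amount REAL)")
db2.execute("INSERT INTO orders VALUES (1, 99.0), (1, 10.0)")

# Global query (conceptually): SELECT name, amount FROM customers JOIN orders ON id = cust_id
# Steps 1-2: single-database predicates are pushed down to each component database.
customers = db1.execute("SELECT id, name FROM customers").fetchall()
orders = db2.execute("SELECT cust_id, amount FROM orders").fetchall()

# Step 3: the cross-database join predicate can only be evaluated at the multidatabase level.
result = [(name, amount) for (cid, name) in customers
                          for (ocid, amount) in orders if cid == ocid]
print(result)   # [('Rossi', 99.0), ('Rossi', 10.0)]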
1.5.2 Federated databases
Model transparency
58
From multiDBs to federated databases
• Federated databases go one step further than
multidatabases by transforming the databases to be
integrated both with respect to the data model as well
as with respect to the database schema.
• Thus data stored according to different data models can
be integrated and the database schema of the
integrated database can be very different from the
schemas of the component databases (which we will call
in the following also component schemas).
• In order to structure the process of mapping the data
from the component databases to the integrated
database, a 5-layer abstract schema architecture has
been introduced. It separates the different concerns in
the mapping process.
59
Federated Databases
Schema Architecture
• Export Schema
– Unifies data model
– Defines access
functions
• Import Schema
– view on export schema
• Integrated schema
– Homogenizes and unions
import schemas
• Views
– as in centralized DBMS
60
Federated Databases
Schema Architecture
Besides the (1) physical schema, in a first step differences in the data models (relational, XML, etc.) are overcome by providing export schemas, which are expressed in the same (canonical) data model. Each component database can decide autonomously which data, and which access to the data, to provide in the export schema.
These (2) export schemas can be used by different integrated databases. In a second step the integrated databases decide which data to use from the component databases by specifying an import schema.
The (3) import schema can of course only use what is provided by the export schema; thus it is a view on the export schema.
61
Federated Databases
Schema Architecture
In a third step the integrated DBMS maps the data obtained from the different databases, as defined in the import schemas, into one common view, the (4) integrated schema.
This can be a complex mapping, both
at the schema level and the data level.
The integrated schema is what the
integrated database provides to the
applications as logical database
schema.
62
Federated Databases
Schema Architecture
As in standard databases, (5) application-specific views can be derived from the integrated schema.
In practice systems do not always strictly follow this architecture, and different layers are combined (e.g. often there is no clear distinction between the export and import schemas).
Still, the architecture is a very useful tool for distinguishing the different aspects that need to be addressed in database integration.
63
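The following hedged Python sketch restates the five layers as plain functions, using the book example that appears later in this section (data and element names are simplified; dicts stand in for the canonical XML model):

component_db1 = [{"title": "DB Systems", "authors": "Date", "pages": 500}]   # relational-style source
component_db2 = [{"title": "XML in a Nutshell", "author": "Harold"}]          # XML-style source

def export_schema_db1(rows):
    # (2) Export schema: expose the component data in the canonical model (here: plain dicts).
    return [{"book": r["title"], "authors": r["authors"], "pages": r["pages"]} for r in rows]

def export_schema_db2(rows):
    return [{"book": r["title"], "author": r["author"]} for r in rows]

def import_schema_db1(exported):
    # (3) Import schema: a view on the export schema; here the "pages" element is dropped.
    return [{"book": e["book"], "authors": e["authors"]} for e in exported]

def integrated_schema(imported1, imported2):
    # (4) Integrated schema: homogenize the names ("authors" vs "author") and union the data.
    unified = [{"book": r["book"], "author": r["authors"]} for r in imported1]
    unified += [{"book": r["book"], "author": r["author"]} for r in imported2]
    return unified

print(integrated_schema(import_schema_db1(export_schema_db1(component_db1)),
                        export_schema_db2(component_db2)))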
1.5.2.1 Details on Models
for the integrated schema
67
The common model
for the integrated schema
All these solutions require a common model used to represent the global data schema homogeneously.
We can consider all of the existing kinds of data models that have been proposed, and in fact each of them has also been used as the global model for federated DBMSs.
68
The common model in federated databases:
1. the relational model
+ Most frequently used data model for centralized
and distributed DBMS
- Difficult to represent data from more complex
models (documents, OO)
+ Thus best suited for integrating relational data
sources
- Difficult to represent constraints occurring in integration (e.g. generalization/specialization)
69
The common model in federated databases:
2. the object oriented model
+ Expressive
+ Has been proven as successful approach and used
in research and some systems
-- Did not prevail in practice because of a lack of
commercial success of OODBMS
70
The common model in federated databases:
3. XML
+ Is becoming a ubiquitously used data model
+ XML Schema contains the main elements of an
object-oriented data model
+ Can be used both for representation of data and
documents
From these characterizations we can see that XML-based models will most likely become the prevailing global data model used for the integration of databases.
71
The common model in federated databases:
4. Object Oriented + XML
+ Effective when data bases (OO) and document
bases (XML) are considered together
• Adopted in Momis, the academic prototype
described later in the course.
72
1.5.2.2 An example of federated architecture
73
Federated DBMS Example
[Figure: a federated DBMS example in which the common data model is XML; the component databases expose export schemas in that model]
74
Example - 2
Here the component DBMSs use different data models (relational and XML DTD), and the canonical data model is again different, i.e. XML Schema.
Thus the wrappers provide the necessary functionality to
access data in the component databases according to the
canonical data model.
75
Example - 3
For the relational database, for example, a complex type "Book" is
defined in the export schema that contains data from both tables
PUBLICATIONS and AUTHORS (assuming the element "authors"
contains the author names). For populating the export schema (or for
transforming the data) the wrapper would need to compute a join
using the underlying database system.
For the XML database the export schema is an XML Schema that exactly corresponds to the XML DTD. Thus the transformation of the XML data is simple as well, because the data need not be transformed at all!
76
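A hedged Python sketch of such a wrapper for the relational source (the column names and sample data beyond the PUBLICATIONS and AUTHORS tables are invented; sqlite3 and xml.etree stand in for the component DBMS and the canonical XML model):

import sqlite3
import xml.etree.ElementTree as ET

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE PUBLICATIONS (pub_id INTEGER, title TEXT)")
db.execute("CREATE TABLE AUTHORS (pub_id INTEGER, name TEXT)")
db.execute("INSERT INTO PUBLICATIONS VALUES (1, 'Federated Databases')")
db.execute("INSERT INTO AUTHORS VALUES (1, 'Sheth'), (1, 'Larson')")

def export_books(conn):
    # The wrapper populates the export schema: one <Book> element per publication,
    # computed by joining PUBLICATIONS and AUTHORS in the underlying database.
    rows = conn.execute(
        "SELECT p.title, a.name FROM PUBLICATIONS p JOIN AUTHORS a ON p.pub_id = a.pub_id"
    ).fetchall()
    books = {}
    for title, name in rows:
        book = books.setdefault(title, ET.Element("Book", {"title": title}))
        ET.SubElement(book, "authors").text = name
    return list(books.values())

for book in export_books(db):
    print(ET.tostring(book, encoding="unicode"))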
Example – 4: Import schemas
and integrated schema
77
Example - 5
Next we zoom into the internal workings of the federated DBMS. First the
federated database determines the import schemas.
Since the federated database does not use the element "pages" from DB1, its import schema drops this element. Otherwise the schemas remain unchanged.
The two import schemas, though similar, still contain differences in the names.
Both "Book"-"book" and "authors"-"author" are incompatible names.
Therefore in order to arrive at an integrated schema the federated DBMS has
to map the two import schemas into one integrated schema as shown above
(of course in general the differences among the schemas will be much
larger).
78
Example - 6
What is not shown in the illustration is the corresponding mapping at
the data level. Since now books from two databases are mapped into
one integrated database, it can occur that the same book is
contained in both component databases. Then the federated DBMS
has to recognize such a situation and take the corresponding action,
e.g. by instantiating such a book only once in the virtual integrated
database.
79
Federated Databases Query processing
Once the integration is established at the schema
level, the federated DBMS needs to support
standard operations on the integrated
databases, in particular DML operations. Since
multiple schemas and data models are involved,
this task is considerably more complex than in
the case of multidatabases.
80
Federated Databases Query processing
Example - 2
This is illustrated in the example that follows.
Assume a query for all authors of books is now submitted to the
integrated database in XML Query.
1. First the federated DBMS has to determine which databases can
contribute to the answer and generate from the query Q that is
posed against the integrated schema, queries that can be posed
against the import schemas (Q1 and Q2).
81
Federated Databases Query processing
Example - 3
1.bis In that step it has also to invert the mappings that
have been performed during integration (e.g. mapping
back to the names "book" and "authors").
2. Then the query is forwarded to the wrapper without
change since the import schema is just a projection on
the export schema.
82
Federated Databases Query processing
Example - 4
3. The wrapper has now the task to convert the query from a query in
the canonical query language and data model to a query in the model
of the component database. For the relational database this requires
the transformation to an SQL query, whereas for the XML database
the query is converted to an XPath query.
4. Then the wrappers submit those queries to their respective
component database systems and receive results in their respective
data models.
83
Federated Databases Query processing
Example - 5
4.Bis For the relational database this means that now the
wrapper has to convert the relational result into an XML
representation. For the XML database no change is required
since the data representation (namely XML documents) is the
same independently of whether XML DTDs or XML Schemas
are used as XML type definitions.
84
Federated Databases Query processing
Example - 6
6. Once the federated DBMS receives the results it has to
map the result data from the import schema to the
representation of the integrated schema.
7. For our example this requires renaming the element name "authors" to "author" for result A1 (note that renaming "book" is not required, as this name does not occur in the result).
8. Then the federated DBMS computes the result of the
query by performing a union (and thus eliminating
duplicates). Note that for more complex mappings
from the import schemas to the integrated schema
this step can be considerably more complex.
9. Then the final result can be returned to the application.
85
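A hypothetical end-to-end Python sketch of the query-processing steps just described (the per-source results and element names follow the book example, but the code is only an illustration, not the behaviour of a specific federated DBMS):

# Per-source results, already converted by the wrappers into the canonical form
# (here simply dicts). Result A1 still uses the import-schema name "authors".
result_a1 = [{"book": "Federated Databases", "authors": "Sheth"}]
result_a2 = [{"book": "Federated Databases", "author": "Sheth"},
             {"book": "XML Integration", "author": "Harold"}]

def map_to_integrated_schema(rows):
    # Steps 6-7: rename "authors" to "author" so results match the integrated schema.
    return [{("author" if k == "authors" else k): v for k, v in row.items()} for row in rows]

def union_without_duplicates(*result_sets):
    # Step 8: union the mapped results, eliminating duplicates (the same book/author
    # pair may come from both component databases).
    seen, merged = set(), []
    for rows in result_sets:
        for row in rows:
            key = (row["book"], row["author"])
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged

# Step 9: the final result is returned to the application.
print(union_without_duplicates(map_to_integrated_schema(result_a1), result_a2))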