Download doc - itk.ilstu.edu

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Microsoft Access wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Concurrency control wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Oracle Database wikipedia , lookup

SQL wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Database wikipedia , lookup

Functional Database Model wikipedia , lookup

Microsoft SQL Server wikipedia , lookup

Open Database Connectivity wikipedia , lookup

Relational model wikipedia , lookup

Clusterpoint wikipedia , lookup

Database model wikipedia , lookup

Transcript
ITK478 Special Interest Activity
Fall 2007
Implementation and Usage of a
Federated Database System
Greg Magsamen
Thursday, November 15, 2007
Table of Contents
Executive Summary __________________________________________________ 4
From Proposal: ___________________________________________________________ 4
Proposed Deliverables: ____________________________________________________ 4
Introduction __________________________________________________________ 5
Database Oriented Middleware ________________________________________ 6
What is Database Oriented Middleware? ____________________________________ 6
Database Gateway (Data Source Federation) ________________________________ 7
Federated Systems ___________________________________________________ 7
Federated Systems Issues _________________________________________________ 8
Usage ____________________________________________________________________ 8
Web Sphere Federated Server _________________________________________ 9
Setup of Federation Database ______________________________________________ 9
DB2 and Web Sphere Federated DB Product _______________________________ 10
Special Interest Activity Federated Server Setup _______________________ 11
Wrappers ________________________________________________________________ 12
User mappings ___________________________________________________________ 13
Nicknames_______________________________________________________________ 13
Special Interest Activity Setup ____________________________________________ 13
Problems ________________________________________________________________ 17
IBM DB2-C ____________________________________________________________________ 17
IBM DB2 EE (Extended Edition) __________________________________________________ 17
IBM Web Sphere Federated Server _______________________________________________ 17
Federated Object Setup _________________________________________________________ 18
Final Architecture ________________________________________________________ 18
Data Sources for the Special Interest Activity_______________________________ 18
SQL Server, Oracle, DB2 ________________________________________________________ 19
(IBM, WFS, Admin. Guide)Excel __________________________________________________ 19
Excel _________________________________________________________________________ 20
(IBM, WFS, Config. Guide)XML __________________________________________________ 20
XML __________________________________________________________________________ 21
(IBM, WFS, Config. Guide)Setting up Each Data Source _____________________ 21
Setting up Each Data Source ______________________________________________ 22
SQL Server – End Result ________________________________________________________ 22
Oracle – End Result ____________________________________________________________ 23
DB2 – LOCAL DB – End Result __________________________________________________ 23
Excel _________________________________________________________________________ 24
XML __________________________________________________________________________ 25
XML __________________________________________________________________________ 26
2
Outcome of Federated Object Usage _______________________________________ 27
Querying Tool - Command Center ________________________________________________ 27
SQL Statements - Command Center of DB2 _______________________________________ 28
Program to ITK478 _______________________________________________________ 29
Conclusions_________________________________________________________ 31
Use of Federation as Middleware __________________________________________ 31
Learning Curve __________________________________________________________ 31
DB2 Usage ____________________________________________________________________ 31
Password – User Mappings ______________________________________________________ 32
Ease of Installation _____________________________________________________________ 32
Requirements of Hardware, Software _____________________________________________ 32
Ease of Use ___________________________________________________________________ 32
Major Problems __________________________________________________________ 33
Usage Scenarios _________________________________________________________ 34
Viability found from the SIA _______________________________________________ 34
References __________________________________________________________ 35
3
Executive Summary
This special interest activity (SIA) for ITK 478 was born from the idea of accessing many
data sources from a common interface. Quickly it was determined that the technique for
doing this type of data-oriented integration was call ‘data federation’. This was a
technique for using one database (called the federated database) that actually is the
common access point for many different data source, some relational database, some
file systems and others, like web services.
The proposal originally submitted focused around learning a tool for federation and the
ability to access some common data sources. The proposal centered around getting
something up and running in the time given and learning how to add different (but
common) data sources to the available list in the federated system.
From Proposal:
The original tool proposed to use for the federation was DB2 Federated Server. This is
the tool that was used for the SIA.
The original list of suggested data sources consisted of:
Oracle in lab, SQL Express, MS Access, DB2-C
Because of system constraints, a machine separate from the ITK lab had to be used.
However, Oracle, SQL Server, DB2, and some flat files (Excel spreadsheet and XML
documents) were tried and their use successful.
Proposed Deliverables:
-
Federated Database and multiple heterogeneous databases and data sources
(Excel, XML) accessed via the federated database.
This was accomplished.
-
Sample SQL Statements via some SQL Query Tool or Reporting Tool.
Possibly a program extracting data from data source (Net Beans, VS 2005 (.Net)
The use of SQL statements (with Query Tool) was done as well as a Java program to
access the data sources. However, there were some problems with a reporting tool
(Crystal Reporting) that could not be resolved in the allotted time. If there is a resolution
to using a reporting tool before the demonstration, supplemental material will be
submitted.
The results of the topic researched, work done, activities performed and lessons learned
are the content of this Special Interest Activity paper. This paper is structured as follows:
Introduction – background on federated data bases and how database-oriented
middleware comes into play in application integration.
Federated Systems – what they are and how they work.
4
Web Sphere Federated Server – the tool chose for the SIA and how it is implemented
and works.
Special Interest Activity Federated Server Setup – how the tool was set up for this SIA
and what activities were done.
Conclusion – what problems were encountered, what lessons learned, usage
possibilities and the viability of the overall concept of federated systems.
Introduction
Perhaps the most significant challenges in corporate information management today are
the multiplicity, heterogeneity, and geographic distribution of data sources. There can be
any number of factors that lead to this data complexity within a company; for instance,
mergers and acquisitions that combine different organizations with varying IT
infrastructures, business growth from a local to a transnational company, or simply
piecemeal technologies used to build information systems over a period of time
(probably the most common) (Linthicum (7), page 176).
No matter how difficult the problem, synchronizing and managing the diverse data
sources is a key requirement to keeping a business competitive. This is where the idea
of some form of data or database oriented middleware comes into play. With all the
different data sources that an application system may need to access, some common
thread of integration becomes necessary. However, not just any API or common
interface can be used for industrial size needs. Middleware for data access does have
some high level objectives that need to be met to make it viable (Linthicum (7), page
175).
In order to seamlessly tie together multiple data sources into a usable data source
system, any potential middleware system for multiple data sources must achieve
(Linthicum (7), page 173):
Transparency: the capability to code and use applications as though the data resides in a single
database
Heterogeneity: the ability to accommodate different data requirements and sources in the
enterprise
Autonomy: the absence of restrictions being enforced at the remote data source, allowing it to
remain autonomous
High function: the ability of applications to exploit not only the high degree of function provided by
the federated system, but also the special functions unique to some of the data sources
Extensibility and openness: the flexibility to seamlessly add a new data source to the enterprise
information system
Optimized performance: the power of applications developed for the federated system to achieve
strong performance without the need to implement special strategies to evaluate the queries
5
Database Oriented Middleware
To a large extent, application integration depends upon database access. This is
particularly true for data-oriented application integration. Databases were once
proprietary and therefore difficult to access. These days so may solutions for accessing
data exist that there is rarely a problem when there is a need to retrieve information from
or place it into any database. The solutions not only make application integration a much
easier proposition, but they also speak directly to the idea that the capability of modern
middleware drives the interest in application integration (Linthicum (7), page 165).
However, even with many simplified database access solutions, database and databaseoriented middleware quickly grow complicated. Although database-oriented middleware
was once a mechanism to “get at” data, it has matured into a layer for placing data in the
context of a virtual database, today a particular, common database model or format. For
example, if there is a need to view a relational database as objects, the databaseoriented middleware can map the data so it appears as objects to a source or target
application. The same thing can be done “the other way around”; mixing and matching
such models as hierarchical, flat files, multidimensional, relational, and object-oriented
(Linthicum (7), page 176).
Database-oriented middleware also provides access to any number of databases,
regardless of the model employed or the platform upon which they exist. This access is
generally accomplished through a single common interface such as ODBC or JDBC.
Because of this, information stored in Oracle, Informix, or DB2 databases can be
accessed at the same time through a single interface. By taking advantage of these
mechanisms, there is the ability to map any difference in the source and target
databases to a common model. As a consequence, they are much easier to integrate.
These examples should make it clear that database-oriented middleware is a major
player in the world of application integration. It allows a large number of enabling
technologies to process information coming to and going to source and target systems.
This ability makes database-oriented middleware the logical choice if a message broker
or an application server requires information in a database. Vendors also recognize the
advantage of database-oriented middleware. As a result, many application integration
products, such as message brokers and application servers come pre-packaged with the
appropriate adapters to access most relational database, such as Oracle, Sybase, or
Informix. To a large degree, database access is now a “problem solved”, with many
inexpensive and proven solutions available (Linthicum (7), page 173).
What is Database Oriented Middleware?
Database-oriented middleware provides a number of important benefits, including
(Linthicum (7), page 174):
-
An interface to an application
The ability to convert the application language into something understandable
to the target database
The ability to send a query to a database over the network
The ability to process a query on the target database
6
-
The ability to move a response set back over the network to the requesting
application
The ability to convert a response set into a format understandable by the
requesting application
In addition to these processes, database-oriented middleware must provide the ability to
process many simultaneous requests, along with scaling features such as thread pooling
and load balancing. These features must be packaged with management capabilities
and security features. As in other contexts, vendor approaches to these benefits vary
greatly from vendor to vendor and technology to technology. This paper focuses on one
specific type of database-oriented middleware, the federated database system, a
successor to something called a database gateway.
Database Gateway (Data Source Federation)
Database gateways (also known as an SQL gateway) are APIs that use a single
interface to provide access to most databases residing on many different types of
platforms. They are similar to virtual database middleware products, providing
developers with access to any number of databases residing in an environment typically
difficult to access, such as a mainframe. For example, using an ODBC interface and a
database gateway, developers can access data that resides in a DB2 mainframe
database, in an Oracle database running on an AS/400 and in SQL Server database
running on a UNIX server. The developer simply makes an API call and the database
gateway does all the work (Linthicum (7), page 175).
Database gateways translate the SQL calls into a standard format known as the Format
and Protocol (FAP), the common connection between the client and the server. FAP is
also the common link between very different databases and platforms. The gateway can
translate the API call directly into FAP, moving the request to the target database, and
translating the request so that the target database and platform can react.
A number of gateways have been put on the market. These include Information Builder’s
Enterprise Data Access/SQL (EDA/SQL) and standards such as IBM’s Distributed
Relational Data Access (DRDA) which is the core basis for IBM’s Web Sphere
Federated Server of which the activities in this paper are centered on.
Federated Systems
Information management has many new challenges today that are caused by the
multiplicity, heterogeneity, and geographic distribution of important data sources.
A distributed or federated database management system is a software system that
supports the transparent creation, access, and manipulation of interrelated data located
at the different sites of a computer network. Each site of the network has autonomous
processing capability and runs local applications, which require network communication.
The main goal of a federated database system is to improve the accessibility,
compatibility, and performance of a federated database while preserving the appearance
of a centralized database management system (DBMS) (Lightstone, (6), page 369).
7
Federated Systems Issues
Because of the nature of a loosely coupled network of computers, the database design
issues encountered in federated systems differ than those encountered in centralized
database systems. In centralized databases, access efficiency is achieved through local
optimization, by using complex, physical structures. In federated databases, the global
optimization of processing, including network communication cost, is of major concern.
The total cost to run an application is a function of the network configuration: the total
user workload at that moment, the data allocation strategy, and the query optimization
algorithm (Lightstone, (6), page 369).
Federated database systems are very complex systems that have many interrelated
objectives. For example, the IBM Web Sphere Federation Server is an example of a
federated database system that tries to implement the general six objectives of a data
oriented middleware, outlined on page 5.
However, none of these objectives explicitly depends on a data allocation strategy. In
federated systems, data allocation (or data distribution) is done largely at the discretion
of the database designer or database administrator. Sometimes it involves homogenous
data sources, but usually it is heterogeneous (DB2, Oracle, SQL Server, etc.) Once the
allocation decision has been made and the replicated tool loaded in the system, the
federated system can use its optimizer to maximize the efficiency of the reallocated
database. This implies that the updating of multiple heterogeneous sites will have some
sort of transaction support, like two-phase commit (Lightstone, (6), page 370-371).
Usage
Federate systems allow application architects and developers to easily solve problems
of data access that might otherwise need expensive ‘Data Mart’ or data warehousing
solutions. In today’s business world, mergers and acquisitions create many different
platforms and storage models for the various functions of a newly created business. How
applications and users will get at all the data (new and old) becomes the focus of the
strategy for data-oriented middleware; in this case, the use of a Federated Database
System.
A good example of the usefulness of a Federated Database System is diagramed in the
following figure (page 9) from IBM’s Federated usage documents. This diagram displays
the use of many data sources for a customer / supply operation. The application(s)
access the many data source via many different sources and passes orders from one
spot to another. These orders, in the form of XML documents, are routed to the global
warehouse, while customer information is maintained in a local database table called
CUSTOMERS (Betawadkar, www.ibm.com (1)).
8
(Betawadkar, www.ibm.com (1))
Using the federation technology in IBM’s Web Sphere Federated Server, this global
warehouse is connected to two regional warehouses in the U.S. and Canada. In each
warehouse, information about items and suppliers is stored in tables ITEMS and
SUPPLIERS. In addition, the item ID and the supplier ID for each item are stored in the
table ITEM_SUPPLIED. The USA warehouse is based on a DB2 for z/OS and OS/390
system, while the Canada warehouse is in an Oracle system. Another Oracle instance,
called the Credit Checking server, tracks customers with bad credit history and is
accessible from the federated system (example from Betawadkar, www.ibm.com (1)).
Web Sphere Federated Server
Setup of Federation Database
Why DB2?
DB2 Web Sphere Federated Server was chosen for the SIA because of its availability for
download from the IBM products page, as well as its history in previous incarnations
(Data Joiner, Information Integrator). This product is known for its usage and reliability. It
is also the most talked about product in topic research, so it appeared to be the one with
the most documentation. For the SIA, this was very important in regards to the limited
amount of time for a deliverable.
There are other possible vendors that could have been used, such as Oracle’s clustering
product. However, the documentation on that product was limited and there didn’t
appear to be much usage information. In point of fact, for this SIA, there wasn’t much
documentation on whether this was an Oracle-only distributed database product or was
actually a data federation product.
Microsoft SQL Server does have some federation abilities, but for the purposes of this
SIA, the DB2 product was already in-hand so the decision was made to pursue its
capabilities.
9
DB2 and Web Sphere Federated DB Product
Web Sphere Federated Server is the latest release of IBM information integration
middleware that provides a range of integration technologies; enterprise search, data
federation, data replication, data transformation, and data event publishing, to meet
varied integration requirements (Lee, (5)).
The data-federation capabilities of Web Sphere Federated Server allow multiple
heterogeneous data sources to be accessed as if they were a single data source,
regardless of where they reside. A federated system consists of:




A DB2 instance that operates as a federated server
A database that acts as the federated database
One or more data sources
Clients (users and applications) that access the database and data sources (Lee,
(5))
With a federated system, you can send distributed requests to multiple data sources
within a single SQL statement. For example, you can join data that is located in a DB2
Universal Database table, an Oracle table, a SQL Server table and an XML-tagged file in
a single SQL statement. The following figure shows the components of a federated
system and a sample of the data sources that are accessible (Lee, (5)).
Web Sphere Federated Server Diagram
(Lee, (5))
10
A key component of the Web Sphere Federated Server architecture is the “Wrapper”.
Each type of data source will have a wrapper that acts as the interface between Web
Sphere Federated Server and the data source. As well as the rich set of wrappers
delivered with the various editions of Web Sphere Federated Server, there are also
wrapper development kits (for C++ and the Java™ programming language), which
enable new wrappers to be written to federate additional data sources. The Wrapper is
responsible for:




Determining which parts of an SQL query can be handled by the remote data
source.
Providing optimization statistics for use by Web Sphere Federated Server.
Communicating with the remote data source, sending the query and receiving the
result set.
Passing the result set to Web Sphere Federated Server and performing any data
type translation required.
In a grid environment, the wrapper has another important task at query execution time:
to transform the standards-based SQL from Web Sphere Federated Server to the SQL
dialect understood by the remote data source. This transformation compensates for
differences in syntax and function calls by substituting SQL unsupported by remote data
source. Handling query transformation at execution time allows it to cope with changes
in the underlying data source topology between planning and execution. Web Sphere
Federated Server allows read and write access to remote data sources (Lee, (5)).
Special Interest Activity Federated Server Setup
(Instructions and requirements for IBM’s setup is attached as Appendix A)
A federated system is created by first installing the federated engine, then configuring it
to communicate with the data sources. There are three basic federated objects:



The federated server communicates with the data sources by means of software
modules called wrappers.
If the data sources require authentication, the remote authentication information
can be registered with the federated system as user mappings.
Identify remote data sets that you want to access as nicknames to the federated
system. You can now reference the nickname in your application as if it were a
local table.
11
The following sections examine these federated objects one by one.
Wrappers
A wrapper module provides the logic to facilitate:




Federated object registration: A wrapper encapsulates the data source
characteristics from the federated engine. It knows what information is needed to
register each type of data source.
Communication with the data source: Communication includes establishing and
terminating connections with the data source, and maintaining the connection
across statements within an application if possible.
Services and operations: Each wrapper supports various operations, depending
on the capabilities of the data sources that the wrapper is meant to access.
These operations can include sending a query to retrieve results, updating
remote data, supporting transactions, manipulating large objects, binding input
values, and more.
Data modeling: A wrapper is responsible for mapping the data representation of
remote query results into the table format as required by the federated engine
(IBM, WFS, Admin. Guide)
In the afore mentioned example online shop scenario (figure from page 9), you would
need three wrappers: a NET8 wrapper for the two Oracle systems, a DRDA wrapper for
the DB2 for z/OS and OS/390 system, and an XML wrapper to access the online orders
from XML documents.
The diagram below shows the federated system configuration that follows this step.
Three wrappers provide access to four data sources (including the XML file that contains
the Web order). Some client libraries are required for the wrappers used here. For
example, the Oracle NET8 client software is required for the NET8 wrapper. The client
libraries are shown in the following diagram (Betawadkar, www.ibm.com (2)).
(Betawadkar, www.ibm.com (2)).
12
User mappings
To provide an additional layer of security, Web Sphere Federated Server supports user
mappings. A mapping for each Web Sphere Federated Server user to an ID and
password is needed for a remote data source. Such user mappings can be defined for
relational data sources as well as some non-relational data sources. XML files do not
require the registration of user mappings, and thus no additional layer of authentication
is applied (Betawadkar, www.ibm.com (1)).
If the user ID and password you use to connect to the federated database are the same
as those you use to access the remote data source, a user mapping is not needed.
Nicknames
Remote objects such as tables and views are registered on the federated server as
nicknames. For relational nicknames, the wrapper validates the existence of the data
source objects and retrieves the column definitions and index information, if any, when
the nickname is defined. If the statistics maintained on the data source are similar to
those on the federated system, the wrapper nickname registration function will look up
and retrieve the statistics from the remote system catalog (Betawadkar, www.ibm.com
(1)).
Accurate index information and statistics are fundamental to cost-based decisions in the
query optimizer. On a federated system, unique index information might be required to
perform an UPDATE/DELETE operation for some data sources. If data needs to be
stored locally before it is updated (for example, when the table to be updated is also
referenced on the same UPDATE statement in a predicate), the federated system uses
a unique index to simulate positioned cursors for data sources that do not support row id
(for example, DB2 family). Information about the column definitions, index, and statistics
for a nickname is stored in the DB2 federated database system catalogs (Betawadkar,
www.ibm.com (1)).
The registration of nicknames provides location transparency, since a nickname looks
just like a local DB2 table to the users on the federated server. Because relational data
sources usually provide system catalog information about the column definition of an
object, the DDL syntax only needs to identify the remote object for which a nickname is
created (Betawadkar, www.ibm.com (2)).
Special Interest Activity Setup
The installation of DB2 Web Sphere Federated Server was very cumbersome to learn. In
the end it seemed easy, but to get to that point took the majority of the SIA time. The
short rundown of steps to install begins with installed a compatible IBM DB2 database.
13
Screen shots of setup
Initially a DB2 database instance is needed (created via DB2 Control Center)
The Database Manager configuration must be changed to allow for federation. This is
done by updating the configuration to set the “Federated” parameter to “Yes”. This does
NOT disallow the use of physical DB2 databases on the same system for data storage.
14
Install WFS. If the DB2 database is already installed (which was the case in this SIA),
you can install with the existing copy of DB2 by checking the radio button.
15
However, ensure on the next panel that the PATH to the existing DB2 database
installation is selected. WFS uses the default path to install, even though the previous
panel was checked for the ‘existing copy’. If the actual path to the desired DB2 copy is
not selected in the drop down, a later panel will prompt the installer for the DB2
installation images even though it has already been installed. This “problem” caused
over 3 weeks of SIA delay.
16
Select the “wrappers” (data source access) to install with the product
And continue through the installation. System and disk space requirements are attached
in Appendix A.
Problems
IBM DB2-C
The scaled down, “free” version of the IBM DB2 for Windows database was tried, but it
never proved compatible with IBM Web Sphere Federated Server. In the end, IBM DB2
Extended Edition, which was a 90-day free trial, had to be used.
IBM DB2 EE (Extended Edition)
IBM Web Sphere Federated Server would not install initially on this version until the IBM
DB2 EE Fix pack 3 was applied. After this, IBM Web Sphere Federated Server would
install.
IBM Web Sphere Federated Server
The installation of IBM Web Sphere Federated Server seemed to complete but the
server would not start until the IBM Web Sphere Federated Server Fix pack 3
(Wrappers) was applied in conjunction with IBM DB2 EE Fix pack 3.
17
Also, IBM Web Sphere Federated Server does NOT run under the Windows Vista
platform. There actually was no mention of this lack of support, however, it took some
time to realize that lack of support and switch to a Windows XP operating system
environment.
Federated Object Setup
There is a user friendly interface in the DB2 Control Center (the interface for accessing
and administering databases). However, it turns out that it is far easier to use native SQL
commands for setting up federated objects as the GUI prompts for many variables and
parameters that may or may not be required, possibly confusing the user. Using the
native commands allows the user to only set up what they want and ignore extraneous
parameters.
Final Architecture
The final architecture used for the SIA relied on a Windows XP Professional FP 2
workstation running 2GB of memory with Pentium 4 3.00GHz CPU. The Oracle and MS
SQL Server data sources were networked on different servers, while the flat files
(meaning XML and Excel) were local to the DB2 Federated Server. The DB2 physical
database (the install SAMPLE) was local to the Federated Server.
Oracle 10g Database
`
IBM DB2 Federated Server
DB2 Physical Database
Excel Spreadsheet
XML Document
MS SQL Server 2000
Data Sources for the Special Interest Activity
The four data sources proposed for the activity were DB2, SQL Server, Oracle and some
form of flat file. Both an Excel spreadsheet file and an XML document were used for the
flat file. Trying to federate a Web Service will be attempted for the demo and if
successful, supplemental material will be submitted.
18
SQL Server, Oracle, DB2
Wrappers are used to translate the relational tables from the databases into a table
equivalent in the federate database. Specific wrappers are used for Oracle and SQL
Server, while the DB2 physical database uses its DRDA method.
DRDA Wrapper
SQL Server Wrapper
Oracle Wrapper
(IBM, WFS, Admin. Guide)
19
Excel
The ‘translation’ of the data from the spreadsheet to the federated object occurred in the
wrapper. The data then appears in the federated database as a table of columns,
corresponding to the columns of the spreadsheet. It can then be used in a SQL
statement.
(IBM, WFS, Config. Guide)
20
XML
The ‘translation’ of the data from the document to the federated object occurred in the
wrapper. The data then appears in the federated database as a table of columns,
corresponding to the element tags in the document. It can then be used in a SQL
statement.
(IBM, WFS, Config. Guide)
21
Setting up Each Data Source
To set up each relational data source, the DB2 Control Center was used, and the
Federated Object element was added to. Right clicking on the Federated Object element
begins a wizard where choices are made for different type of data source objects.
SQL Server – End Result
22
Oracle – End Result
DB2 – LOCAL DB – End Result
23
Excel
Adding the access for an Excel spreadsheet didn’t seem to work through the Control
Center. Native SQL Statements needed to be used, via the DB2 Command Line
Processor window. The following statements created the access point to the Excel
spreadsheet.
db2 => CREATE WRAPPER excel_wrapper LIBRARY 'db2lsxls.dll'
DB20000I The SQL command completed successfully.
db2 => CREATE SERVER ITK478EXCEL WRAPPER excel_wrapper
DB20000I The SQL command completed successfully.
db2 => CREATE NICKNAME AGENTGROUP (AgentGroupNum INTEGER,
AgentGroupName VARCHAR(50))
FOR SERVER ITK478EXCEL
OPTIONS (FILE_PATH 'D:\478\SIA Draft\Aspect Agent Groups.xls', RANGE 'A1:B100')
DB20000I The SQL command completed successfully.
The Wrapper is created and the two ‘table columns’ from the Federated Object:
Corresponding to the first two columns of the actual Excel spreadsheet defined in the
wrapper statement:
24
25
XML
The same Control Center problem applied when trying to create XML wrappers. It turned
out to be easier using command line code, via the DB2 Command Editor.
CREATE WRAPPER xml_wrapper LIBRARY 'libdb2lsxml.a'
CREATE SERVER ITK478XML WRAPPER xml_wrapper
CREATE NICKNAME employee (name VARCHAR(20) OPTIONS (XPATH
'@empname'), skill VARCHAR(50) OPTIONS (XPATH '@type')) FOR SERVER
ITK478XML OPTIONS (FILE_PATH 'D:\478\SIA Draft\Employee.xml', XPATH '//emp')
26
Outcome of Federated Object Usage
Querying Tool - Command Center
It is important to remember that the User Mapping comes into play when trying to access
the Federated Objects. The definitions that were used to create the object should have
mapped the ‘user’ side credentials to whatever is needed on the federated side. This has
to be done carefully or access will be denied to one of the objects and it is difficult to
debug. The Federated Database, ITK478, uses the one logon and it should be mapped
to the needed access on the remote systems.
27
SQL Statements - Command Center of DB2
The use of SQL statements is done using the nicknames give in the creation of the
federated objects. These nicknames can be the same as the remote sources or different.
For the SIA, most nicknames are the same name, but in a real environment it would
probably be wise to come up with a standard naming convention so that usage would
not be confusing to a developer or user.
28
Program to ITK478
A simple java program was written for the SIA deliverable to use the same type of
nickname-use SQL to select the values from the federated objects. This demonstrated
the use of the JDBC API support in the federated system, more notable as it used the
federated database as a ‘blind’ interface to remote data source. There is no need to
connect to each data source individually or in combination. This is probably the
greatest benefit of federated system usage, the ability to developer or use
applications accessing one common interface.
NOTE: the user mapping again becomes a concern as all federated objects need to be
prefaced with the federated ID (as shows in the SELECT statement).
Sample Code:
import java.sql.*;
public class ListCallsFromSurvey {
public static void main(String args[]) {
String url
= "JDBC:ODBC:ITK478";
Connection con;
Statement stmt;
String query = "select a.call_id, b.dial_digit, c.audience_description from
ID07781.call a, ID07781.call_today b, ID07781.audience c where a.call_id = b.dial_digit and
a.audience_id = c.audience_id";
try {
//Class.forName("oracle.jdbc.driver.OracleDriver");
Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
} catch(java.lang.ClassNotFoundException e) {
System.err.print("ClassNotFoundException: ");
System.err.println(e.getMessage());
}
try {
con = DriverManager.getConnection(url,dbUser,dbPassword);
stmt = con.createStatement();
// Note the use of the executeQuery method (not executeUpdate)
// ResultSet contains the results that come back from the query
ResultSet rs = stmt.executeQuery(query);
System.out.println("CallID
System.out.println("---------
CallID on ACD
Audience");
-----------------------");
while (rs.next()) {
29
String c = rs.getString(1);
String d = rs.getString(2);
String a = rs.getString(3);
System.out.println(c + " " + d + " " + a);
}
stmt.close();
con.close();
} catch(SQLException ex) {
System.err.println("SQLException: " + ex.getMessage());
}
}
}
Reports
{Mention ‘real world usage}
30
Conclusions
Use of Federation as Middleware
With version 7 of DB2 Universal Database for Linux, UNIX, and Windows, IBM
introduced the first commercial information management system capable of integrating
both relational and non-relational data. Federation takes data access to the next level of
information integration, allowing the DB2 federated system to act as a virtual database
where remote objects can be queried like local tables. A user of developer can leverage
the power of DB2 as well as remote data sources, and gain from the transparency,
heterogeneity, high function, autonomy, extensibility, and optimization features that such
a federated system enables (Linthicum (7), page 368).
Web Sphere Federated Server version 9.1 further enhances federation technology,
making it easier to build an enterprise federated system. Some of the important features
include (IBM, WFS, Admin. Guide):







Materialized query tables that can be defined over queries referencing both
relational and non-relational nicknames to locally cache the query results
The nickname statistics update facility
The federated health monitor
The ability to import data into a relational nickname and export data from a query
involving a nickname
An enhanced process model, which takes better advantage of parallelism for
queries involving nicknames in a federated system using the data partitioning
feature
A broader set of non-relational wrappers that includes the Web Sphere Business
Integration (WBI) and Web Services wrappers
A wrapper development kit in both C++ and Java, which allows customer
wrappers to access proprietary data sources from a federated system
Learning Curve
There were many topics that need to be covered for this SIA, starting with product
selection down to database access and file access methodology. However, this SIA was
structured to be focused on somewhat simple tasks in order to get a viable system up
and running and pursue some simple data source allocation tasks. This SIA completed
these tasks and delivered a working system. However, many more issues and
components would come into play for a ‘real-world’ system and were not addressed in
this SIA. However, many lessons were learned from beginning to end that made the
overall activity extremely beneficial
DB2 Usage
The installation of DB2 EE for Windows provided a rudimentary knowledge base for the
use of a popular (albeit expensive) DBMS and how it works, accesses, and is
administered. DB2 is a powerful DBMS, but it probably best left to industrial
implementations.
31
Password – User Mappings
The security features of the Web Sphere Federated Server system were challenging
until figured out. How to map the federated ID to the remote data source took quite a bit
of time to determine and then set up. However, once the method for setting up relational
database wrappers was done, adding subsequent wrappers became very easy.
Ease of Installation
The installation of DB2 Extended edition and Web Sphere Federated Server took the
most time of the entire SIA. The main roadblock was the lack of knowledge that
Windows Vista operating system was not supported for Web Sphere Federated Server.
The DB2 EE installation was challenging under Vista, but once done, the SIA effort tried
to focus on the second piece (Web Sphere Federated Server), which ultimately failed.
The activity then focused on Windows XP Professional which turned out to be an easier
install. Of course, the knowledge gained from installing DB2 EE made this second
attempt much easier.
Requirements of Hardware, Software
For some reason, installing DB2-C (the “free” DB2 server version) would not work on
either Windows XP or Vista, giving memory errors. The DB2 services would never start
up even though the installation requirements detailed that only 256MB of memory was
needed. Both operating systems were running under 2GB of memory, so this problem
was never figured out. This prevented installing Web Sphere Federated Server at any
time under DB2-C. The solution was to download and install DB2 Extended Edition
which is the GA version at a price. The 90-day installation installed and worked after the
fix pack 3 was applied. Web Sphere Federated Server then would install over the top of
it.
Ease of Use
The initial federated data source object was a remote SQL Server database. The
configuration of this object was done using the DB2 Control Center and many of the
parameters needed were unknown. This caused some delay in usage. However, once
some research was done, it appears a file called db2dj.ini needed some environment
variables set to access where the actual database drivers resided on the federated
server. Once these parameters were set, the relational data sources became easy to set
up.
32
Major Problems
For this SIA, there were some major problems that had to be overcome which did cause
some significant delays in reaching deliverable status.
Can’t install on Vista – approximately 3-4 week delay
Need Federated Server - Initially thought DB2 DBMS contained the needed software for
database federation. This turned out to not be true, DB2 needed to be installed, along
with IBM Web Sphere Federated Server. Approximately 2 week delay.
33
Installing on top of DB2 with existing copy – This problem was self-inflicted as trying to
install Web Sphere Federated Server with the existing DB2 EE copy continued to fail,
until it was realized that the path to the existing copy needed to be selected in the
dropdown list earlier in the installation wizard.
Usage Scenarios
IBM Web Sphere Federated Server (along with DB2 in general) is a very powerful piece
of software. Its use is probably more suited to large institutions that come into contact
with disparate data sources on a regular basis. Businesses that acquire or merge with
other businesses will need to access data quickly and this system allows for that to be
done. The other option for these scenarios is to build centralized Data Marts and data
warehouses where the disparate data sources use “contributors” to extract and fill the
centralized repository, with the needed data, for common access by users and
applications. However, this duplicates the storage needs of all DBMSs and file systems
and at the same time adds complexity with the needed systems for “contributing” the
data to the repository. With the federated approach, all data remains in their original form
and are accessed through a common interface that acts as the middleware to the remote
sources.
Viability found from the SIA
With more and more businesses adopting data federation, the challenges of enabling
seamless federation continue to grow. However, there is a broad need for many
disciplines to be learned in order to use the DB2 Federated Server. Simply knowing SQL
or software installation is far from the level of knowledge needed to efficiently use this
product or system. To fully allow its effective usage, a team effort or consultants may
need to be employed. This product is for corporate or industrial size implementations
and needs a full venue of technical knowledge. This SIA has only scratched the surface
of it usage. Network Administrators, Data Analysts, Storage Management Analysts,
Server Performance Analysts are just a few of the subject matter experts that would be
needed to implement this fully.
34
References
(1) Betawadkar, Anjali, Lin, Eileen, Ursu, Iiona. “Using data federation technology in
IBM Web Sphere Information Integrator: Data federation design and
configuration. Part 1”. June 2005.
http://www.ibm.com/developerworks/db2/library/techarticle/dm-0506lin/index.html
(2) Betawadkar, Anjali, Lin, Eileen, Ursu, Iiona. “Using data federation technology in
IBM Web Sphere Information Integrator: Data federation design and
configuration. Part 2”. July 2005.
http://www.ibm.com/developerworks/db2/library/techarticle/dm-0507lin/index.html
(3) IBM. Web Sphere Information Integration. Administrator’s Guide Version 9.1.
(4) IBM. Web Sphere Information Integration. Configuration Guide Version 9.1.
(5) Lee, Adrian, Magowan, James, Dantressangle. “Bridging the integration gap,
Part 1: Bridging the integration gap: Federating grid data”. August 2005
http://www.ibm.com/developerworks/grid/library/gr-feddata/index.html
(6) Lightstone, Sam, Teorey, Toby, Nadeau, Tom. Physical Database Design.
Morgan Kaufman, 2007
(7) Linthicum, David S. Next Generation Application Integration. Pearson Education,
Inc., 2004.
35