Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Microsoft Access wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Concurrency control wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Oracle Database wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Functional Database Model wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Relational model wikipedia , lookup
ITK478 Special Interest Activity Fall 2007 Implementation and Usage of a Federated Database System Greg Magsamen Thursday, November 15, 2007 Table of Contents Executive Summary __________________________________________________ 4 From Proposal: ___________________________________________________________ 4 Proposed Deliverables: ____________________________________________________ 4 Introduction __________________________________________________________ 5 Database Oriented Middleware ________________________________________ 6 What is Database Oriented Middleware? ____________________________________ 6 Database Gateway (Data Source Federation) ________________________________ 7 Federated Systems ___________________________________________________ 7 Federated Systems Issues _________________________________________________ 8 Usage ____________________________________________________________________ 8 Web Sphere Federated Server _________________________________________ 9 Setup of Federation Database ______________________________________________ 9 DB2 and Web Sphere Federated DB Product _______________________________ 10 Special Interest Activity Federated Server Setup _______________________ 11 Wrappers ________________________________________________________________ 12 User mappings ___________________________________________________________ 13 Nicknames_______________________________________________________________ 13 Special Interest Activity Setup ____________________________________________ 13 Problems ________________________________________________________________ 17 IBM DB2-C ____________________________________________________________________ 17 IBM DB2 EE (Extended Edition) __________________________________________________ 17 IBM Web Sphere Federated Server _______________________________________________ 17 Federated Object Setup _________________________________________________________ 18 Final Architecture ________________________________________________________ 18 Data Sources for the Special Interest Activity_______________________________ 18 SQL Server, Oracle, DB2 ________________________________________________________ 19 (IBM, WFS, Admin. Guide)Excel __________________________________________________ 19 Excel _________________________________________________________________________ 20 (IBM, WFS, Config. Guide)XML __________________________________________________ 20 XML __________________________________________________________________________ 21 (IBM, WFS, Config. Guide)Setting up Each Data Source _____________________ 21 Setting up Each Data Source ______________________________________________ 22 SQL Server – End Result ________________________________________________________ 22 Oracle – End Result ____________________________________________________________ 23 DB2 – LOCAL DB – End Result __________________________________________________ 23 Excel _________________________________________________________________________ 24 XML __________________________________________________________________________ 25 XML __________________________________________________________________________ 26 2 Outcome of Federated Object Usage _______________________________________ 27 Querying Tool - Command Center ________________________________________________ 27 SQL Statements - Command Center of DB2 _______________________________________ 28 Program to ITK478 _______________________________________________________ 29 Conclusions_________________________________________________________ 31 Use of Federation as Middleware __________________________________________ 31 Learning Curve __________________________________________________________ 31 DB2 Usage ____________________________________________________________________ 31 Password – User Mappings ______________________________________________________ 32 Ease of Installation _____________________________________________________________ 32 Requirements of Hardware, Software _____________________________________________ 32 Ease of Use ___________________________________________________________________ 32 Major Problems __________________________________________________________ 33 Usage Scenarios _________________________________________________________ 34 Viability found from the SIA _______________________________________________ 34 References __________________________________________________________ 35 3 Executive Summary This special interest activity (SIA) for ITK 478 was born from the idea of accessing many data sources from a common interface. Quickly it was determined that the technique for doing this type of data-oriented integration was call ‘data federation’. This was a technique for using one database (called the federated database) that actually is the common access point for many different data source, some relational database, some file systems and others, like web services. The proposal originally submitted focused around learning a tool for federation and the ability to access some common data sources. The proposal centered around getting something up and running in the time given and learning how to add different (but common) data sources to the available list in the federated system. From Proposal: The original tool proposed to use for the federation was DB2 Federated Server. This is the tool that was used for the SIA. The original list of suggested data sources consisted of: Oracle in lab, SQL Express, MS Access, DB2-C Because of system constraints, a machine separate from the ITK lab had to be used. However, Oracle, SQL Server, DB2, and some flat files (Excel spreadsheet and XML documents) were tried and their use successful. Proposed Deliverables: - Federated Database and multiple heterogeneous databases and data sources (Excel, XML) accessed via the federated database. This was accomplished. - Sample SQL Statements via some SQL Query Tool or Reporting Tool. Possibly a program extracting data from data source (Net Beans, VS 2005 (.Net) The use of SQL statements (with Query Tool) was done as well as a Java program to access the data sources. However, there were some problems with a reporting tool (Crystal Reporting) that could not be resolved in the allotted time. If there is a resolution to using a reporting tool before the demonstration, supplemental material will be submitted. The results of the topic researched, work done, activities performed and lessons learned are the content of this Special Interest Activity paper. This paper is structured as follows: Introduction – background on federated data bases and how database-oriented middleware comes into play in application integration. Federated Systems – what they are and how they work. 4 Web Sphere Federated Server – the tool chose for the SIA and how it is implemented and works. Special Interest Activity Federated Server Setup – how the tool was set up for this SIA and what activities were done. Conclusion – what problems were encountered, what lessons learned, usage possibilities and the viability of the overall concept of federated systems. Introduction Perhaps the most significant challenges in corporate information management today are the multiplicity, heterogeneity, and geographic distribution of data sources. There can be any number of factors that lead to this data complexity within a company; for instance, mergers and acquisitions that combine different organizations with varying IT infrastructures, business growth from a local to a transnational company, or simply piecemeal technologies used to build information systems over a period of time (probably the most common) (Linthicum (7), page 176). No matter how difficult the problem, synchronizing and managing the diverse data sources is a key requirement to keeping a business competitive. This is where the idea of some form of data or database oriented middleware comes into play. With all the different data sources that an application system may need to access, some common thread of integration becomes necessary. However, not just any API or common interface can be used for industrial size needs. Middleware for data access does have some high level objectives that need to be met to make it viable (Linthicum (7), page 175). In order to seamlessly tie together multiple data sources into a usable data source system, any potential middleware system for multiple data sources must achieve (Linthicum (7), page 173): Transparency: the capability to code and use applications as though the data resides in a single database Heterogeneity: the ability to accommodate different data requirements and sources in the enterprise Autonomy: the absence of restrictions being enforced at the remote data source, allowing it to remain autonomous High function: the ability of applications to exploit not only the high degree of function provided by the federated system, but also the special functions unique to some of the data sources Extensibility and openness: the flexibility to seamlessly add a new data source to the enterprise information system Optimized performance: the power of applications developed for the federated system to achieve strong performance without the need to implement special strategies to evaluate the queries 5 Database Oriented Middleware To a large extent, application integration depends upon database access. This is particularly true for data-oriented application integration. Databases were once proprietary and therefore difficult to access. These days so may solutions for accessing data exist that there is rarely a problem when there is a need to retrieve information from or place it into any database. The solutions not only make application integration a much easier proposition, but they also speak directly to the idea that the capability of modern middleware drives the interest in application integration (Linthicum (7), page 165). However, even with many simplified database access solutions, database and databaseoriented middleware quickly grow complicated. Although database-oriented middleware was once a mechanism to “get at” data, it has matured into a layer for placing data in the context of a virtual database, today a particular, common database model or format. For example, if there is a need to view a relational database as objects, the databaseoriented middleware can map the data so it appears as objects to a source or target application. The same thing can be done “the other way around”; mixing and matching such models as hierarchical, flat files, multidimensional, relational, and object-oriented (Linthicum (7), page 176). Database-oriented middleware also provides access to any number of databases, regardless of the model employed or the platform upon which they exist. This access is generally accomplished through a single common interface such as ODBC or JDBC. Because of this, information stored in Oracle, Informix, or DB2 databases can be accessed at the same time through a single interface. By taking advantage of these mechanisms, there is the ability to map any difference in the source and target databases to a common model. As a consequence, they are much easier to integrate. These examples should make it clear that database-oriented middleware is a major player in the world of application integration. It allows a large number of enabling technologies to process information coming to and going to source and target systems. This ability makes database-oriented middleware the logical choice if a message broker or an application server requires information in a database. Vendors also recognize the advantage of database-oriented middleware. As a result, many application integration products, such as message brokers and application servers come pre-packaged with the appropriate adapters to access most relational database, such as Oracle, Sybase, or Informix. To a large degree, database access is now a “problem solved”, with many inexpensive and proven solutions available (Linthicum (7), page 173). What is Database Oriented Middleware? Database-oriented middleware provides a number of important benefits, including (Linthicum (7), page 174): - An interface to an application The ability to convert the application language into something understandable to the target database The ability to send a query to a database over the network The ability to process a query on the target database 6 - The ability to move a response set back over the network to the requesting application The ability to convert a response set into a format understandable by the requesting application In addition to these processes, database-oriented middleware must provide the ability to process many simultaneous requests, along with scaling features such as thread pooling and load balancing. These features must be packaged with management capabilities and security features. As in other contexts, vendor approaches to these benefits vary greatly from vendor to vendor and technology to technology. This paper focuses on one specific type of database-oriented middleware, the federated database system, a successor to something called a database gateway. Database Gateway (Data Source Federation) Database gateways (also known as an SQL gateway) are APIs that use a single interface to provide access to most databases residing on many different types of platforms. They are similar to virtual database middleware products, providing developers with access to any number of databases residing in an environment typically difficult to access, such as a mainframe. For example, using an ODBC interface and a database gateway, developers can access data that resides in a DB2 mainframe database, in an Oracle database running on an AS/400 and in SQL Server database running on a UNIX server. The developer simply makes an API call and the database gateway does all the work (Linthicum (7), page 175). Database gateways translate the SQL calls into a standard format known as the Format and Protocol (FAP), the common connection between the client and the server. FAP is also the common link between very different databases and platforms. The gateway can translate the API call directly into FAP, moving the request to the target database, and translating the request so that the target database and platform can react. A number of gateways have been put on the market. These include Information Builder’s Enterprise Data Access/SQL (EDA/SQL) and standards such as IBM’s Distributed Relational Data Access (DRDA) which is the core basis for IBM’s Web Sphere Federated Server of which the activities in this paper are centered on. Federated Systems Information management has many new challenges today that are caused by the multiplicity, heterogeneity, and geographic distribution of important data sources. A distributed or federated database management system is a software system that supports the transparent creation, access, and manipulation of interrelated data located at the different sites of a computer network. Each site of the network has autonomous processing capability and runs local applications, which require network communication. The main goal of a federated database system is to improve the accessibility, compatibility, and performance of a federated database while preserving the appearance of a centralized database management system (DBMS) (Lightstone, (6), page 369). 7 Federated Systems Issues Because of the nature of a loosely coupled network of computers, the database design issues encountered in federated systems differ than those encountered in centralized database systems. In centralized databases, access efficiency is achieved through local optimization, by using complex, physical structures. In federated databases, the global optimization of processing, including network communication cost, is of major concern. The total cost to run an application is a function of the network configuration: the total user workload at that moment, the data allocation strategy, and the query optimization algorithm (Lightstone, (6), page 369). Federated database systems are very complex systems that have many interrelated objectives. For example, the IBM Web Sphere Federation Server is an example of a federated database system that tries to implement the general six objectives of a data oriented middleware, outlined on page 5. However, none of these objectives explicitly depends on a data allocation strategy. In federated systems, data allocation (or data distribution) is done largely at the discretion of the database designer or database administrator. Sometimes it involves homogenous data sources, but usually it is heterogeneous (DB2, Oracle, SQL Server, etc.) Once the allocation decision has been made and the replicated tool loaded in the system, the federated system can use its optimizer to maximize the efficiency of the reallocated database. This implies that the updating of multiple heterogeneous sites will have some sort of transaction support, like two-phase commit (Lightstone, (6), page 370-371). Usage Federate systems allow application architects and developers to easily solve problems of data access that might otherwise need expensive ‘Data Mart’ or data warehousing solutions. In today’s business world, mergers and acquisitions create many different platforms and storage models for the various functions of a newly created business. How applications and users will get at all the data (new and old) becomes the focus of the strategy for data-oriented middleware; in this case, the use of a Federated Database System. A good example of the usefulness of a Federated Database System is diagramed in the following figure (page 9) from IBM’s Federated usage documents. This diagram displays the use of many data sources for a customer / supply operation. The application(s) access the many data source via many different sources and passes orders from one spot to another. These orders, in the form of XML documents, are routed to the global warehouse, while customer information is maintained in a local database table called CUSTOMERS (Betawadkar, www.ibm.com (1)). 8 (Betawadkar, www.ibm.com (1)) Using the federation technology in IBM’s Web Sphere Federated Server, this global warehouse is connected to two regional warehouses in the U.S. and Canada. In each warehouse, information about items and suppliers is stored in tables ITEMS and SUPPLIERS. In addition, the item ID and the supplier ID for each item are stored in the table ITEM_SUPPLIED. The USA warehouse is based on a DB2 for z/OS and OS/390 system, while the Canada warehouse is in an Oracle system. Another Oracle instance, called the Credit Checking server, tracks customers with bad credit history and is accessible from the federated system (example from Betawadkar, www.ibm.com (1)). Web Sphere Federated Server Setup of Federation Database Why DB2? DB2 Web Sphere Federated Server was chosen for the SIA because of its availability for download from the IBM products page, as well as its history in previous incarnations (Data Joiner, Information Integrator). This product is known for its usage and reliability. It is also the most talked about product in topic research, so it appeared to be the one with the most documentation. For the SIA, this was very important in regards to the limited amount of time for a deliverable. There are other possible vendors that could have been used, such as Oracle’s clustering product. However, the documentation on that product was limited and there didn’t appear to be much usage information. In point of fact, for this SIA, there wasn’t much documentation on whether this was an Oracle-only distributed database product or was actually a data federation product. Microsoft SQL Server does have some federation abilities, but for the purposes of this SIA, the DB2 product was already in-hand so the decision was made to pursue its capabilities. 9 DB2 and Web Sphere Federated DB Product Web Sphere Federated Server is the latest release of IBM information integration middleware that provides a range of integration technologies; enterprise search, data federation, data replication, data transformation, and data event publishing, to meet varied integration requirements (Lee, (5)). The data-federation capabilities of Web Sphere Federated Server allow multiple heterogeneous data sources to be accessed as if they were a single data source, regardless of where they reside. A federated system consists of: A DB2 instance that operates as a federated server A database that acts as the federated database One or more data sources Clients (users and applications) that access the database and data sources (Lee, (5)) With a federated system, you can send distributed requests to multiple data sources within a single SQL statement. For example, you can join data that is located in a DB2 Universal Database table, an Oracle table, a SQL Server table and an XML-tagged file in a single SQL statement. The following figure shows the components of a federated system and a sample of the data sources that are accessible (Lee, (5)). Web Sphere Federated Server Diagram (Lee, (5)) 10 A key component of the Web Sphere Federated Server architecture is the “Wrapper”. Each type of data source will have a wrapper that acts as the interface between Web Sphere Federated Server and the data source. As well as the rich set of wrappers delivered with the various editions of Web Sphere Federated Server, there are also wrapper development kits (for C++ and the Java™ programming language), which enable new wrappers to be written to federate additional data sources. The Wrapper is responsible for: Determining which parts of an SQL query can be handled by the remote data source. Providing optimization statistics for use by Web Sphere Federated Server. Communicating with the remote data source, sending the query and receiving the result set. Passing the result set to Web Sphere Federated Server and performing any data type translation required. In a grid environment, the wrapper has another important task at query execution time: to transform the standards-based SQL from Web Sphere Federated Server to the SQL dialect understood by the remote data source. This transformation compensates for differences in syntax and function calls by substituting SQL unsupported by remote data source. Handling query transformation at execution time allows it to cope with changes in the underlying data source topology between planning and execution. Web Sphere Federated Server allows read and write access to remote data sources (Lee, (5)). Special Interest Activity Federated Server Setup (Instructions and requirements for IBM’s setup is attached as Appendix A) A federated system is created by first installing the federated engine, then configuring it to communicate with the data sources. There are three basic federated objects: The federated server communicates with the data sources by means of software modules called wrappers. If the data sources require authentication, the remote authentication information can be registered with the federated system as user mappings. Identify remote data sets that you want to access as nicknames to the federated system. You can now reference the nickname in your application as if it were a local table. 11 The following sections examine these federated objects one by one. Wrappers A wrapper module provides the logic to facilitate: Federated object registration: A wrapper encapsulates the data source characteristics from the federated engine. It knows what information is needed to register each type of data source. Communication with the data source: Communication includes establishing and terminating connections with the data source, and maintaining the connection across statements within an application if possible. Services and operations: Each wrapper supports various operations, depending on the capabilities of the data sources that the wrapper is meant to access. These operations can include sending a query to retrieve results, updating remote data, supporting transactions, manipulating large objects, binding input values, and more. Data modeling: A wrapper is responsible for mapping the data representation of remote query results into the table format as required by the federated engine (IBM, WFS, Admin. Guide) In the afore mentioned example online shop scenario (figure from page 9), you would need three wrappers: a NET8 wrapper for the two Oracle systems, a DRDA wrapper for the DB2 for z/OS and OS/390 system, and an XML wrapper to access the online orders from XML documents. The diagram below shows the federated system configuration that follows this step. Three wrappers provide access to four data sources (including the XML file that contains the Web order). Some client libraries are required for the wrappers used here. For example, the Oracle NET8 client software is required for the NET8 wrapper. The client libraries are shown in the following diagram (Betawadkar, www.ibm.com (2)). (Betawadkar, www.ibm.com (2)). 12 User mappings To provide an additional layer of security, Web Sphere Federated Server supports user mappings. A mapping for each Web Sphere Federated Server user to an ID and password is needed for a remote data source. Such user mappings can be defined for relational data sources as well as some non-relational data sources. XML files do not require the registration of user mappings, and thus no additional layer of authentication is applied (Betawadkar, www.ibm.com (1)). If the user ID and password you use to connect to the federated database are the same as those you use to access the remote data source, a user mapping is not needed. Nicknames Remote objects such as tables and views are registered on the federated server as nicknames. For relational nicknames, the wrapper validates the existence of the data source objects and retrieves the column definitions and index information, if any, when the nickname is defined. If the statistics maintained on the data source are similar to those on the federated system, the wrapper nickname registration function will look up and retrieve the statistics from the remote system catalog (Betawadkar, www.ibm.com (1)). Accurate index information and statistics are fundamental to cost-based decisions in the query optimizer. On a federated system, unique index information might be required to perform an UPDATE/DELETE operation for some data sources. If data needs to be stored locally before it is updated (for example, when the table to be updated is also referenced on the same UPDATE statement in a predicate), the federated system uses a unique index to simulate positioned cursors for data sources that do not support row id (for example, DB2 family). Information about the column definitions, index, and statistics for a nickname is stored in the DB2 federated database system catalogs (Betawadkar, www.ibm.com (1)). The registration of nicknames provides location transparency, since a nickname looks just like a local DB2 table to the users on the federated server. Because relational data sources usually provide system catalog information about the column definition of an object, the DDL syntax only needs to identify the remote object for which a nickname is created (Betawadkar, www.ibm.com (2)). Special Interest Activity Setup The installation of DB2 Web Sphere Federated Server was very cumbersome to learn. In the end it seemed easy, but to get to that point took the majority of the SIA time. The short rundown of steps to install begins with installed a compatible IBM DB2 database. 13 Screen shots of setup Initially a DB2 database instance is needed (created via DB2 Control Center) The Database Manager configuration must be changed to allow for federation. This is done by updating the configuration to set the “Federated” parameter to “Yes”. This does NOT disallow the use of physical DB2 databases on the same system for data storage. 14 Install WFS. If the DB2 database is already installed (which was the case in this SIA), you can install with the existing copy of DB2 by checking the radio button. 15 However, ensure on the next panel that the PATH to the existing DB2 database installation is selected. WFS uses the default path to install, even though the previous panel was checked for the ‘existing copy’. If the actual path to the desired DB2 copy is not selected in the drop down, a later panel will prompt the installer for the DB2 installation images even though it has already been installed. This “problem” caused over 3 weeks of SIA delay. 16 Select the “wrappers” (data source access) to install with the product And continue through the installation. System and disk space requirements are attached in Appendix A. Problems IBM DB2-C The scaled down, “free” version of the IBM DB2 for Windows database was tried, but it never proved compatible with IBM Web Sphere Federated Server. In the end, IBM DB2 Extended Edition, which was a 90-day free trial, had to be used. IBM DB2 EE (Extended Edition) IBM Web Sphere Federated Server would not install initially on this version until the IBM DB2 EE Fix pack 3 was applied. After this, IBM Web Sphere Federated Server would install. IBM Web Sphere Federated Server The installation of IBM Web Sphere Federated Server seemed to complete but the server would not start until the IBM Web Sphere Federated Server Fix pack 3 (Wrappers) was applied in conjunction with IBM DB2 EE Fix pack 3. 17 Also, IBM Web Sphere Federated Server does NOT run under the Windows Vista platform. There actually was no mention of this lack of support, however, it took some time to realize that lack of support and switch to a Windows XP operating system environment. Federated Object Setup There is a user friendly interface in the DB2 Control Center (the interface for accessing and administering databases). However, it turns out that it is far easier to use native SQL commands for setting up federated objects as the GUI prompts for many variables and parameters that may or may not be required, possibly confusing the user. Using the native commands allows the user to only set up what they want and ignore extraneous parameters. Final Architecture The final architecture used for the SIA relied on a Windows XP Professional FP 2 workstation running 2GB of memory with Pentium 4 3.00GHz CPU. The Oracle and MS SQL Server data sources were networked on different servers, while the flat files (meaning XML and Excel) were local to the DB2 Federated Server. The DB2 physical database (the install SAMPLE) was local to the Federated Server. Oracle 10g Database ` IBM DB2 Federated Server DB2 Physical Database Excel Spreadsheet XML Document MS SQL Server 2000 Data Sources for the Special Interest Activity The four data sources proposed for the activity were DB2, SQL Server, Oracle and some form of flat file. Both an Excel spreadsheet file and an XML document were used for the flat file. Trying to federate a Web Service will be attempted for the demo and if successful, supplemental material will be submitted. 18 SQL Server, Oracle, DB2 Wrappers are used to translate the relational tables from the databases into a table equivalent in the federate database. Specific wrappers are used for Oracle and SQL Server, while the DB2 physical database uses its DRDA method. DRDA Wrapper SQL Server Wrapper Oracle Wrapper (IBM, WFS, Admin. Guide) 19 Excel The ‘translation’ of the data from the spreadsheet to the federated object occurred in the wrapper. The data then appears in the federated database as a table of columns, corresponding to the columns of the spreadsheet. It can then be used in a SQL statement. (IBM, WFS, Config. Guide) 20 XML The ‘translation’ of the data from the document to the federated object occurred in the wrapper. The data then appears in the federated database as a table of columns, corresponding to the element tags in the document. It can then be used in a SQL statement. (IBM, WFS, Config. Guide) 21 Setting up Each Data Source To set up each relational data source, the DB2 Control Center was used, and the Federated Object element was added to. Right clicking on the Federated Object element begins a wizard where choices are made for different type of data source objects. SQL Server – End Result 22 Oracle – End Result DB2 – LOCAL DB – End Result 23 Excel Adding the access for an Excel spreadsheet didn’t seem to work through the Control Center. Native SQL Statements needed to be used, via the DB2 Command Line Processor window. The following statements created the access point to the Excel spreadsheet. db2 => CREATE WRAPPER excel_wrapper LIBRARY 'db2lsxls.dll' DB20000I The SQL command completed successfully. db2 => CREATE SERVER ITK478EXCEL WRAPPER excel_wrapper DB20000I The SQL command completed successfully. db2 => CREATE NICKNAME AGENTGROUP (AgentGroupNum INTEGER, AgentGroupName VARCHAR(50)) FOR SERVER ITK478EXCEL OPTIONS (FILE_PATH 'D:\478\SIA Draft\Aspect Agent Groups.xls', RANGE 'A1:B100') DB20000I The SQL command completed successfully. The Wrapper is created and the two ‘table columns’ from the Federated Object: Corresponding to the first two columns of the actual Excel spreadsheet defined in the wrapper statement: 24 25 XML The same Control Center problem applied when trying to create XML wrappers. It turned out to be easier using command line code, via the DB2 Command Editor. CREATE WRAPPER xml_wrapper LIBRARY 'libdb2lsxml.a' CREATE SERVER ITK478XML WRAPPER xml_wrapper CREATE NICKNAME employee (name VARCHAR(20) OPTIONS (XPATH '@empname'), skill VARCHAR(50) OPTIONS (XPATH '@type')) FOR SERVER ITK478XML OPTIONS (FILE_PATH 'D:\478\SIA Draft\Employee.xml', XPATH '//emp') 26 Outcome of Federated Object Usage Querying Tool - Command Center It is important to remember that the User Mapping comes into play when trying to access the Federated Objects. The definitions that were used to create the object should have mapped the ‘user’ side credentials to whatever is needed on the federated side. This has to be done carefully or access will be denied to one of the objects and it is difficult to debug. The Federated Database, ITK478, uses the one logon and it should be mapped to the needed access on the remote systems. 27 SQL Statements - Command Center of DB2 The use of SQL statements is done using the nicknames give in the creation of the federated objects. These nicknames can be the same as the remote sources or different. For the SIA, most nicknames are the same name, but in a real environment it would probably be wise to come up with a standard naming convention so that usage would not be confusing to a developer or user. 28 Program to ITK478 A simple java program was written for the SIA deliverable to use the same type of nickname-use SQL to select the values from the federated objects. This demonstrated the use of the JDBC API support in the federated system, more notable as it used the federated database as a ‘blind’ interface to remote data source. There is no need to connect to each data source individually or in combination. This is probably the greatest benefit of federated system usage, the ability to developer or use applications accessing one common interface. NOTE: the user mapping again becomes a concern as all federated objects need to be prefaced with the federated ID (as shows in the SELECT statement). Sample Code: import java.sql.*; public class ListCallsFromSurvey { public static void main(String args[]) { String url = "JDBC:ODBC:ITK478"; Connection con; Statement stmt; String query = "select a.call_id, b.dial_digit, c.audience_description from ID07781.call a, ID07781.call_today b, ID07781.audience c where a.call_id = b.dial_digit and a.audience_id = c.audience_id"; try { //Class.forName("oracle.jdbc.driver.OracleDriver"); Class.forName("sun.jdbc.odbc.JdbcOdbcDriver"); } catch(java.lang.ClassNotFoundException e) { System.err.print("ClassNotFoundException: "); System.err.println(e.getMessage()); } try { con = DriverManager.getConnection(url,dbUser,dbPassword); stmt = con.createStatement(); // Note the use of the executeQuery method (not executeUpdate) // ResultSet contains the results that come back from the query ResultSet rs = stmt.executeQuery(query); System.out.println("CallID System.out.println("--------- CallID on ACD Audience"); -----------------------"); while (rs.next()) { 29 String c = rs.getString(1); String d = rs.getString(2); String a = rs.getString(3); System.out.println(c + " " + d + " " + a); } stmt.close(); con.close(); } catch(SQLException ex) { System.err.println("SQLException: " + ex.getMessage()); } } } Reports {Mention ‘real world usage} 30 Conclusions Use of Federation as Middleware With version 7 of DB2 Universal Database for Linux, UNIX, and Windows, IBM introduced the first commercial information management system capable of integrating both relational and non-relational data. Federation takes data access to the next level of information integration, allowing the DB2 federated system to act as a virtual database where remote objects can be queried like local tables. A user of developer can leverage the power of DB2 as well as remote data sources, and gain from the transparency, heterogeneity, high function, autonomy, extensibility, and optimization features that such a federated system enables (Linthicum (7), page 368). Web Sphere Federated Server version 9.1 further enhances federation technology, making it easier to build an enterprise federated system. Some of the important features include (IBM, WFS, Admin. Guide): Materialized query tables that can be defined over queries referencing both relational and non-relational nicknames to locally cache the query results The nickname statistics update facility The federated health monitor The ability to import data into a relational nickname and export data from a query involving a nickname An enhanced process model, which takes better advantage of parallelism for queries involving nicknames in a federated system using the data partitioning feature A broader set of non-relational wrappers that includes the Web Sphere Business Integration (WBI) and Web Services wrappers A wrapper development kit in both C++ and Java, which allows customer wrappers to access proprietary data sources from a federated system Learning Curve There were many topics that need to be covered for this SIA, starting with product selection down to database access and file access methodology. However, this SIA was structured to be focused on somewhat simple tasks in order to get a viable system up and running and pursue some simple data source allocation tasks. This SIA completed these tasks and delivered a working system. However, many more issues and components would come into play for a ‘real-world’ system and were not addressed in this SIA. However, many lessons were learned from beginning to end that made the overall activity extremely beneficial DB2 Usage The installation of DB2 EE for Windows provided a rudimentary knowledge base for the use of a popular (albeit expensive) DBMS and how it works, accesses, and is administered. DB2 is a powerful DBMS, but it probably best left to industrial implementations. 31 Password – User Mappings The security features of the Web Sphere Federated Server system were challenging until figured out. How to map the federated ID to the remote data source took quite a bit of time to determine and then set up. However, once the method for setting up relational database wrappers was done, adding subsequent wrappers became very easy. Ease of Installation The installation of DB2 Extended edition and Web Sphere Federated Server took the most time of the entire SIA. The main roadblock was the lack of knowledge that Windows Vista operating system was not supported for Web Sphere Federated Server. The DB2 EE installation was challenging under Vista, but once done, the SIA effort tried to focus on the second piece (Web Sphere Federated Server), which ultimately failed. The activity then focused on Windows XP Professional which turned out to be an easier install. Of course, the knowledge gained from installing DB2 EE made this second attempt much easier. Requirements of Hardware, Software For some reason, installing DB2-C (the “free” DB2 server version) would not work on either Windows XP or Vista, giving memory errors. The DB2 services would never start up even though the installation requirements detailed that only 256MB of memory was needed. Both operating systems were running under 2GB of memory, so this problem was never figured out. This prevented installing Web Sphere Federated Server at any time under DB2-C. The solution was to download and install DB2 Extended Edition which is the GA version at a price. The 90-day installation installed and worked after the fix pack 3 was applied. Web Sphere Federated Server then would install over the top of it. Ease of Use The initial federated data source object was a remote SQL Server database. The configuration of this object was done using the DB2 Control Center and many of the parameters needed were unknown. This caused some delay in usage. However, once some research was done, it appears a file called db2dj.ini needed some environment variables set to access where the actual database drivers resided on the federated server. Once these parameters were set, the relational data sources became easy to set up. 32 Major Problems For this SIA, there were some major problems that had to be overcome which did cause some significant delays in reaching deliverable status. Can’t install on Vista – approximately 3-4 week delay Need Federated Server - Initially thought DB2 DBMS contained the needed software for database federation. This turned out to not be true, DB2 needed to be installed, along with IBM Web Sphere Federated Server. Approximately 2 week delay. 33 Installing on top of DB2 with existing copy – This problem was self-inflicted as trying to install Web Sphere Federated Server with the existing DB2 EE copy continued to fail, until it was realized that the path to the existing copy needed to be selected in the dropdown list earlier in the installation wizard. Usage Scenarios IBM Web Sphere Federated Server (along with DB2 in general) is a very powerful piece of software. Its use is probably more suited to large institutions that come into contact with disparate data sources on a regular basis. Businesses that acquire or merge with other businesses will need to access data quickly and this system allows for that to be done. The other option for these scenarios is to build centralized Data Marts and data warehouses where the disparate data sources use “contributors” to extract and fill the centralized repository, with the needed data, for common access by users and applications. However, this duplicates the storage needs of all DBMSs and file systems and at the same time adds complexity with the needed systems for “contributing” the data to the repository. With the federated approach, all data remains in their original form and are accessed through a common interface that acts as the middleware to the remote sources. Viability found from the SIA With more and more businesses adopting data federation, the challenges of enabling seamless federation continue to grow. However, there is a broad need for many disciplines to be learned in order to use the DB2 Federated Server. Simply knowing SQL or software installation is far from the level of knowledge needed to efficiently use this product or system. To fully allow its effective usage, a team effort or consultants may need to be employed. This product is for corporate or industrial size implementations and needs a full venue of technical knowledge. This SIA has only scratched the surface of it usage. Network Administrators, Data Analysts, Storage Management Analysts, Server Performance Analysts are just a few of the subject matter experts that would be needed to implement this fully. 34 References (1) Betawadkar, Anjali, Lin, Eileen, Ursu, Iiona. “Using data federation technology in IBM Web Sphere Information Integrator: Data federation design and configuration. Part 1”. June 2005. http://www.ibm.com/developerworks/db2/library/techarticle/dm-0506lin/index.html (2) Betawadkar, Anjali, Lin, Eileen, Ursu, Iiona. “Using data federation technology in IBM Web Sphere Information Integrator: Data federation design and configuration. Part 2”. July 2005. http://www.ibm.com/developerworks/db2/library/techarticle/dm-0507lin/index.html (3) IBM. Web Sphere Information Integration. Administrator’s Guide Version 9.1. (4) IBM. Web Sphere Information Integration. Configuration Guide Version 9.1. (5) Lee, Adrian, Magowan, James, Dantressangle. “Bridging the integration gap, Part 1: Bridging the integration gap: Federating grid data”. August 2005 http://www.ibm.com/developerworks/grid/library/gr-feddata/index.html (6) Lightstone, Sam, Teorey, Toby, Nadeau, Tom. Physical Database Design. Morgan Kaufman, 2007 (7) Linthicum, David S. Next Generation Application Integration. Pearson Education, Inc., 2004. 35