Project P817-PF: Database Technologies for Large Scale Databases in Telecommunication

Deliverable 1
Overview of Very Large Database Technologies and Telecommunication Applications using such Databases
Volume 1 of 5: Main Report

Suggested readers:
- Users of very large database information systems
- IT managers responsible for database technology within the PNOs
- Database designers, developers, testers, and application designers
- Technology trend watchers
- People employed in innovation units and R&D departments

For full publication March 1999

EURESCOM PARTICIPANTS in Project P817-PF are: BT, Deutsche Telekom AG, Koninklijke KPN N.V., Tele Danmark A/S, Telia AB, Telefonica S.A., Portugal Telecom S.A.

This document contains material which is the copyright of certain EURESCOM PARTICIPANTS, and may not be reproduced or copied without permission. All PARTICIPANTS have agreed to full publication of this document. The commercial use of any information contained in this document may require a license from the proprietor of that information. Neither the PARTICIPANTS nor EURESCOM warrant that the information contained in the report is capable of use, or that use of the information is free from risk, and accept no liability for loss or damage suffered by any person using this information. This document has been approved by the EURESCOM Board of Governors for distribution to all EURESCOM Shareholders.

© 1999 EURESCOM Participants in Project P817-PF

Preface (Edited by EURESCOM Permanent Staff)

The Project will investigate different database technologies to support high-performance and very large databases. It will focus on state-of-the-art, commercially available database technology, such as data warehouses, parallel databases, multidimensional databases, real-time databases and replication servers.
Another important area of concern will be the overall architecture of the database and the application tools, and the different interaction patterns between them. Special attention will be given to service management and service provisioning, covering issues such as data warehouses to support customer care and market intelligence, and database technology for web-based applications (e.g. Electronic Commerce).

The Project started in January 1998 and will end in December 1999. It is a partially funded Project with an overall budget of 162 MM and additional costs of around 20.000 ECU. The Participants of the Project are BT, DK, DT, NL, PT, ST and TE. The Project is led by Professor Willem Jonker from NL.

This is the first of four Deliverables of the Project and is titled "Overview of very large scale Database Technologies and Telecommunication Applications using such Databases". The Deliverable consists of five Volumes, of which this Main Report is the first; the other Volumes contain the Annexes. The other Deliverables are: D2 "Architecture and Interaction Report", D3 "Experiments: Definition" and D4 "Experiments: Results and Conclusions".

This Deliverable contains an extensive state-of-the-art overview of very large database technologies. It addresses low-cost hardware to support very large databases, multimedia databases, web-related database technology and data warehouses. It also contains a first mapping of technologies onto applications in the service management and service provisioning domain.

Executive Summary

Developments in information and telecommunication technology have led to a situation where telecommunication service management and service provisioning have become much more data intensive.
For example, modern switches allow the detailed recording of individual calls, leading to huge amounts of data which form the input to core processes like billing. In addition, new network architectures like Intelligent Networks and TINA, and also mobile networks, are more data intensive than traditional telephony. Large datastores are an integral part of these new architectures and offer support for services like number portability, tracking and tracing, and roaming. Also, the shift from traditional telephony to IP-based services and to multimedia broadband services will make service provisioning more data intensive, especially when operators enter areas such as Web hosting and E-commerce.

At the same time, the telecommunication market is being liberalised. As a result, operators will face competition, which makes service management (including customer care) and fast introduction of new services strategic assets. A central element in customer care is information on the customer, derived from all kinds of data available on the customers. The need for this information is currently driving many initiatives within operators to build large data warehouses that contain an integrated view of the customer base.

While we see an increasing need for data management in telecommunication service management and service provisioning, we also see a large number of emerging database technologies. This makes the selection of the right database technologies to support telecommunication-specific needs very difficult. This report will help in better understanding recent developments in database technology and in positioning them with respect to each other. It will also help in identifying the database technologies that are relevant to specific applications in the service management and service provisioning areas.
In order to focus the work, it was decided to concentrate on the areas of service provisioning and service management, because especially these areas will require substantial investments in database technology in the near future due to the developments mentioned above. In addition, there is a focus on technology supporting very large databases, motivated by the fact that service management and service provisioning usually involve large customer bases and increasingly complex services, resulting in very large databases. A rough picture of the Project focus is given in this figure.

The report contains an extensive technological overview; here we summarise only the technologies that are, in our opinion, most crucial: hardware to support very large databases, multimedia databases and Web-related database technology, and data warehouses.

As far as hardware platform support for Very Large DataBases is concerned, we see the following situation. For very large operational databases, i.e. databases that require heavy updating, mainframe technology (mostly MPP architectures, Massively Parallel Processors) is by far the most dominant technology. For data warehouses on the other hand, which mostly support retrieval, we see a strong position for high-end UNIX SMP (Symmetric Multi Processor) architectures. The big question with respect to the future concerns the role of Windows NT on Intel. Currently these technologies play no role in very large databases; however, this may change in the coming years. There are two mainstreams with respect to NT and Intel: on the one hand NUMA (Non-Uniform Memory Architecture, a kind of extended SMP architecture) with Intel processors, and on the other hand clustered Intel machines. NUMA is more mature and supports major databases like Oracle and Informix.
However, NUMA is still based on Unix, although suppliers are working on NT implementations. Database technology supporting NT clusters is not really available yet, with the exception of IBM DB2. This area will be closely followed by the Project, and actual experiments may be planned to assess this technology.

Multimedia databases and Web-related database technology are developing very fast. All major database vendors support Web connectivity nowadays, and there is a strong focus on database-driven Web sites and E-commerce servers for the Web. The support for multimedia data, however, is rather rudimentary. Although vendors like Oracle, Informix and IBM have made a lot of noise about Universal Servers that support multimedia data, the proposed extensible architectures turned out to be relatively closed and unstable. Current practice is still mainly to handle multimedia data outside the database.

Data warehouse technology is one of the most dynamic areas nowadays; all database vendors and mainframe vendors are active in this area. One has to be very careful here: a data warehouse is not simply a large database. There is a lot of additional technology for data extraction, metadata management, and architectures. Of course all major vendors have their own methodology, and care has to be taken not to be locked in. A rather new development is that of operational datastores; these are data warehouses with limited update capabilities. Especially for the propagation of these updates back to the originating databases, no stable solutions exist. Therefore great care has to be taken when embarking on operational datastores.

Telecommunication services are becoming more and more data intensive; as a result, the role of database technology will only increase.
Therefore, decisions with respect to database technology become crucial elements in maintaining control over the data management around those services, and in maintaining a strong, flexible and competitive position.

This report is a state-of-the-art overview of database technology with a first mapping of technologies onto applications in the service management and service provisioning domain. The Project will deliver a further report on guidelines for the construction of very large databases for the most dominant applications in the above domains. That report will focus not only on the database but also on the embedding of the database in the overall application architecture. Finally, the Project will report on a number of hands-on experiments that will be carried out during 1999 to validate the guidelines and to assess the database technology involved.

List of Authors

Jeroen Wijnands (overall editor), KPN Research, The Netherlands
Wijnand Derks, KPN Research, The Netherlands
Willem Jonker, KPN Research, The Netherlands

Table of Contents

Preface
Executive Summary
List of Authors
Table of Contents
Abbreviations
1 Introduction
1.1 Technical introduction to Deliverable 1
1.2 Guidelines for reading Deliverable 1
1.3 Division of labour among partners
1.4 Introduction to the main report
2 Very large database definition
2.1 The definition
2.2 Examples of VLDB systems
3 Database technologies for telecommunication applications
3.1 Database Server Architectures
3.1.1 Hardware architectures
3.1.2 Data placement
3.1.3 Commercial database servers
3.1.4 Analysis
3.2 Retrieval and Manipulation
3.2.1 Query Processing in a distributed environment
3.2.2 Query processing in Federated Databases
3.2.3 Commercial database products
3.2.4 Analysis
3.3 Backup and Recovery
3.3.1 Security
3.3.2 Backup and recovery strategies
3.3.3 Commercial products
3.3.4 Analysis
3.4 Benchmarking
3.4.1 Available benchmarks
3.4.2 TPC benchmarks
3.4.3 Analysis
3.5 Performability modelling
3.5.1 Performability
3.5.2 Tools for performability modelling
3.5.3 Guidelines for measures for the experiments
3.5.4 Analysis
3.6 Data warehousing
3.6.1 Data warehouse architectures
3.6.2 Design Strategies
3.6.3 Data Cleansing, Extraction, Transformation, Load Tools
3.6.4 Target Databases
3.6.5 On-Line Analytical Processing (OLAP) Technology and Tools
3.6.6 Data Mining
3.6.7 Data Warehousing on the Web
3.6.8 Analysis
3.7 Transaction processing
3.7.1 Commercial TP Monitors
3.7.2 Analysis
3.8 Multimedia databases
3.8.1 Querying and content retrieval in MMDBs
3.8.2 Transactions, concurrency and versioning in MMDBs
3.8.3 Multimedia objects in relational databases
3.8.4 Multimedia objects in Object-Oriented Databases
3.8.5 Analysis
3.9 Databases and the World Wide Web
3.9.1 The Internet and the World Wide Web
3.9.2 Database gateway architectures
3.9.3 Web databases and security
3.9.4 Web database products
3.9.5 Analysis
4 Mapping of telecommunication applications on database technologies
4.1 Service management applications
4.2 Service provisioning applications
4.3 The telecommunication applications/database technologies matrix
5 General analysis and recommendations
References

Abbreviations

CDR Call Detail Record
CSCW Computer Supported Co-operative Work
DT Deutsche Telekom
DBMS DataBase Management System
ERP Enterprise Resource Planning
KPN Koninklijke PTT Nederland (Royal PTT Netherlands)
MPP Massive Parallel Processing (or Processors)
NUMA Non-Uniform Memory Architecture
PIR Project Internal Result
SD Shared-Disk (architecture)
SM Shared-Memory (architecture)
SMP Symmetric Multi Processing (or Processors)
SN Shared-Nothing (architecture)
SS Shared-Something (architecture)
TD Tele Denmark
TE Telefonica
TPC Transaction Processing Performance Council
UPS Uninterruptible Power Supply
VLDB Very Large DataBase
WinTel Windows Intel (platform)

1 Introduction

This Deliverable contains the results of activities carried out in task 2, entitled "Overview of Very Large Database Technologies and Telecommunications Applications using such databases", of the P817 Project.
This introduction chapter provides some technical information, guidelines for reading the Deliverable as a whole, the assignment of activities to partners and, finally, a guideline for reading this main report.

1.1 Technical introduction to Deliverable 1

Telecommunication services are becoming more and more data intensive in several areas, such as network control, network management, billing services, traffic analysis, service management, service marketing, customer care and fraud detection. For several of these applications, fast access and secure data storage over a longer period of time are crucial. Given the increasing amount of network traffic, registering and retrieving these data requires very large databases, in the order of terabytes.

Successful exploitation of high-performance database technology will enable PNOs to optimise the quality of their telecommunication services at lower costs. It will also enable them to expand the use of their services. To facilitate this, knowledge of the key database technologies involved is crucial. Proper assessment of the database technology involved in the telecommunication domain is a joint PNO interest that has a broad scope, a technical focus, and requires serious financial investments. This Deliverable provides the first step of this process.

The objective is to define the term "Very Large Database", give an overview of the telecommunication applications using such databases, and give an overview of the database technologies used in such databases. Concerning the telecommunication applications, the focus will be on service management and service provisioning. To achieve this goal the following four activities have been defined:
1. Define what a "Very Large Database" is (PIR 2.1)
2. Determine telecommunication applications using such databases (PIR 2.2)
3. Describe the state-of-the-art on database technology used with such databases (PIR 2.3)
4. Make a telecommunication applications/database technologies matrix (PIR 2.4)
The results of these activities are recorded in this Deliverable.

1.2 Guidelines for reading Deliverable 1

Because of the large amount of information produced for this Deliverable, it was decided to cluster the information into several volumes. This clustering also benefits the disclosure of information to readers with different roles: depending on the role, a more or less detailed document can be read. The Deliverable is composed of the following volumes:

Volume 1, the main report (this document), positioning and summarising the knowledge gained in task 2. This volume is intended for readers who want a high-level overall overview of the results of task 2. It is also a guideline for determining the relevant annexes.

Volume 2, Annex 1, entitled "Architectural and performance issues", contains detailed descriptions on the subjects "Database Server Architectures", "Performability modelling and analysis/simulation of very large databases" and "Benchmarks". This volume is intended for specialists in the mentioned areas.

Volume 3, Annex 2, entitled "Data manipulation and management issues", contains detailed descriptions on the subjects "Transaction Processing Monitors", "Retrieval and Manipulation" and "Backup and Recovery". This volume is intended for specialists in the mentioned areas.

Volume 4, Annex 3, entitled "Advanced database technologies", contains detailed descriptions on the subjects "Web related database technology", "Multimedia databases" and "Data warehousing". This volume is intended for specialists in the mentioned areas.

Volume 5, Annex 4, entitled "Database technologies in telecommunication applications", contains detailed descriptions on "Telecommunication applications using very large databases" and "Application requirements versus available database capabilities". This volume provides a bridge between telecommunication applications and very large database technologies.

1.3 Division of labour among partners

The writing of this Deliverable has been a joint effort of KPN Research (task leadership), DT, TD, TE and Telia, with the following division of labour:

Volume 1: KPN Research
Volume 2: KPN Research, Telefonica
Volume 3: Deutsche Telekom, Tele Denmark, Telefonica
Volume 4: Telia, Tele Denmark
Volume 5: Telia, Deutsche Telekom

1.4 Introduction to the main report

This volume is the main report of Deliverable 1. It provides a summary and overview of very large database technologies relevant for telecommunication applications. To be more specific, the application areas "Service Management" and "Service Provisioning" have been defined as the areas of main interest.

Chapter 2 starts with the definition of a very large database in the context of this Project. Chapter 3 provides detailed descriptions of a number of Very Large Database technologies. Chapter 4 provides a bridge between the database technologies described in chapter 3 and the telecommunication applications where they can be applied. First, the Service Management and Service Provisioning applications are described. Service management is a key issue for PNOs: for handling different services, management systems such as billing systems, customer ordering systems, customer/user management systems, etc. are available, and these management systems rely heavily on very large database systems. Service provisioning involves the direct offering of services to customers; some examples are E-commerce, Video-on-demand, Tele-education and hosting services. The increasing amount of data involved in these services also requires very large database technology. Finally, this volume ends with chapter 5, describing the general analysis and recommendations.
2 Very large database definition

The purpose of this chapter is to define a Very Large DataBase (VLDB) as agreed upon by the participants of the P817 Project. The definition of a VLDB is intended to suit the needs of P817, and may exclude databases usually considered very large by others. Note that it is not our intention to give a mathematically precise definition of a VLDB, because we think that is impossible. The definition is used to establish a common understanding of the term VLDB among the P817 participants and the readers of this Deliverable.

2.1 The definition

Before continuing, we have to clarify what we mean by the terms Database (DB), DataBase Management System (DBMS) and Database System. A DB is a collection of data with relations between the data as defined in a data schema. A DBMS is the software that:
- supports the storage of very large amounts of data over a long period of time, keeping it secure from accident or unauthorised use, and allows efficient access to the data for queries and database modifications;
- controls access to the data of many users at once, without allowing actions of one user to affect the actions of other users and without allowing simultaneous accesses that could corrupt the data;
- allows users to create new databases and specify their schema (the logical structure of the data), using a specialised language called a Data Definition Language;
- gives users the ability to query and modify the data, using an appropriate language, called a query language or Data Manipulation Language.

A DB system is the combination of a DB with a DBMS.¹ A Very Large DataBase system is mainly characterised by two issues, viz.:

Size: the number of bytes needed to store the data, index, etc.
It should be noted that the concept of a large size is a time-dependent and technology-dependent issue. First, storage systems are getting cheaper and cheaper, so a large size today is already a smaller size tomorrow. Second, 1 TB on a WinTel platform is called very large nowadays, while the same amount of data is regular for a mainframe.

Workload: the number of concurrent users and the size of their transactions. Note that, according to this definition, a heavy workload is not by definition the same as "a large number of users"; e.g. a small number of concurrent users with large transactions (typical in a data warehouse environment) can also be a heavy workload.

We consider a database system to be a very large database system when it has a large size and a heavy workload. The area of interest in the context of P817 is depicted in the following figure.

¹ In practice, the terms Database, DBMS and Database System are often used in an interchangeable fashion.

Above, size and workload are used to characterise a VLDB system. When assessing VLDB technology in the context of P817, the following issues will be addressed:
a) Scalability: expressed in two definitions: speed-up and scale-up. Speed-up is faster execution with constant problem size. Scale-up is the ability of an N-times larger system to perform an N-times larger job in the same elapsed time as the original system.
b) Performance: defined as the absolute execution characteristics of the system. This includes execution times, latency and throughput of the interconnect, and disk I/O speed.
c) Manageability: defined as the ease with which the total system is configured and changed. Manageability addresses issues such as configuration, loading, backup and change management.
d) Robustness: defined as how the system handles both software and hardware failures.
e) Costs: defined as the costs required for setting up and maintaining the system.

2.2 Examples of VLDB systems

Up to now we have not defined what we mean by "very large size" and "heavy workload". The main reason for this is the time dependence of these terms, viz. computers become faster and storage systems become larger, so giving absolute numbers would make the definition obsolete within a short period of time. For this reason we have decided to give some numbers applying to state-of-the-art (beginning of 1998) running commercial database systems. The systems we have chosen as examples are systems that can handle a certain workload when the load is measured using Transaction Processing Performance Council (TPC) standards. We realise that the systems used for these benchmarks are tuned for optimal performance on these specific tests, and that real-life business systems will never reach the resulting performance, but the figures present a practical performance and size upper limit for current systems. Of course there are many other examples of VLDB systems, but TPC standards are well known and give a clear picture of which classes of systems we consider to be VLDB systems. In section 3.4.2 a top four of current TPC figures is given.

3 Database technologies for telecommunication applications

The previous chapter provided a generally applicable definition of a very large database. This chapter continues with detailed descriptions of relevant database technologies. Readers for whom the material provided in this main report is already too detailed can quickly scan the chapter by reading only the "Analysis" sections at the end of each database technology.
Readers for whom the provided material is not detailed enough are referred to the annexes of this Deliverable ([1], [2], [3]).

3.1 Database Server Architectures

An important characteristic of the PNO business in general, and of PNO applications in particular, is the large number of customers involved; several million customers is not an exception. This large number of customers results in very large amounts of data that have to be stored and processed. The very large database systems supporting these applications need parallel architectures to meet the requirements. First some theoretical architectures are described. This information is used to position the commercially available architectures described in the subsequent section. For more detailed information on this subject, the reader is referred to [1].

3.1.1 Hardware architectures

Parallelism should provide performance, availability and scalability. To meet these needs, many hardware architectures are described in the literature, but the following four represent the main stream:

Shared-Memory
Shared-memory (SM) systems are complex hardware systems with multiple processors, connected by a high-bandwidth interconnect through which shared memory and disks can be accessed. Each CPU has full access to both memory and disks.
[Figure: SM architecture — CPUs connected via an interconnect to shared memory and disks]

Shared-Disk
The shared-disk (SD) configuration consists of several nodes. Each node has one CPU with private memory. Unlike the SM architecture, the shared-disk architecture has no memory sharing across nodes. Each node communicates via a high-speed interconnect to the shared disks.
[Figure: SD architecture — nodes with private memory connected via an interconnect to shared disks]
This network is typically standard technology. Shared-nothing systems are also called loosely-coupled systems.

[Figure: shared-nothing architecture - nodes, each with private memory and disks, connected by an interconnect]

Shared-Something
The architectures described above are three pure architectures. All have advantages and disadvantages with respect to performance, availability and scaleability. Therefore it makes sense to combine these three architectures into what is called the Shared-Something (SS) architecture.

To summarise, Table 1 gives an overview of the hardware architectures in terms of cost, DBMS complexity, performance, availability and scaleability.

       cost   DBMS complexity   performance   availability   scaleability
  SM   +      -                 +             -              -
  SD   0      0                 0             0              0
  SN   -      +                 -             +              +
  SS   +      +                 +             0              +

  + = high   0 = moderate   - = low

Table 1: Comparison of hardware architectures

3.1.2 Data placement

When buying database hardware and software, the underlying architecture and functionality are a given fact. This is not the case for the applications, data and data model. Solid knowledge of how parallel processing influences these issues will increase performance, availability and scaleability. First, the type of transactions performed on the database is important. Read transactions, for example, have a different impact than write transactions or read/write transactions. Next, the placement of data over the available resources (nodes, CPUs, disks) in the parallel database is important. On the one hand, splitting up data will increase the degree of parallelism and thus increase performance. On the other hand, the same splitting will have negative consequences for data that has to be joined.

3.1.3 Commercial database servers

Parts of the theoretical hardware architectures described in section 3.1.1 are visible in commercially available systems. Again, many architectures are available, but we limit ourselves to four major architectures.
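Before turning to the commercial servers, the data-placement trade-off of section 3.1.2 can be made concrete with a minimal Python sketch. All names, the four-node layout and the row counts are invented for illustration; this is not how any particular product places data.

```python
NODES = 4

def node_for(key: int) -> int:
    """Hash partitioning: each row is owned by exactly one node."""
    return hash(key) % NODES

# Place customer rows on the nodes.
customers = list(range(12))
placement = {n: [] for n in range(NODES)}
for cid in customers:
    placement[node_for(cid)].append(cid)

# A selection on the partitioning key can be routed to a single node
# (high parallelism, low cost per query)...
owner = node_for(7)

# ...but an operation over all rows (e.g. a join on a non-partitioning
# attribute) touches every node and may require shipping rows between
# them - the negative consequence for joined data mentioned above.
touched = [n for n, rows in placement.items() if rows]
```

The same trade-off drives the choice between round-robin, hash and range placement in real parallel DBMSs: finer splitting increases parallelism for scans but increases data shipping for joins.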
Symmetric Multi Processing (SMP)
SMP systems use a shared-memory model and all resources are equally accessible. The operating system and the hardware are organised so that each processor is theoretically identical. The main performance constraint of SMP systems is the performance of the interconnect. Applications running on a single-processor system can easily be migrated to SMP systems without adaptations. However, the workload must be suitable to take advantage of the SMP power. All major hardware vendors have SMP systems, with Unix or NT operating systems, in their portfolio. They differ in the maximum number and capacity of processors, the maximum amount of memory and the maximum storage capacity. SMP machines are already very common nowadays.

Clusters
The scaleability limits of SMP systems, combined with the need for increased resilience, led to the development of clustered systems. A cluster combines two or more separate computers, usually called nodes, into a single system. Each node can be a uni-processor system or an SMP system, has its own main memory and local peripherals, and runs its own copy of the operating system. For this reason the cluster architecture is also called a Shared-Nothing architecture. The nodes are connected by a relatively high-speed interconnect. Commercially available cluster solutions differ in the maximum number of nodes and the type of interconnect. In the world of Unix, several cluster solutions are already available (e.g. the SUN Enterprise Cluster). In the world of Windows NT, clusters are still in their infancy. Microsoft, together with parties such as Compaq (especially the business units Tandem and Digital), is working on clustering software (MS Cluster Server, codename "Wolfpack"). Up to now, only two-node failover is available (e.g. NCR's LifeKeeper), but the strong position of Microsoft and the dominant presence of Windows NT will increase the importance of NT clustering.
Massively Parallel Processing (MPP)
A Massively Parallel Processing system consists of a large number of processing nodes connected to a very high-speed interconnect. MPP systems are considered Shared-Nothing, that is, each node has its own private memory, local disk storage and a copy of the operating system and of the database software. Data are spread across the disks connected to the nodes. MPP systems are very well suited to supporting VLDBs, but they are very expensive because of the need for special versions of the operating system, database software and compilers, as well as a fundamentally different approach to software design. For this reason, only a small top-end of the market uses these systems. Only a few vendors have MPP systems in their portfolio. Among them are IBM, with its RS/6000 Scaleable POWER parallel system, and Compaq (Tandem) with its Himalaya systems.

Non-Uniform Memory Architecture (NUMA)
A NUMA system consists of multiple SMP processing nodes that share a common global memory, in contrast to the MPP model where each node only has direct access to the private memory attached to that node. The non-uniformity in NUMA describes the way memory is accessed. In a sense, the NUMA architecture is a hybrid resulting from SMP and MPP technologies. As it uses a single shared memory and a single instance of the operating system, the same applications as on an SMP machine run without modifications. The latter advantage makes NUMA a significant competitor to pure MPP machines. Several vendors have computers based on NUMA in their portfolio, among which Data General, IBM, ICL, NCR, Pyramid, Sequent and Silicon Graphics.

3.1.4 Analysis

For all types of very large database applications, several parallel platforms are already commercially available.
Where at first only dedicated and expensive hardware and software were available, nowadays new architectures such as NUMA and Windows NT clusters appear on the scene. These new architectures try to achieve the desired scaleability, performance and manageability by using commodity components in a parallel way, which should result in a lower cost of ownership. At this moment, these new architectures still have to prove themselves.

3.2 Retrieval and Manipulation

Using efficient data manipulation and retrieval algorithms becomes very important when the database size and workload take on Very Large proportions. This section describes some issues that are characteristic of data retrieval in distributed database environments. Also, an overview is provided of the major commercial database products in the VLDB segment. For more detailed information on retrieval and manipulation or commercial database products, the reader is referred to [2].

3.2.1 Query Processing in a distributed environment

The steps to be executed for query processing are in general: parsing a request into an internal form, validating the query against meta-data information (schemas or catalogues), expanding the query using different internal views and finally building an optimised execution plan to retrieve the requested data objects. In a distributed system the query execution plans have to be optimised in such a way that query operations may be executed in parallel, avoiding costly shipping of data. Several forms of parallelism may be implemented. Inter-query parallelism allows the execution of multiple queries concurrently on a database management system. Another form of parallelism is based on the fragmentation of queries into sets of database operations (e.g. selection, join, intersection, collection) and on the parallel execution of these fragments, pipelining the results between the processes.
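The pipelined fragment execution just described can be sketched with a minimal producer/consumer example. Python threads and a bounded queue stand in for the DBMS-internal processes; the operator names are invented for illustration.

```python
# Two query operators in a pipeline: a scan (producer) feeds a selection
# (consumer) through a bounded queue, so the consumer starts working
# before the producer has finished - "vertical" pipelined parallelism.
import queue
import threading

SENTINEL = None  # marks end of the row stream

def scan(rows, out: queue.Queue):
    """Producer operator: emits the rows of a base table."""
    for row in rows:
        out.put(row)
    out.put(SENTINEL)

def filter_op(inp: queue.Queue, predicate, results: list):
    """Consumer operator: applies a selection while the scan is running."""
    while (row := inp.get()) is not SENTINEL:
        if predicate(row):
            results.append(row)

pipe = queue.Queue(maxsize=8)   # bounded pipe between the two operators
results: list = []
producer = threading.Thread(target=scan, args=(range(100), pipe))
consumer = threading.Thread(target=filter_op,
                            args=(pipe, lambda r: r % 2 == 0, results))
producer.start(); consumer.start()
producer.join(); consumer.join()
# results now holds the selected (even) rows, produced and consumed
# concurrently rather than in two sequential passes
```

The bounded queue is the essential point: it lets the two operators overlap in time while limiting the amount of intermediate data held between them.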
This intra-query parallelism may be used in two forms: either to execute producers and consumers of intermediate results in pipelines (vertical inter-operator parallelism) or to execute independent subtrees of a complex query execution plan concurrently (horizontal inter-operator parallelism).

3.2.2 Query processing in Federated Databases

A federated database is conceptually just a mapping of a set of (possibly heterogeneous) databases. When the word federated is used, it indicates that the federated database is a mapping of a set of databases not originally designed for a mutual purpose. This gives rise to special problems. A solution can be found in using a distributed query processor consisting of a query mediator and a number of query agents, one for each local database. The query mediator is responsible for decomposing global queries issued by multi-database applications into multiple subqueries to be evaluated by the query agents. It also assembles the subquery results returned by the query agents and further processes the assembled results in order to compute the final query result. Query agents transform subqueries into local queries that can be directly processed by the local database systems. The local query results are properly formatted before they are forwarded to the query mediator. By dividing the query processing tasks between query mediator and query agents, concurrent processing of subqueries on local databases is possible, reducing the query response time. This architectural design further enables the query mediator to focus on global query processing and optimisation, while the query agents handle the transformation of the subqueries decomposed by the query mediator into local queries. It is the job of the query agents to convert the subqueries into local queries on heterogeneous local schemas.
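The division of labour between mediator and agents can be sketched as follows. This is a toy illustration only: the "databases" are plain dictionaries, the decomposition is hard-wired, and the names (QueryAgent, mediator_query) are invented, not taken from any product.

```python
class QueryAgent:
    """Wraps one local database and hides its local schema/interface."""
    def __init__(self, local_db: dict, field_map: dict):
        self.db = local_db
        self.field_map = field_map        # global field -> local field

    def run_subquery(self, global_field: str, value):
        # Transform the subquery into a "local query" on the local
        # schema, then format the result for the mediator.
        local_field = self.field_map[global_field]
        return [row for row in self.db.values()
                if row.get(local_field) == value]

def mediator_query(agents, global_field, value):
    """Decompose the global query, fan it out, assemble the results."""
    results = []
    for agent in agents:                  # could run concurrently
        results.extend(agent.run_subquery(global_field, value))
    return results

# Two heterogeneous local databases using different column names.
db_a = {1: {"cust_name": "Ann"}, 2: {"cust_name": "Bob"}}
db_b = {1: {"name": "Ann"}}
agents = [QueryAgent(db_a, {"customer": "cust_name"}),
          QueryAgent(db_b, {"customer": "name"})]

# The mediator sees one global field "customer"; the per-database
# naming differences are absorbed entirely by the agents.
rows = mediator_query(agents, "customer", "Ann")
```

A real mediator would also perform global optimisation and post-processing (joins and aggregation over the assembled subresults), which is omitted here.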
The heterogeneous query interfaces of the local database systems are also hidden from the query mediator by the query agents.

3.2.3 Commercial database products

3.2.3.1 Tandem

Non-Stop Clusters software combines a standards-based version of SCO UnixWare 2.1.2 with Tandem's single system image (SSI) clustering software technology. SSI simplifies system administration tasks with a consistent, intuitive view of the cluster. This helps in migrating current UNIX system applications to a clustered environment. It also allows transparent on-line maintenance such as hot plug-in of disks, facilitates the addition of more nodes to the cluster, and provides automatic failover and recovery.

3.2.3.2 Oracle8

Oracle8 is a data server from Oracle, based on an object-relational model. In this model it is possible to create a table with a column whose datatype is another table. Oracle products run on Microsoft Windows 3.x/95/NT, Novell NetWare, Solaris, HP-UX and Digital UNIX platforms. Many operational and management issues must be considered when designing a very large database under Oracle8 or migrating from an Oracle7 database (Oracle7 being the major predecessor of Oracle8). If the database is not designed properly, the customer will not be able to take full advantage of Oracle8's new features. In annex 2, issues are discussed related to designing a VLDB under Oracle8 or migrating from an Oracle7 database.

3.2.3.3 Informix Dynamic Server

Informix Dynamic Server is a multithreaded relational database server that employs single-processor or symmetric multiprocessor (SMP) systems and a dynamic scaleable architecture (DSA) to deliver database scaleability, manageability and performance. Informix Dynamic Server runs on a variety of hardware platforms, among which UNIX- and Microsoft Windows NT-based systems.
3.2.3.4 IBM DB2

IBM delivered its first phase of object-relational capabilities with Version 2 of DB2 Common Server in July 1995. In addition, IBM released several packaged Relational Extenders for text, images, audio and video. The DB2 Universal Database combines Version 2 of DB2 Common Server, including object-relational features, with the parallel processing capabilities and scaleability of DB2 Parallel Edition on SMP, MPP and cluster platforms. DB2 Universal Database, for example, will execute queries and UDFs in parallel. The DB2 product family spans AS/400 systems, RISC System/6000 hardware, IBM mainframes, non-IBM machines from Hewlett-Packard and Sun Microsystems, and operating systems such as OS/2, Windows (95 & NT), AIX, HP-UX, SINIX, SCO OpenServer and Sun Solaris.

3.2.3.5 Sybase

The Sybase Computing Platform includes a broad array of products and features for Internet/middleware architecture support, decision support, mass deployment and legacy-leveraging IS needs, bundled into an integrated architecture. The Adaptive Server DBMS product family, the core engine of the Sybase Computing Platform, supports new application data needs such as mass deployment, enterprise-scale OLTP and terabyte-scale data warehousing. The Sybase Computing Platform allows developers to create applications that run without change on all major platforms and architectures, scaling up from the laptop to the enterprise server or Web server. These applications can take advantage of the scaleability of Adaptive Server and PowerDynamo, the flexibility and programmer productivity of Powersoft's Java tools, and the legacy interoperability of Sybase's Enterprise CONNECT middleware.

3.2.3.6 Microsoft

Microsoft SQL Server Enterprise Edition 6.5 is a high-performance database management system designed for the largest, highly available applications on the Microsoft Windows NT operating system.
It extends the capabilities of SQL Server by providing higher levels of scaleability, performance and built-in high availability, and a comprehensive platform for deploying distributed, mission-critical database applications. At the time this report was released, Microsoft introduced its new Microsoft SQL Server 7.0. This DBMS has been redesigned from scratch, resulting in a scaleable parallel architecture intended to conquer the very large database market. The use of the so-called "zero administration concept" (that is, minimising the human intervention needed to maintain the database) should increase the acceptance of the product.

3.2.3.7 NCR Teradata

Beginning with the first shipment of the Teradata RDBMS, NCR has over 16 years of experience in building and supporting data warehouses worldwide. Today, NCR's Scaleable Data Warehousing (SDW) delivers solutions in the data warehouse marketplace, from entry-level data marts to very large production warehouses with hundreds of terabytes. Data warehousing from NCR is a complete solution that combines Teradata parallel database technology, scaleable hardware, experienced data warehousing consultants, and industry tools and applications available on the market today.

3.2.4 Analysis

Retrieval and manipulation of data in different database architectures offer various options for finding optimal solutions for database applications. In recent years many architectural options have been discussed in the field of distributed and federated databases, and various algorithms have been implemented to optimise the handling of data and the methodologies used to implement database applications. Nevertheless, retrieval and manipulation in different architectures apply similar theoretical principles for optimising the interaction between applications and database systems.
Efficient query and request execution is an important criterion when retrieving large amounts of data. There are a number of commercial database products competing in the VLDB segment, most of which run on various hardware platforms. The DBMSs are generally supported by a range of tools for, e.g., data replication and data retrieval.

3.3 Backup and Recovery

More and more, databases have to be available 24 hours a day, seven days a week. But as no system is free of failure, special actions have to be taken to guarantee this availability. Regularly making backups of the data and being able to recover from these backups are examples of such special actions. This section describes the issues related to backup and recovery of very large databases. For more detailed information on this subject, the reader is referred to [2].

3.3.1 Security

Backup facilities are needed to be able to recover from lost data. Losing data is not only the result of hardware and software failures but also of wrong user actions. Unauthorised access to the system could also result in (wilful) destruction or modification of data. For this reason, some security issues are treated in this section. Data security involves two main aspects, viz.:
- Data protection: required to prevent unauthorised users from understanding the physical content of data. This function is typically provided by data encryption.
- Authorisation control: to guarantee that only authorised users perform the operations they are allowed to perform on the database.
The remainder of this section concentrates on data loss caused by wrong user actions or hardware and software failures.

3.3.2 Backup and recovery strategies

In general, one can classify failures resulting in possible loss of data into the following six categories: 1. User failures: caused by a user improperly or erroneously deleting or changing data. 2.
Operation failures: caused by an illegal operation, resulting in the DBMS reacting with an error message. 3. Process failures: caused by an abnormally ending process. 4. Network failures: caused by an interruption of the network between the database client and the database server. 5. Instance failures: caused by a failing database instance with its accompanying background processes. 6. Media failures: caused by read or write errors due to defective physical hardware.
As the first four types of failures can be handled by the DBMS, backup and recovery strategies are mostly used to deal with the consequences of the last two types. One way to be resistant against these types of failures is to use Uninterruptible Power Supplies (UPS) and redundant hardware (e.g. mirroring of disks and power supplies). Another way is the so-called "hot stand-by" solution, where a copy of the complete system exists (preferably situated at another location) and all operations on the system are also performed at the hot stand-by site. When one of the systems crashes, the other system can take over. Despite these facilities, backups of data are still important.
Backup and recovery can be done at two levels, viz. operating system level and database level. At operating system level, operating system (and third-party) tools are used to back up files and raw disks to backup media (e.g. tape). These tools have little notion of the data they are handling, but copying is rather fast. At the database level, knowledge of the structure of the database and the transactions performed on it is available, enabling more flexibility in copying parts of the database and recovering incrementally. The following backup and recovery strategies can be used to recover from an erroneous database state:
Dump and restart: where the entire database is regularly copied to a backup device and completely restored from this device in the event of failure.
Undo-redo processing (also called roll-back and re-execute): where an audit trail of all performed transactions is used to undo all (partially) performed transactions back to a known correct point in time. From that point on, the transactions are re-executed to yield a correct database state. This strategy can be used when partially completed processes are aborted.
Roll-forward processing (also called reload and re-execute): where all or part of a previous correct state is reloaded, after which the recently recorded transactions from the transaction audit trail are re-executed to obtain a correct state. It is typically used when (part of) the physical media has been damaged.
Restore and repeat: a variation on the previous strategy where the net result of all transactions in the audit trail is applied to the database.
When the database is off-line during the backup process, one speaks of a Cold Backup. When the database is on-line (and users can use it during the backup process), one speaks of a Warm Backup. The selection of backup and recovery strategies is driven by quality and business guidelines. The qualities of backup and recovery strategies are consistency, reliability, extensibility/scaleability, support of heterogeneous environments and usability. The business guidelines are speed, application load, backup testing resources, restoration time and type of system.

3.3.3 Commercial products

As mentioned above, backup and recovery can be performed at operating system level and at database level. In the following sections some products for both levels are listed. Some criteria for selecting a product are, for example:
- capabilities for long-term data archives and Hierarchical Storage Management (HSM) operations to automatically move infrequently used data to other (less expensive) devices;
- support for a range of hardware platforms;
- usability;
- extensive storage device support;
- capabilities for data compression to reduce network traffic and transmission time;
- multitasking capability;
- on-line (Warm) and off-line (Cold) database backup and archive support;
- security capabilities.

3.3.3.1 Operating System level backup and recovery

The backup and recovery products at the operating system level depend on the type of operating system and on the architecture and software used to build the storage system. The products are system dependent but DBMS independent. The user of the product should provide the knowledge of which files are data files, index files, transaction logs etc. For the PC-oriented database servers the following list represents the most commonly used vendors/products: Arcada Software/Storage Exec., Cheyenne Software/ArcServe, Conner Storage Systems/Backup Exec, Emerald Systems/Xpress Librarian, Fortunet/NSure NLM/AllNet, Hewlett Packard/Omniback II, IBM/ADSM (Adstar Distributed Storage Manager), Legato/Networker, Mountain Network Solutions/FileSafe, Palindrome/Backup Director, Performance Technology/PowerSave, Systems Enhancement/Total Network Recall. For the Unix-oriented database servers the following list represents the most commonly used vendors/products: APUnix/FarTool, Cheyenne/ArcServe, Dallastone/D-Tools, Delta MycroSystems (PDC)/BudTool, Epoch Systems/Enterprise Backup, IBM/ADSM, Hewlett Packard/Omniback II, Legato/Networker, Network Imaging Systems, Open Vision/AXXion Netbackup, Software Moguls/SM-arch, Spectra Logic/Alexandria.

3.3.3.2 Database level backup and recovery

All major DBMSs provide backup and recovery tools. These tools support the backup and recovery strategies described in section 3.3.2. Both cold and warm backups are supported. In [2], detailed descriptions of the capabilities of DB2, Oracle 7, Oracle 8, Informix, Sybase and SQL Server are given.
Furthermore, [2] contains an appendix with figures from a terabyte database backup and recovery benchmark. Hot backup rates of between 500 GB and 1 TB per hour are reached with a total system overhead of only a few percent (thus leaving over 90% of the system resources for "normal" database users).

3.3.4 Analysis

As no system (not even a fault-tolerant one) is free of failures, making backups of data is essential for an organisation. Moreover, errors are not only caused by hardware and software failures but also by (un)wilful wrong user actions. Some types of failures can be corrected by the DBMS immediately (e.g. wrong user operations), but others need a recovery action from a backup device (e.g. disk crashes). Depending on issues like the type of system, the availability requirements, the size of the database etc., one can choose between two levels of backup and recovery: the operating system level and the database level. Products at the former level are often operating system dependent and DBMS independent; products at the latter level are the other way around. Which product to choose depends on the issues mentioned.

3.4 Benchmarking

A database benchmark is a method of making a quantitative comparison of different database management systems (DBMSs) and hardware platforms in terms of performance and price/performance metrics. These metrics are obtained by executing a performance test on applications. Customers use benchmarks to choose among vendors; vendors often use benchmarks for marketing purposes. For more detailed information, the reader is referred to [1].

3.4.1 Available benchmarks

Several benchmarks are available nowadays, but only a few are generally accepted and used.
Examples are the OO7 OODBMS Benchmark, from the University of Wisconsin, for object-oriented databases; the HyperModel Benchmark, a DBMS performance test suite based upon a hypertext application model; and the TPC benchmarks, a family of benchmarks modelling "real" business applications. As the TPC benchmarks are the most commonly used, we give a more detailed description in the next sections.

3.4.2 TPC benchmarks

The Transaction Processing Performance Council (TPC) is a non-profit corporation founded to define transaction processing and database benchmarks and to disseminate objective, verifiable performance data to the industry. It was founded in 1988 by a consortium of hardware and software vendors in response to the confusion caused by benchmark problems. While the majority of TPC members are computer system vendors, the TPC also has several DBMS vendors. In addition, the TPC membership includes market research firms, system integrators and end-user organisations. There are over 40 members world-wide, among which:
- database system vendors like Oracle, Informix, Sybase, etc.;
- hardware platform manufacturers like HP, Sun, IBM, Bull, Intel, Silicon Graphics, Acer, etc.;
- software vendors like BEA Systems, Computer Associates, etc.
The TPC has developed a series of benchmarks. Currently, the so-called TPC-C benchmark is used for OLTP systems and the TPC-D benchmark for DSS systems.

3.4.2.1 TPC-C

TPC-C simulates a complete computing environment where a population of terminal operators executes transactions against a database. The benchmark is centred around the principal activities (transactions) of an order-entry environment. The metrics obtained from the TPC-C benchmark are the transactions per minute (tpmC) and the cost per tpmC ($/tpmC).
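The price/performance metric is simply the total priced system cost divided by the measured throughput. As a quick sanity check (the helper name is ours, and the total price is back-computed from published figures, so treat it as approximate):

```python
# $/tpmC = total priced system cost / measured tpmC throughput.

def price_per_tpmc(total_price_usd: float, tpmc: float) -> float:
    """Cost per transaction-per-minute of throughput."""
    return total_price_usd / tpmc

# Back-compute the approximate system price implied by a published
# result of 52,117.80 tpmC at $81.17/tpmC (roughly $4.2 million),
# then verify the metric round-trips.
total = 52_117.80 * 81.17
assert round(price_per_tpmc(total, 52_117.80), 2) == 81.17
```

This is why a system with lower absolute throughput can still win on $/tpmC, as the TPC-C table below illustrates.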
As an example, the following table shows some TPC-C results (representing the situation at the beginning of 1998).

  Platform                                          tpmC        $/tpmC   DBMS
  IBM RS/6000 SP Model 309 (12 node x 8 way)        57,053.80   $147.40  Oracle8 Enterprise Edit'n 8.0.4
  HP 9000 V2250 (16-way)                            52,117.80   $81.17   Sybase ASE 11.5 EBF 7532
  Sun Ultra Enterprise 6000 c/s (2 node x 22 way)   51,871.62   $134.46  Oracle8 Enterprise Edit'n 8.0.3
  HP 9000 V2200 (16 way)                            39,469.47   $94.18   Sybase ASE 11.5 EBF 7532

We can see here that the IBM/Oracle combination has the highest overall performance, but the HP V2250/Sybase combination has the lowest cost per transaction.

3.4.2.2 TPC-D

TPC-D is the Transaction Processing Performance Council's benchmark for Decision Support Systems. It consists of a suite of business-oriented ad-hoc queries and concurrent updates. TPC-D has more metrics than TPC-C, and the size of the database is an extra parameter. The metrics of TPC-D are:
- the Power metric (QppD@Size): indicating the query processing power at the selected database size;
- the Throughput metric (QthD@Size): indicating the throughput at the selected database size;
- the composite Query-per-hour metric (QphD@Size): combining the two previous metrics;
- the Cost/Performance metric ($/QphD@Size): giving the cost per QphD for the selected database size.
As an example, the following table shows some TPC-D results for a one-terabyte database (representing the situation at the beginning of 1998).
  Platform                                        QppD       QthD      $/QphD   DBMS
  Sun Ultra Enterprise 6000 (4 x 24-way nodes)    12,931.9   5,850.3   $1,353   INFORMIX Dynamic Svr AD/XP 8.21
  NCR WorldMark 5150 (32 x 4-way nodes)           12,149.2   3,912.3   $2,103   Teradata V2R2.1
  Sun Ultra Enterprise 10000 (64 way)             8,870.6    3,612.1   $1,508   Oracle8 v8.0.4.2
  IBM RS/6000 SP Model 309 (32 x 8-way nodes)     7,633.0    5,155.4   $2,095   DB2 UDB for AIX, V5

We can see here that the Sun Ultra 6000/Informix combination is superior in all metrics.

3.4.3 Analysis

At the moment, the TPC benchmarks are the most generally accepted database benchmarks. TPC-C is used for OLTP applications, whereas TPC-D is used for Decision Support/data warehouse applications. But as benchmark systems are optimised for performing the (predefined) benchmark queries, the resulting figures are not representative of real-life applications. For this reason, the benchmark figures are indicative and should only be used to get an idea of the key players in the database area and their performance and price potential. Finally, by comparing TPC results over time, one can analyse trends, e.g. the relation between Windows NT platforms and Unix platforms, and the increasing size of the databases used for TPC-D benchmarks. In the final phase of the construction of this Deliverable, the TPC-W benchmark was announced. This benchmark represents web environments with transaction components, Web page consistency and dynamic Web page generation. It will be based on a browsing/shopping/buying application. The primary metrics will be user Web interactions per second (WIPS) and a price per WIPS.

3.5 Performability modelling

Although performability modelling is not directly related to the state of the art of telecommunication applications and database technologies, we think it is useful to include this section in the document. As mentioned earlier, scaleability and performance are important characteristics of a very large database.
To examine these characteristics, experiments will be carried out. But most of the systems used for these experiments will not be at the top end of the very-large-database scale. To be able to say something sensible about systems that are at the top end of that scale, performability models will be built to extrapolate the information obtained from the experiments. This section provides some information on the concept of performability modelling. For more information on this subject, the reader is referred to [1].

3.5.1 Performability

When looking at the performance of (database) systems, possible failures of that system are often not taken into account. This may be easier to model, but it is not correct. From a user's point of view, the performance of a system is its performance subject to failure and distortion. Modelling the performance of a system while taking into account possible failures of that system is called "performability modelling". Performability modelling can be done in two ways:
- an integrated approach: where one overall model of the system, containing all possible events (e.g. arrival of jobs, server breakdown etc.), is created and solved;
- an approach based on behavioural decomposition: where only the likely states and associated probabilities of the system are modelled, resulting in a smaller state space.
As the second approach gives satisfactory results in less time, it will be used in the P817 Project.

3.5.2 Tools for performability modelling

Performability modelling is not just a manual job. Several tools are available to support the modeller in making models and performing analyses and simulations.
We distinguish the following five categories:
- Model specification and construction tools: for (graphically) creating models of a system.
- Analytical model solving tools: for solving a created model in an analytical way.
- Simulation tools: for solving a created model by means of simulation (often used when the model is too complex to solve analytically).
- Performance monitoring tools: for obtaining performance measurements from the system that is modelled. These measures are used to tune the model.
- Load emulation tools: for generating workload for the system that is modelled.
A selection of the actual software products to be used for the experiments has not been made yet.

3.5.3 Guidelines for measures for the experiments

When performing experiments, it is important to realise that there are two types of experiments: experiments with the database itself (the database experiments) and experiments with the model of that database (the modelling experiments). Results from both types of experiments should be comparable, although they are obtained differently. In the database experiments, the results are obtained by measuring the real system using performance monitoring tools. For the modelling experiments, results are obtained by evaluating the model using analytical or simulation techniques. In [1], a framework is provided that enables comparison of results from both types of experiments. This framework provides both a layered approach for measuring relevant parameters at different layers (Application, DBMS, Operating System, Hardware) and a definition of the types and units of the parameters to use.

3.5.4 Analysis

Performability modelling is not a goal in itself. It is used to give a better understanding of the performance of a, not always failure-free, very large database before it is actually built.
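As a toy illustration of the state-based view behind performability (section 3.5.1): weight the performance delivered in each likely system state by the probability of being in that state. The failure/repair rates and the per-state throughput below are invented numbers, and the two-state model is the simplest possible case, not a recipe for the P817 experiments.

```python
# Two-state (up/down) availability model with exponential failure and
# repair rates; performability = expected reward over the state space.

failure_rate = 1 / 1000.0   # failures per hour (assumed)
repair_rate = 1 / 4.0       # repairs per hour (assumed: 4 h mean repair)

# Steady-state probability of being "up" in the two-state Markov model.
availability = repair_rate / (repair_rate + failure_rate)

# (probability, reward) per state; reward here is throughput in tx/sec.
states = {
    "up":   (availability,       1200.0),
    "down": (1.0 - availability,    0.0),
}

# Performability: performance weighted by state probabilities, i.e. the
# throughput a user actually experiences once failures are accounted for.
performability = sum(p * reward for p, reward in states.values())
```

The behavioural-decomposition approach generalises this idea: enumerate only the likely states, solve a performance model per state, and combine the results with the state probabilities instead of solving one huge integrated model.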
After a first version of a performability model has been created, experiments with a small database are necessary to improve the model iteratively. It is important to define the experiments in such a way that the results can serve as input for the performability model of the database. Both tools for monitoring the database and tools for building and analysing the model are necessary.

3.6 Data warehousing

This section describes data warehousing concepts, products and market developments. For more detailed information, the reader is referred to [3].

3.6.1 Data warehouse architectures

The term data warehouse is defined as "a subject oriented, integrated, time varying, non-volatile collection of data that is primarily used in organisational decision making". As data is collected from several sources and added to the data already in the data warehouse (viz. historical data is stored), the potential size of a data warehouse is enormous. The architecture chosen for a data warehouse has great impact on how the data warehouse is built. The main components of a data warehousing architecture are: source databases; data extraction, transformation and loading tools; data modelling tools (including import and export facilities); target databases; and end-user data access and analysis tools. The main architectures for building a data warehouse include:

Virtual Data Warehouse; the end-user tools operate directly on the source databases, that is, there is no target database holding the physical warehouse data. The main advantages of this architecture are that it is easy to build and does not require large investments. The main drawbacks are that no history can be stored, the queries interfere with the operational processes, and the source systems must have on-line access (e.g.
RDBMS);

[Figure: Virtual Data Warehouse - the end-user tools access the source databases (retail data, financial data, legacy systems, external sources) directly; there is no target database.]

Data Mart in a Box; all end-user tools operate on a single physical warehouse target database. The data from the source databases is extracted, cleaned, scrubbed and integrated into the physical warehouse target database, and local meta data is stored in the target database. The main advantage of this architecture is that it is directly supported by packaged products and hence provides an easy entry to data warehouse technology. This approach can be dangerous, however, because it does not include meta data integration, which may result in a "stove pipe" data mart, i.e. a stand-alone data mart that cannot be integrated with an enterprise data warehouse (see the Multiple, Non-Integrated Data Marts architecture);

[Figure: Data Mart in a Box - the source databases (retail data, financial data, legacy systems, external sources) feed a single RDBMS target database on which the end-user tools operate.]

Multiple, Non-Integrated Data Marts; the data from the source databases is extracted, cleaned, scrubbed and integrated into physical data mart target databases (one for each department) by multiple extraction processes. The end-user tools operate on the separate physical warehouse data marts, but the data marts are not integrated with each other. The advantage of this approach is its rapid deployment, but a severe drawback is the increasing complexity of the architecture as the data warehouse evolves, which results in huge maintenance costs. Another disadvantage is that the data marts may not be consistent with each other (i.e.
multiple, incompatible views of the truth);

[Figure: Multiple, Non-Integrated Data Marts - separate extraction/transformation/load processes feed independent data marts (e.g. sales, financial, human resources) on RDBMS or MDB targets, on which the data access and analysis tools operate.]

Multiple Architected Data Marts; the integration problem is solved because the multiple architected data marts share common meta data, although there is no central data warehouse. Sharing the same meta data implies that the data marts are built in the same way but serve different business areas. The main advantage of this approach is that the central meta data repository ensures a consistent view of the data across all data marts. The main disadvantage is that the different products must be capable of integrating with the central meta data repository.

[Figure: Multiple Architected Data Marts - extraction, transformation and load tools, data cleansing tools and a data modelling tool are co-ordinated through a central meta data repository; local meta data resides with the RDBMS/MDB data marts used for data access and analysis.]

Enterprise DWH architecture; includes a large data warehouse driving multiple data marts. There are multiple source databases. The central data warehouse stores detail data and supports organisation-wide, consolidated analysis/reporting. The architecture also includes multiple architected data marts based on RDBMS and/or MDB, with central co-ordination and management based on access to a central meta data repository. This architecture involves a complex environment, which means high development cost and risk; it is, however, required when end-users need access to detailed data. The main advantages of this approach are the availability of detail data and the support for organisation-wide, consolidated analysis/reporting.
The main disadvantage of this approach is the complexity of the environment and its high development cost and risk.

[Figure: Enterprise Data Warehouse architecture - extraction/transformation tools, data cleansing tools and warehouse administration tools load a central data warehouse, which feeds architected RDBMS/MDB data marts (with ROLAP calculation engine) via data subsetting and distribution; central and local meta data are kept consistent through meta data integration.]

Operational Data Store Feeding Data Warehouse; an Operational Data Store (ODS) consolidates data from multiple transaction systems and provides a near real-time, integrated view of volatile (i.e. changeable, not permanent), current data. The ODS is used for operational decisions, whereas the data warehouse is used by business analysts for tactical/strategic decisions. The ODS can be implemented with packaged software. The ODS may also be used as a staging area driving one or more data warehouses (i.e. the ODS is placed between the source databases and the data warehouse as a staging area, with a pull or push process that fetches the data from the source databases; this staging area can then feed more than one data warehouse). This is the most complex data warehouse environment, which results in very high cost and risk. The other parts of the architecture are described in the previous sections. The ODS adds the value of presenting a current, near real-time, integrated view of volatile data to the end-user, which the data warehouse is not able to do. The main advantage of this approach is that the ODS presents a current and integrated view of enterprise data; hence it can be used for complex operational support. The main disadvantage is its high cost and high development risk.

[Figure: Operational Data Store feeding the data warehouse - the ODS sits between the source databases and the central data warehouse, which in turn feeds multiple architected data marts via data subsetting and distribution.]

3.6.2 Front-End Data Access Design Strategies

There are two main philosophies of how to build a data warehouse: building it top-down or building it bottom-up. The top-down approach means that from the start of the data warehouse project you include all source data and design a full-size data warehouse that is capable of handling all your source data and end-users from the start of operation. The bottom-up approach is to design a data warehouse for a very small, isolated part of your business (i.e. a data mart) and, once that is working, extend it step by step until it can finally handle all your source data and end-users. The way to build a data warehouse successfully lies somewhere in between these two strategies: start with a small, isolated part of the business, but define from the start all data models, data semantics and definitions across business areas, as well as meta data handling, for the whole enterprise data warehouse.

3.6.2.1 Data Models

The data model provided to the user must meet the user's specific needs. In data marts one typically sees multi-dimensional modelling, where the dimensions reflect the interests of the user (e.g. products, sales, period). Star schemas and snowflake schemas are widely used for this, both because they approach the user's way of thinking and for performance reasons. In the central data warehouse, consistency is more important, and hence a normalised data model is often used there.
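The star-schema idea can be sketched with a toy example: a central fact table of measures surrounded by dimension tables, queried along the dimensions. The table and column names below are illustrative assumptions, using SQLite purely for demonstration:

```python
import sqlite3

# Hypothetical star schema: a sales fact table referencing
# product and period dimension tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_period  (period_id  INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE fact_sales  (product_id INTEGER, period_id INTEGER, amount REAL,
                          FOREIGN KEY(product_id) REFERENCES dim_product(product_id),
                          FOREIGN KEY(period_id)  REFERENCES dim_period(period_id));
INSERT INTO dim_product VALUES (1, 'Phone', 'Hardware'), (2, 'Line rental', 'Service');
INSERT INTO dim_period  VALUES (1, 1998, 4), (2, 1999, 1);
INSERT INTO fact_sales  VALUES (1, 1, 100.0), (1, 2, 150.0), (2, 2, 80.0);
""")

# A typical dimensional query: total sales per category and quarter.
rows = con.execute("""
    SELECT p.category, t.year, t.quarter, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    JOIN dim_period  t ON f.period_id  = t.period_id
    GROUP BY p.category, t.year, t.quarter
    ORDER BY p.category, t.year, t.quarter
""").fetchall()
print(rows)
```

Note how the query groups and joins only along the dimension tables, which is exactly the "slicing" pattern the star schema is optimised for.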
3.6.2.2 Meta Data

Meta data includes schema information for the source data, the central data warehouse and the target data, calculation functions for derived data, transformation and conversion rules, and batch processing information. Meta data is stored in a central meta data repository and may be distributed to local meta data repositories. Centralisation of meta data maintenance ensures consistency. The meta data standards landscape is characterised by incompatible standards from different organisations. The Meta Data Coalition, with members IBM, Informix, Sybase, ETI, Business Objects, Arbor Software and Cognos, has produced the Meta Data Interchange Specification (MDIS), but this standard only specifies a flat file format for exchanging information about data, and the effort has not really been successful. A few other standards exist, but the most promising one is the Microsoft Open Information Model (OIM): over 65 vendors have agreed to back OIM as a meta data standard. The Meta Data Coalition has pledged to build a translator between MDIS and OIM.

3.6.3 Data Cleansing, Extraction, Transformation and Load Tools

Moving data from the operational databases to the data warehouse needs to be done via extraction tools. The following functions are essential:

Extraction and transport; the data extraction component provides the ability to transparently access all types of (legacy) databases regardless of their location. Data extractors also provide a means of transport from the source data platform to the target.

Data modelling and transformations; data models, or meta data, define data structures and may incorporate rules for business processes. Transformations of data are needed because a data warehouse is organised differently from a traditional transaction system.
Data cleaning and conversion is also needed because much of the source data comes from legacy systems, which may contain missing or overlapping data.

Loading; most relational database vendors offer bulk load utilities that provide a high-speed way of loading large volumes of data by disabling index creation, logging, and other operations that slow down the process.

Data cleansing; denotes the process of filtering, merging, decoding and translating source data to create validated data for the data warehouse. In the search for "best practices" many organisations try for a quick fix, only to discover that the data cleansing issue is enormously complex and requires industrial-strength solutions. There is no single tool package that addresses the extremely large number of tasks pertaining to data extraction, cleaning and transport; different tools specialise in addressing different issues. W.H. Inmon, in his books on data warehousing, estimates that, on average, 80% of the effort of building a data warehouse goes into these tasks. The following tools facilitate the construction of data warehouses: Enterprise/Integrator (Carleton), Data Propagator (IBM), PowerMart (Informatica), DECISIVE (InfoSAGE), IRE Marketing (Mercantile Software Systems), Info Transport (Platinum Technology), OmniLoader (Praxis International), SAS/Warehouse Administrator Software (SAS Institute), STRATEGY Distributor (ShowCase), Smart DB Workbench (Smart Corporation), DataImport (Spalding Software), ORCHESTRATE Development Environment (Torrent Systems), Warehouse Directory (Prism Solutions). Only a few vendors offer a total end-to-end data warehousing solution with their own products; most vendors depend on one or more third-party components. Two examples of companies that provide an end-to-end data warehouse solution are SAS Institute Inc. (SAS Data Warehouse - Orlando II) and Information Builders.
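The cleaning, conversion and merging steps described above can be illustrated with a minimal extract-transform-load sketch. The field names, conversion rules and sample rows are illustrative assumptions, not taken from any of the products listed:

```python
# Hypothetical rows extracted from a legacy source system.
source_rows = [
    {"cust": " Jones ", "revenue": "1200", "country": "UK"},
    {"cust": "Smith",   "revenue": "",     "country": "uk"},   # missing value
    {"cust": " Jones ", "revenue": "1200", "country": "UK"},   # duplicate
]

def transform(row):
    # Cleaning and conversion rules: trim names, default missing
    # revenue to 0.0, normalise country codes.
    return {
        "customer": row["cust"].strip(),
        "revenue": float(row["revenue"]) if row["revenue"] else 0.0,
        "country": row["country"].upper(),
    }

# Merge step: de-duplicate while loading into the target.
seen, warehouse = set(), []
for row in source_rows:
    clean = transform(row)
    key = (clean["customer"], clean["country"])
    if key not in seen:
        seen.add(key)
        warehouse.append(clean)

print(warehouse)
```

Even this toy pipeline shows why the effort estimate is so high: every source field needs its own cleaning rule, and the merge keys must be defined consistently across sources.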
3.6.4 Target Databases

Target databases are the databases to which data from the source databases is transferred and in which it is stored. Typically they hold historic, non-volatile (read-only) data. The data in the target database is also subject oriented and integrated, and includes both detail and summary data. Three types of target databases are used:

Relational databases (RDB); conventional RDBs, in combination with relational OLAP (ROLAP) tools, support most data warehousing requirements and can handle very large target databases. The central data warehouse is almost always a conventional RDB. The main advantages include excellent scalability (Terabytes), mature technology, openness and extensive support by the industry.

Multi-dimensional databases (MDB); contain pre-calculated results from the base data in order to provide fast response times to the user. The main advantages of MDBs are high performance together with sophisticated multi-dimensional calculations. The most important limitations of an MDB are its scalability in terms of size (max. 30 GB) and number of dimensions, and its inflexibility.

Hybrid databases; use the relational component to support large databases for storage of detailed data and ad hoc data access, and the multi-dimensional component for fast, read/write OLAP analysis and calculations. The combination of RDBMS and MDB is controlled by an OLAP server. The result set from the RDBMS may be processed on-the-fly in the server.

3.6.5 On-Line Analytical Processing (OLAP) Technology and Tools

On-Line Analytical Processing (OLAP) is defined as the process of slicing data using multi-/cross-dimensional calculations. OLAP provides capabilities for consolidating/summarising along complex hierarchies of dimensions, involving grouping/classification and summarisation/aggregation for both business and statistical analyses.
The following types of OLAP exist:

Relational OLAP (ROLAP) is based on relational technology and uses RDBMS tables as the data source for the analyses. The main advantage of ROLAP is its RDBMS base, with its scalability characteristics. However, ROLAP requires on-line calculation, which may have a severe impact on response times. Products include: DSS Agent/Server (MicroStrategy), DecisionSuite (Information Advantage), InfoBeacon (Platinum Technology), MetaCube (Informix Software).

Multi-dimensional OLAP (MOLAP) is based on the idea that all answers (aggregates) to the queries made to the system are pre-calculated and stored in a multi-dimensional database or data cube before the end-user starts to interact with the system. The MOLAP approach requires a dedicated data structure and the pre-calculation of all possible aggregates of the dimensions. The advantage of MOLAP is its instant response, but it requires extensive pre-calculation. Products include: Oracle Express for OLAP (Oracle Inc.), Essbase and IBM DB2 OLAP Server (Arbor Software Corp.), TM1 (APPLIX Inc.), Holos (Seagate Software), SAS Multi-dimensional Database Server (MDDB), GentiaDB (Gentia Software).

Hybrid OLAP (HOLAP) is a combination of relational OLAP and pre-calculated aggregates stored in multi-dimensional structures (MOLAP). This hybrid approach is perhaps the most complete solution for providing decision support to end-users with different levels of requirements on the information. Products include: Media MR (Speedware Corp.), Plato OLAP Server (Microsoft).

Desktop OLAP (DOLAP) is ROLAP, MOLAP and/or HOLAP technology implemented and operated in a desktop environment. Products include: Brio Enterprise (Brio Technology Inc.), BusinessObjects (Business Objects), Impromptu/PowerPlay (Cognos), IQ/Vision (IQ Software Corp.).
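The MOLAP pre-calculation idea can be sketched directly: aggregate a small fact set over every subset of its dimensions ahead of time, so that each later query is a simple lookup instead of an on-line calculation. The dimensions and figures are illustrative assumptions:

```python
from itertools import combinations
from collections import defaultdict

# Toy fact set with two dimensions and one measure.
facts = [
    {"region": "North", "product": "Phone", "sales": 100},
    {"region": "North", "product": "Fax",   "sales": 40},
    {"region": "South", "product": "Phone", "sales": 60},
]
dimensions = ("region", "product")

# Pre-calculate every aggregate: for each subset of dimensions,
# sum the measure per combination of dimension values.
cube = defaultdict(float)
for subset_size in range(len(dimensions) + 1):
    for dims in combinations(dimensions, subset_size):
        for row in facts:
            key = (dims, tuple(row[d] for d in dims))
            cube[key] += row["sales"]

# Queries are now instant lookups, no on-line calculation:
print(cube[((), ())])                                     # grand total
print(cube[(("region",), ("North",))])                    # all North sales
print(cube[(("region", "product"), ("South", "Phone"))])  # one cell
```

The cost is visible even at this scale: the cube stores every aggregate of every dimension subset, which is why MOLAP products run into size and dimension-count limits.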
The Microsoft OLAP Server (Plato), which is integrated with MS SQL Server 7.0, will probably have a great impact on the market for OLAP tools when it is released during the second half of 1998. The MS OLAP server is based on HOLAP, i.e. a hybrid solution with both an MDB for MOLAP-based analyses and a relational database for ROLAP-based analyses.

3.6.6 Data Mining

Data mining technology allows organisations to leverage their existing investments in data storage and data acquisition. Through the effective use of data mining technologies, organisations discover actionable information in raw data; production data mining systems automatically identify and act upon this information. Production data mining has three basic requirements. First, it needs to operate on large volumes of data - often hundreds of gigabytes. Second, the data mining system needs to be able to handle very high throughput within fixed time constraints: tomorrow's inventory forecast is useless if it is not generated in time to take action. Finally, production data mining systems must be able to efficiently process thousands of models, each using thousands of variables. Data mining tools examine the historical detail transactions to identify trends, to establish and reveal hidden relationships, and to predict future behaviour. The tools are available in the following categories: case-based reasoning, data visualisation, fuzzy query and analysis, knowledge discovery and neural networks.

3.6.7 Data Warehousing on the Web

The explosive growth of the World Wide Web and its powerful new computing paradigm offer a compelling client/server platform for corporate developers and DSS architects. Two key advantages motivate the integration of World Wide Web and data warehouse technology:
1. expanding the role of data warehouses in the organisation;
2. extending the use of data warehouses cost-effectively.
To disclose data warehouses via the Web, a multi-tiered architecture is essential, as it greatly improves the response times of data access. Java computing brings interactivity to the web client and reduces cost because no fat clients are necessary. Server solutions must meet stringent performance, scalability and availability requirements while providing comprehensive services for security and simplified management. Many companies provide effective, so-called "Webhousing" solutions for the Internet, intranets and the WWW: Active OLAP Suite (ACG), GQL (Andyne Computing, Ltd.), Essbase (Arbor Software), brio.web.warehouse (Brio Technology), WebIntelligence (Business Objects), DecisionWeb (Comshare Commander), CorVu Web Server (CorVu), NetMirror/400 (DataMirror), WebPlan (Enterprise Planning Systems), Net.Data (IBM Corporation), Aperio OLAP, Aperio Report Gallery (Influence Software), WebOLAP (Information Advantage), WebFocus (Information Builders), Webcharts, WebSeQueL, Web Warehouse (InfoSpace), ALICE d'ISoft (Isoft), IQ/LiveWeb (IQ Software), Business WEB (Management Science Associates), DSS Web (MicroStrategy), Shadow Direct, Shadow Web Server (Neon Systems), DataDriller (OpenAir Software), Designer/2000, Discoverer v3.0, Express Web Agent (Oracle Corporation), Internet Publisher (Pilot Software), Beacon, Forest and Trees, InfoReports, DataShopper (Platinum Technology), Warehouse Directory (Prism Solutions), VISTA PLUS (Quest Software), Data Mart Solutions (Sagent Technology), Web Enabled Tools (SAS Institute), DBConnect for the Web (Silvon), Media/Web (Speedware), Environment Manager Toolset (White Light Technology, Ltd.), Zanza Web Reports (Zanza).
3.6.8 Analysis

The data warehousing market is growing: DW software, hardware and services will continue to grow at a 40% compound rate through 1998, from 2 billion US dollars in 1996 to 8 billion US dollars in 1998. The number of organisations building data warehouses of 1 Terabyte or larger will increase from 7% to 17% during 1998/1999. There is a trend towards using data marts as a starting point for data warehousing, which also makes it possible to build the data warehouse incrementally. Tools for all parts of the data warehouse architecture, as well as for data warehouse administration, are being developed. The inclusion of Operational Data Stores in the data warehouse architecture adds the value of presenting a current view of data for operational decisions, which the data warehouse does not provide. The lack of a meta data standard may be solved by the de facto standard introduced by Microsoft (the Open Information Model), which may help to automate the meta data handling that is currently performed manually in 90% of all cases. OLAP servers that provide multi-cube facilities, i.e. the hybrid approach that supports both relational and multi-dimensional OLAP, tend to be the most all-round solution for meeting different levels of end-user requirements. Few commercial products support the important integration of local meta data with a central meta data repository for the whole data warehouse. Furthermore, only a few companies provide an end-to-end data warehouse solution; most vendors depend on third-party vendors to provide a total data warehouse solution. A Web interface on the end-user access products is almost the rule. Dominance of Microsoft SQL Server 7.0 for building data marts is foreseen. The following recommendations can be given:
- Build the data warehouse incrementally, one business area at a time, but define the structure of the whole data warehouse and its architected data marts from the start.
- Buy only components that integrate with the central meta data repository, and ensure that the data warehouse is not populated with dirty data.
- Support a mix of RDBMS, MDB and hybrid target databases, and ensure that the tools can provide the same functions on a LAN as on the World Wide Web.
- Support mobile users with off-line query, reporting and OLAP functions.
- Ensure that the system scales with increases in users and database size, and that it provides powerful security and warehouse administration functions.

3.7 Transaction processing

Transactions are fundamental in all software applications, especially in distributed database applications. They provide a basic model of success or failure by ensuring that a unit of work is completed in its entirety or not at all. For more detailed information on this subject, the reader is referred to [2]. From a technical point of view, we define a transaction as "a collection of actions that is governed by the ACID properties". The ACID properties describe the key features of transactions:
- Atomicity. Either all changes to the state happen or none do. This includes changes to databases, message queues and all other actions under transaction control.
- Consistency. The transaction as a whole is a correct transformation of the state. The actions undertaken do not violate any of the integrity constraints associated with the state.
- Isolation. Each transaction runs as though there were no concurrent transactions.
- Durability. The effects of a committed transaction survive failures.
Database and TP systems both provide these ACID properties. They use locks, logs, multi-versioning, two-phase commit, on-line dumps and other techniques to provide this simple failure model. The two-phase commit protocol is currently the accepted standard protocol for achieving the ACID properties in a distributed transaction environment.
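The two-phase commit protocol can be sketched in a few lines: a coordinator first asks every resource manager to prepare (phase 1, the vote), and only if all vote yes does it tell them to commit (phase 2); a single no vote aborts everyone. The classes below are a toy illustration, not a real TP Monitor or XA API:

```python
class ResourceManager:
    """Toy participant in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name, self.can_commit, self.state = name, can_commit, "active"

    def prepare(self):
        # Phase 1: vote on whether the local work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        # Phase 2: make the prepared work permanent.
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"

def two_phase_commit(participants):
    # Commit only if every participant votes yes in phase 1.
    if all(rm.prepare() for rm in participants):
        for rm in participants:
            rm.commit()
        return "committed"
    for rm in participants:
        rm.rollback()
    return "aborted"

dbs = [ResourceManager("orders_db"), ResourceManager("billing_db")]
outcome = two_phase_commit(dbs)             # both vote yes

failing = [ResourceManager("orders_db"),
           ResourceManager("billing_db", can_commit=False)]
outcome_failed = two_phase_commit(failing)  # one votes no
print(outcome, outcome_failed)
```

The sketch shows why the protocol guarantees atomicity across resources: no participant commits until all have promised they can, so either every state ends up "committed" or every state ends up "aborted".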
Transaction processing in a distributed environment is supported by TP Monitors. The job of the TP Monitor is to ensure the ACID properties, even in a distributed resource environment, while maintaining a high transaction throughput. A TP Monitor is good at efficient transaction and process management.

[Figure: resource sharing through a TP Monitor - 1000 clients are multiplexed over server classes onto roughly 50 shared database connections, 50 processes, 25 MB of RAM and 500 open files.]

Efficient transaction and process management is achieved by sharing server resources. Keeping a connection to the shared application (e.g. the database) open for each client is very expensive. Instead, the TP Monitor maintains pools of pre-started application processes or threads (called server classes) which are shared by multiple clients. In addition, the TP Monitor provides dynamic load balancing. These functionalities provide better scalability than traditional two-tier architectures.

[Figure: database resource usage as a function of the number of clients, with and without a TP Monitor.]

In addition to efficiency, a TP Monitor also provides robustness. If an application crashes, the TP Monitor can re-establish connections or even restart the failed application or process. Requests can also be redirected to other servers, so fail-over functionality is provided. TP Monitors also provide deadlock detection and resolution, as well as facilities for security and centralised management of the distributed applications. A TP Monitor should not be used when only a few users have access to a (potentially very large) database, because it introduces architectural complexity. Furthermore, once a TP Monitor is implemented, it is not easy to switch to another one, which makes you very dependent on a specific vendor.
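The server-class idea, many clients sharing a small pool of pre-started connections instead of each holding its own, can be sketched as follows; the client and pool counts are illustrative, not the figures from any particular product:

```python
import queue
import threading

N_CLIENTS, POOL_SIZE = 100, 5

# Pre-started shared connections, analogous to a server class.
pool = queue.Queue()
for i in range(POOL_SIZE):
    pool.put(f"conn-{i}")

served = []
served_lock = threading.Lock()

def client(job_id):
    conn = pool.get()                # borrow a connection (waits if all busy)
    try:
        with served_lock:
            served.append((job_id, conn))
    finally:
        pool.put(conn)               # hand it back for the next client

threads = [threading.Thread(target=client, args=(j,)) for j in range(N_CLIENTS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(served), "requests handled by",
      len({conn for _, conn in served}), "shared connections")
```

All 100 client requests are served, yet at most 5 connections (and their associated processes, memory and open files) ever exist, which is the scalability argument made above.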
The Standish Group recommends the use of TP Monitors for any client/server application that has more than 100 clients, processes more than five TPC-C type transactions per minute, uses three or more physical servers and/or uses two or more databases. Before we discuss the commercial TP Monitor products, we identify the components of a TP Monitor. The Open Group's Distributed Transaction Processing (DTP) Model (1994), which has achieved wide acceptance in the industry, defines the following components:
- The application program (AP) contains the business logic. It defines the transaction boundaries through the calls it makes to the transaction manager, and it controls the operations performed on the data through calls to the resource managers.
- Resource managers (RM) are components that provide ACID access to shared resources such as databases, file systems, message queuing systems, application components and remote TP Monitors.
- The transaction manager (TM) creates transactions, assigns transaction identifiers to them, monitors their progress and co-ordinates their outcome.
- The communication resource managers (CRM) control the communications between distributed applications.

[Figure: the Open Group DTP model - the application program uses the TX API towards the transaction manager and the RM API towards the resource managers; the transaction manager uses the XA API towards resource managers and the XA+ API towards communication resource managers, which offer the XATMI, TxRPC and CPI-C interfaces.]

3.7.1 Commercial TP Monitors

In general, all of the products follow the Open Group's standard DTP architecture. The following products are investigated: BEA Systems Inc.'s Tuxedo, IBM's TXSeries (Transarc's Encina), IBM's CICS, Microsoft Transaction Server (MTS), NCR TOP END and Itautec's Grip. In the following sections we briefly discuss when to use these products; for a detailed discussion we refer to [2].

3.7.1.1 BEA Systems Inc.'s Tuxedo

- You are developing object-based applications.
Tuxedo works with non-object-based applications, but it is especially suited to object-based ones. In fact, you cannot implement object-based applications involving transactions without underpinning them with a distributed TP Monitor like Tuxedo (ORBs without TP underpinning are not secure).
- You have a large number of proprietary and other platforms which you need to integrate.
- You use Oracle, DB2/6000, Microsoft SQL Server, Gresham ISAM-XA, Informix DBMSs or MQSeries, and you need to build transaction processing systems that update all these DBMSs/resource managers concurrently.
- You want to integrate Internet/intranet-based applications with in-house applications for commercial transactions.

3.7.1.2 IBM's TXSeries (Transarc's Encina)

- Your programmers are familiar with C, C++ or CORBA OTS.
- You use Oracle, DB2/6000, MS SQL Server, Sybase, CA-Ingres, Informix, ISAM-XA, MQSeries and/or any LU6.2-based mainframe transaction system, and you need to build transaction processing systems that update or inter-operate with all of these.
- You need to build applications that enable users to perform transactions or access files over the Internet.

3.7.1.3 IBM's CICS

- You are already a big user of CICS and intend, in the future, to be a user of predominantly IBM machines and operating systems. Note that if you intend to use non-IBM operating systems you must be prepared to use DCE.
- You are attracted by IBM's service and support network.
- You are prepared to dedicate staff to performing all the administrative tasks needed to ensure CICS is set up correctly and performs well across the company.
- You do not need an enterprise-wide, self-administering solution supporting a large range of different vendors' machines.

3.7.1.4 Microsoft Transaction Server (MTS)

- You are building a multi-tier (especially Internet) application based on Microsoft back-end server suites.
- Your system architecture is built upon the DCOM architecture.
- Your developers have good Visual Basic, Visual J++ and NT knowledge.
- You are building a complex business application which stays in the Microsoft world only.

3.7.1.5 NCR TOP END

- You need a strategic, high-performance, high-availability middleware product that combines support for synchronous and asynchronous processing with message queuing, to enable you to support your entire enterprise.
- You use TCP/IP, Unix (AIX, HP-UX, Solaris, Dynix, Sinix, IRIX, NCR SvR4, U6000 SvR4, Olivetti SvR4, SCO UnixWare, Digital Unix, Pyramid DC/OSx), Windows (NT, 95 or 3), OS/2, MVS, AS/400 or TPF.
- You need distributed transaction processing support for Oracle, Informix, Sybase, Teradata, CA-Ingres, Gresham's ISAM-XA, Microsoft SQL Server or DB2/6000.
- Your programmers use C, Cobol or C++, Oracle Developer/2000, NatStar, Natural, PowerBuilder, Informix 4GL, SuperNova, Visual Basic, Visual C++ (or any other ActiveX-compliant tool), Java or Web browsers.

3.7.1.6 Itautec's Grip

- You need a TP Monitor capable of supporting a cost-effective, stand-alone or locally distributed application, which may exchange data with a central mainframe.
- You want to develop these applications on Windows NT or NetWare servers.
- Your hardware and network configurations are relatively stable.
- The DBMSs you intend to use are Oracle, Sybase, SQL Server, Btrieve or Faircom.

3.7.2 Analysis

The most mature products are Tuxedo, Encina, TOP END and CICS; Grip and MTS lack some features and standards support. If you are looking for enterprise-wide capacity, consider TOP END and Tuxedo. If your project is medium-sized, consider Encina as well. If you are looking for a product that supports a vast number of different platforms, then Tuxedo may be the product to choose. If DCE is already used as the underlying middleware, then Encina should be considered. MTS and Grip are low-cost solutions; if cost is not an issue, then consider Tuxedo, TOP END and Encina.
Internet integration is best supported by MTS, Encina, Tuxedo and TOP END. Regarding support for objects and components, MTS is clearly leading the field with a tight integration of transaction concepts into the COM component model; Tuxedo and Encina will support the competing CORBA object model from the OMG.
There seems to be a consolidation in the market for TP Monitors. On the one hand, Microsoft has discovered the TP Monitor market and will certainly gain a large portion of the NT server market. On the other hand, the former TP Monitor competitors are merging, which leaves only IBM (CICS and Encina), BEA Systems (Tuxedo) and NCR (TOP END) as the established players. The future will depend heavily on the market's decision about object and component models such as DCOM, CORBA and JavaBeans, and on easy access to integrated development tools.

3.8 Multimedia databases
In this report we adopt the following definition of a multimedia database [3]: A multimedia database (MMDB) is a high-capacity/high-performance database system (including its management system) that supports multimedia data types, as well as other basic alphanumeric types, and handles very large volumes of (multimedia) information.
This definition can be divided into five objectives. A multimedia database system:
1. supports multimedia data types. A data type consists of a structure on which specific operations can be invoked. Examples of multimedia data types are text/documents, pictures, audio, video, graphics and classical data. The audio and video data types require specific operations like fast-forward, rewind or pause.
2. has the capacity to handle a very large number of multimedia objects. The objects are mostly large (e.g. audio and video) and multimedia libraries may contain a huge number of objects.
3.
supports high-performance, high-capacity, cost-effective storage management. High-performance storage is required for real-time responses (e.g. stream processing) and for handling huge objects. High capacity is required because the objects can be very large. Because some objects have less strict requirements than others (e.g. they are less frequently accessed, or read-only), cost-effective storage management may be achieved by a hierarchical storage manager, which spreads data over on-line, near-line and off-line storage devices depending on the requirements (mostly access frequency).
[Figure: Multimedia database management — client workstations/PCs, printer and fax stations accessing a file server, content retrieval server, database server, and hierarchical storage manager with jukebox.]
4. has conventional database capabilities. These include persistence, versioning, transactions, concurrency control, recovery, querying, integrity, security and performance. Transactions, concurrency control and querying will be discussed in more detail hereafter.
5. has information-retrieval capabilities for multimedia data. Multimedia data is typically not queried on exact match, but searched for information (e.g. pattern matching, query by example), which requires probabilistic techniques. Querying is therefore often an iterative process.

3.8.1 Querying and content retrieval in MMDBs
With the increased complexity of the data objects in an MMDB, exact matches on MM objects are rare. Rather, queries on an MMDB concentrate on the information (i.e. the content) of the object, rather than the raw data itself. For example, pictures can be queried for round red shapes, instead of for the number of red pixels. Querying in MMDBs therefore incorporates fuzzy values and answers with degrees of probability. This makes querying an iterative process, rather than a single step.
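To make the contrast with exact-match querying concrete, the following sketch ranks stored images by similarity to an example image. Everything here is illustrative: the library, the three-bin colour histograms and the Euclidean distance measure are assumptions for the example, not part of any particular MMDB product.

```python
import math

def euclidean(a, b):
    # Distance between two feature vectors (here: 3-bin colour histograms).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_similarity(query_vec, library, top_k=3):
    # No exact match: every object gets a score, and the answer is a
    # ranked list the user can refine in further iterations.
    scored = [(name, euclidean(query_vec, vec)) for name, vec in library.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]

# Hypothetical image library: name -> (red, green, blue) histogram.
library = {
    "sunset.jpg": [0.7, 0.2, 0.1],
    "forest.jpg": [0.1, 0.8, 0.1],
    "ocean.jpg":  [0.1, 0.2, 0.7],
}

# "Find pictures that look like this mostly-red example image."
ranked = rank_by_similarity([0.8, 0.1, 0.1], library)
```

The answer is not "match/no match" but an ordered list with distances, which the user can inspect and refine — the iterative process described above.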
Typical of querying in MMDBs is "query by example" (finding objects similar to a presented example object). Content-based retrieval is very complex when it is based only on the raw data of the MM object. Therefore MM objects can be annotated with data to enable quick and accurate retrieval later on. Annotation can be done manually, automatically, or by a combination of both.
Indexing MM objects is very important, because it provides high-performance access to the large objects. Three types of indexes are widely used in RDBMSs and OODBMSs:
- single-key index structures
- multi-key index structures
- content-based index structures.
A single-key index enables fast access to objects based on a single attribute of the object, for example the primary key. Single-key index structures can be defined on large MM objects, which are stored as Binary Large Objects (BLOBs). An interesting single-key index structure is the positional B+ tree, where the BLOB is partitioned into equally sized blocks which can be accessed through a tree structure.
Multi-key index structures provide fast access to objects involving multiple attributes, which are scanned at the same time. Multi-key indexes are especially important for MM objects, because a search typically has to check multiple attributes. A number of multidimensional index structures exist, e.g. Kd-trees, multi-dimensional trees, grid files, point-quad trees and R-trees.
Content-based index structures concentrate on the content rather than on attributes describing the object. Two important types of content-based index are inverted indexes and signature indexes. An inverted index is a list of pairs (value, set), where the set includes all relevant objects associated with the value. A signature index associates each object with a signature: a compact string which encodes information about the object. Hence, to identify relevant objects, only the signatures need to be scanned.
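The two content-based structures can be illustrated in a few lines. This is a minimal sketch, not production index code: the keyword annotations are hypothetical, and the hash-derived bit patterns stand in for the tuned superimposed coding a real signature file would use.

```python
inverted = {}     # value -> set of object ids (posting sets)
signatures = {}   # object id -> signature (an integer used as a bit string)

def bitmask(word, bits=32):
    # Hash-derived bit pattern for one keyword (an assumption; real
    # signature files use carefully tuned superimposed coding).
    return 1 << (hash(word) % bits)

def annotate(obj, keywords):
    # Maintain both index structures for the annotated object.
    sig = 0
    for kw in keywords:
        inverted.setdefault(kw, set()).add(obj)
        sig |= bitmask(kw)
    signatures[obj] = sig

def query_inverted(*keywords):
    # Exact: intersect the posting sets of all query keywords.
    sets = [inverted.get(kw, set()) for kw in keywords]
    return set.intersection(*sets) if sets else set()

def query_signatures(*keywords):
    # Scan only the compact signatures: every query bit must be present.
    # May yield false positives, which must be verified afterwards --
    # part of the "more complex algorithms" the text refers to.
    qsig = 0
    for kw in keywords:
        qsig |= bitmask(kw)
    return {obj for obj, sig in signatures.items() if sig & qsig == qsig}

annotate("clip1.mpg", ["soccer", "goal", "interview"])
annotate("clip2.mpg", ["soccer", "training"])
```

The inverted index answers exactly but stores a posting set per value; the signature scan touches only one small integer per object, at the price of possible false positives.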
Although signature indexes are much more efficient in storage, they require more complex algorithms.

3.8.2 Transactions, concurrency and versioning in MMDBs
Typically an MMDB consists of multiple components (e.g. an RDBMS, a hierarchical storage manager, a full-text retrieval engine), so transactions, concurrency control and versioning are complex tasks in MMDBs. These three topics are discussed next.

3.8.2.1 Transactions
Traditional applications often involve short transactions (such as debit/credit transactions), while multimedia database applications involve long (as well as short) transactions. Long transactions are particularly common in graphics and computer-aided design applications. A number of transaction management techniques have been developed especially for object-oriented and multimedia database applications involving long-duration transactions. One of the earliest transaction models for handling this was the nested transaction model.

3.8.2.2 Concurrency
Concurrency control is the activity of synchronising the actions of transactions occurring at the same time, thereby preventing mutual interference. A number of concurrency control algorithms exist, such as locking, time-stamp ordering and commit-time certification. The most widely used algorithms in multimedia database implementations (as well as in commercial database implementations) are the locking-based algorithms. The concept behind locking is that each persistent object has a lock associated with it. Before a transaction can access an object, it must request the lock; if another transaction is currently holding that lock, the requesting transaction has to wait until the lock is released. The database system can provide locking at different granules (e.g. at the instance, record, compound object, class, page, storage area or database level). Multimedia databases have several categories of granules, i.e.
physical storage organisation; classes and instances (for object orientation); class hierarchy locking; and complex or composite object locking. Fine-grain locking compromises performance, whereas coarse-grain locking compromises concurrency and increases the danger of deadlocks.

3.8.2.3 Versioning
In many applications that involve complex objects, references between objects need to be consistently maintained. Some applications use the concept of generations of data based on historic versions. This concept is rather straightforward for multimedia applications supported by multimedia databases: the multimedia storage manager must make sure to propagate older versions of objects to write-once-read-many (WORM) media and to store the volumes of much older versions off-line.

3.8.3 Multimedia objects in relational databases
Relational databases support variable field lengths in records. These data types are supported with the intention of providing direct and easy definition of variable-length types such as text, (digitised) audio/video and pictures (black & white or colour). Products include, among others, Borland's InterBase, the Sybase SQL Server and the Plexus XDP database.
InterBase is a relational database system from Borland with built-in support for BLOBs. BLOBs are stored as collections of segments, and to access and manipulate the database InterBase uses GDML, a proprietary high-level language.
The Sybase SQL Server supports two variable-length data types, TEXT and IMAGE, where each such field can be large (up to 2 GB). Via an API call, the database designer can place the text (or multimedia data) on a device or volume separate from the database tables.
The Plexus XDP database is based on the INFORMIX-Turbo RDBMS. Plexus has extended the INFORMIX engine with a number of multimedia database features. It provides support for two variable-length data types, TEXT and BYTES.
The BYTES and TEXT data can be stored on magnetic, erasable optical or WORM devices, where the optical drives are managed directly by XDP. Unlike most RDBMSs, the XDP system manages optical jukeboxes and volumes directly (rather than storing an operating system path in a character field). XDP provides consistent transaction management for both records and long variable-length fields (TEXT and BYTES).

3.8.4 Multimedia Objects in Object-Oriented Databases
There are many ways and approaches for integrating object-oriented and database capabilities, including a number of standardisation efforts (by ODMG and OMG). Three main approaches can be outlined that are especially relevant to multimedia database applications:
- object-relational databases, i.e. databases that extend the relational model with object-oriented capabilities (e.g. UniSQL and Illustra);
- object-oriented databases that extend or integrate with an object-oriented language (often C++), supporting persistence and other database capabilities (e.g. the GemStone DBMS, Versant and ObjectStore);
- application-specific approaches that may use any underlying DBMS but concentrate on a specific application area. Examples are face-retrieval systems, satellite imaging, earth and material science, and medical imaging. These systems often provide "one solution per application" and are therefore isolated from each other.

3.8.5 Analysis
Today, most multimedia databases are specific applications developed with commercial DBMSs, hierarchical storage managers or information-retrieval technologies. The state of the art is to rely on third-party vendors for each component and to integrate these together (at least some commercial DBMSs are starting to incorporate optical storage and multimedia server support).
Although a number of "general-purpose" multimedia development tools with various multimedia editing, querying and retrieval capabilities have started to appear, the successful implementations of multimedia still pertain to specific applications.

3.9 Databases and the World Wide Web
Nowadays, we cannot imagine a world without the Internet and the World Wide Web (WWW). As the WWW is in fact one very large distributed multimedia information system, very large databases play an important role in it. This section first gives a short introduction to Internet and WWW concepts before describing the possible ways of embedding databases in the architecture. As security is a major issue with the Internet, this is also treated. For more detailed information, the reader is referred to [3].

3.9.1 The Internet and the World Wide Web
The Internet is a huge set of autonomous systems connected in a network. In principle, each system can communicate, directly or indirectly via routers, with all other systems connected to the Internet, and each system can act as a client or as a server. For communication to be possible, unique addressing of all systems is necessary. On the Internet, the so-called Internet Protocol (IP) address is used for this: a 32-bit number, usually written as four decimal numbers, issued by a central Internet authority. The Internet is a packet-based network, and the Internet Protocol itself is not sufficient to enable real communication; a number of other protocols on top of it enable the actual end-to-end communication (see figure below). For detailed descriptions of all components the reader is referred to [3], but what one can see is that the WWW (or simply the Web) is one of the many applications (indicated by ovals in the figure) on the Internet.
[Figure: Important Internet protocols and applications — applications such as DNS, the Web (HTTP), file transfer (FTP), e-mail (SMTP) and Telnet running over the transport protocols UDP and TCP, which run over IP on the physical network hardware.]
Information on the WWW is mostly structured in documents using the Hyper Text Mark-up Language (HTML). HTML enables the structuring of multimedia information in a uniform way, which so-called Web browsers (e.g. from Netscape and Microsoft) can render to the user. Another important facility of HTML is the possibility to point from one document to another via so-called hyperlinks, by means of which users can navigate transparently over the whole Internet.
The rather static presentation of HTML-document-based information on the Internet appeared to be insufficient to satisfy the needs of users; a more dynamic way of generating information on request was needed. This is where databases can play an important role.

3.9.2 Database gateway architectures
At the server side, several architectures are possible for connecting databases to the WWW. The most important ones are described in the following.
The first is the "database gateway as a CGI executable program". The web server receives requests from Internet users and starts a Common Gateway Interface (CGI) process for each request. This process sets up a session with the database, passing any parameters; the database retrieves the data and returns it to the requesting CGI process, which in turn returns it to the web server, and the web server returns the data to the requesting Internet user.
[Figure: Web server starting CGI database gateway processes, each connecting to the DBMS.]
Although the concept is simple, the main drawbacks are the large amount of system resources needed to host a separate CGI process for each user and the relatively long start-up time of a database session for each request. To overcome these drawbacks, the "CGI application server architecture" has been invented.
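The per-request gateway can be sketched as follows, assuming a CGI-capable web server. SQLite and the `customers` table stand in for the DBMS and its schema; both are invented for the example.

```python
import os
import sqlite3
from urllib.parse import parse_qs

DB_PATH = "customers.db"

def setup_demo_db(path):
    # Demo stand-in for an already existing DBMS (hypothetical schema).
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS customers (id INTEGER, name TEXT)")
    conn.execute("DELETE FROM customers")
    conn.execute("INSERT INTO customers VALUES (42, 'EURESCOM')")
    conn.commit()
    conn.close()

def handle_request(query_string, db_path=DB_PATH):
    # One such process runs per request: it must open its own, freshly
    # started database session -- the main cost of this architecture.
    customer_id = int(parse_qs(query_string).get("id", ["0"])[0])
    conn = sqlite3.connect(db_path)          # new DB session per request
    row = conn.execute("SELECT name FROM customers WHERE id = ?",
                       (customer_id,)).fetchone()
    conn.close()
    return ("Content-Type: text/html\n\n"
            "<html><body>Customer: %s</body></html>"
            % (row[0] if row else "unknown"))

if __name__ == "__main__":
    setup_demo_db(DB_PATH)
    # A CGI program receives its parameters in the QUERY_STRING variable
    # and writes the complete HTTP response to standard output.
    print(handle_request(os.environ.get("QUERY_STRING", "id=42")))
```

Note that every request pays for a process start and a fresh database connection — exactly the overhead the application server architecture described next is designed to remove.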
In the CGI application server architecture, an application server on a middle tier takes care of efficiently setting up a pool of sessions with the database. For each Internet user request, the web server starts a very small CGI process (the dispatcher) that passes the parameters to the application server, which in turn picks an already existing database session from the session pool to answer the request. Data from the database is returned along the same route in the opposite direction.
[Figure: Web server starting small CGI dispatcher processes that hand requests to an application server, which holds a pool of sessions with the DBMS.]
An even more efficient, but web-server proprietary, solution is the "server API architecture". In this architecture the web server comes with an API to extend its functionality; by adding database connection functionality to the web server, a very efficient route from Internet user via web server to the database is created. A drawback of this solution is that errors in the code of the database gateway may corrupt the working of the whole web server.
[Figure: Web server with an extensible server API hosting the database gateway, connected directly to the DBMS.]

3.9.3 Web databases and security
In principle, each system on the Internet can connect to any other system connected to the Internet. For this reason, security is a hot topic when connecting databases to the Internet: only authorised users, both inside and outside the organisation, should be able to get data into or out of a database. The two main approaches to providing this are firewalls and encryption techniques. The figure below gives a typical architecture for connecting a database to the Internet in a secure way.
[Figure: Potential intruders on the Internet are separated from the internal LAN, web server and DBMS by a firewall; internal access takes place behind the firewall.]
The firewall is a gatekeeper examining the network data coming into and going out of the organisation's internal LAN.
There are several types of firewalls, each with its own way of examining network data. The three major types are:
- Screening routers: low-level (network layer) systems that look inside the IP packets to determine whether a packet may pass, depending on, among other things, the sender address and receiver address.
- Proxy server gateways: operating at a higher level in the protocol stack (e.g. HTTP), they provide more means of monitoring and controlling access to the internal network (e.g. hiding the IP addresses of the internal systems from the outside world, thus preventing intruders from connecting directly to an internal system).
- Stateful inspection firewalls: comparing bit patterns of passing packets with the bit patterns of packets that are already known to be trusted. In contrast with proxy server gateways, this type of firewall does not need to evaluate a lot of information in a lot of packets, resulting in much less overhead.
The first type is the simplest but offers less security than the other two; the second type is the most commonly used, but generates a lot of overhead; the third type is now emerging. Note that no matter how secure a firewall can be in theory, a good firewall policy has to be implemented in the organisation to keep intruders out: employees bypassing the firewall with, for example, a private modem connection to the Internet are a threat to the safety of the internal network.
Apart from firewalls, encryption techniques can be used to encrypt data sent over the Internet, thus preventing intruders from analysing this data and breaking into the internal network. The Secure Socket Layer (SSL) is an open protocol specification providing generally accepted encrypted data transport over the Internet. At the moment, SSL is implemented as part of Netscape's and Microsoft's proprietary web server APIs.
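A screening router's rule matching can be sketched as follows. The rule table, networks and ports are invented for illustration, and a real router inspects raw packet headers rather than Python values; this only shows the first-match-wins logic on sender and receiver addresses.

```python
from ipaddress import ip_address, ip_network

# Ordered rule list, first match wins (all addresses/ports are made up).
RULES = [
    # (action, source network, destination network, destination port)
    ("allow", "0.0.0.0/0",    "192.0.2.80/32", 80),    # anyone -> web server
    ("allow", "192.0.2.0/24", "0.0.0.0/0",     None),  # LAN -> anywhere
    ("deny",  "0.0.0.0/0",    "0.0.0.0/0",     None),  # default: drop
]

def filter_packet(src, dst, dport):
    # A screening router looks only at header fields of each IP packet:
    # here the sender address, receiver address and destination port.
    for action, src_net, dst_net, port in RULES:
        if (ip_address(src) in ip_network(src_net)
                and ip_address(dst) in ip_network(dst_net)
                and (port is None or port == dport)):
            return action
    return "deny"
```

With these rules, outside traffic reaches only the web server's HTTP port, while a direct connection attempt from the Internet to the internal DBMS falls through to the default deny rule.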
3.9.4 Web database products
In principle nothing has to be changed at the database side to connect it to the Web, so all existing databases can be connected. From a database point of view, it does not matter whether a user connects via the Internet or via a direct connection, but, as described above, a web server is often used in the case of an Internet connection. All major database vendors, such as Oracle, Informix, IBM and Microsoft, offer software to connect their databases (and those of competitors) to the Web. One should, however, take into account the special demands the Internet poses on the database: large numbers of users (scalability), content in HTML format (multimedia support), 7x24 use (availability) and a high proportion of read transactions (data warehousing). Database vendors have tuned their databases to meet these typical Internet needs.

3.9.5 Analysis
Although it is still difficult to quantify the profits of the Internet, it is difficult to imagine a world without it. Together with the increasing demand for (personalised) information via the Web, the role of very large (multimedia) databases increases. All major database vendors, as well as dedicated Internet vendors, already offer products (e.g. web servers) to connect databases to the Web in a secure and efficient way, and they tune their DBMSs to meet the special requirements of Internet use, such as handling large numbers of concurrent users, processing multimedia content and providing high availability.

4 Mapping of telecommunication applications on database technologies
As mentioned earlier, this Project focuses on Service Management applications and Service Provisioning applications.
In this chapter we first list some applications we have in mind when talking about Service Management and Service Provisioning. Second, these telecommunication applications are mapped onto the database technologies from chapter 3 by means of a matrix. Given a service management or service provisioning application, this matrix gives an indication of the relevant database technologies. For more detailed information on the construction of the matrix, the reader is referred to [4].

4.1 Service management applications
The management of telecommunication services is a key issue for PNOs. For handling different services there are management systems such as billing systems, customer ordering systems, customer/user management systems, etc. These management systems rely heavily on very large database systems. Examples of these applications are:
- Billing: both for customers (customer billing) and for providers (provider billing). Billing is a core process for getting money for the services offered to customers.
- Session management: for monitoring sessions of customers and storing related information (e.g. call detail records (CDRs)).
- Customer registration: for knowing who each customer is and what he wants. So-called "customer profiles" play an important role in customer personalisation: for each customer, all relevant information (e.g. address, call behaviour, installed base) is stored to satisfy the needs of individual customers. So-called "1-to-1 (database) marketing" may give PNOs a competitive edge in the near future.
- Number portability: to enable customers to keep the same telephone number when they change from one operator to another.
- Home location registering of mobile phones: for keeping track of which base station is nearest to which mobile phone.
- Service Order Entry: for supporting the order processing within the organisation.
- Enterprise Resource Planning (ERP): for enterprise-wide support of process flow and related data.
ERP promises one database, one application and one user interface for the entire enterprise, where once disparate systems ruled manufacturing, finance, distribution and sales.
- Business Intelligence: for analysing customer behaviour and starting appropriate marketing campaigns.

4.2 Service provisioning applications
Applications supporting Service Provisioning rely heavily on very large database technology, and their number and size are increasing. Whereas Service Management focuses on managing processes within the company, Service Provisioning focuses on offering services to the customers. Examples of these applications are:
- Search Engines: offer customers facilities for searching information bases (e.g. the Internet).
- Electronic Commerce: offers a variety of services for buying and selling goods and services electronically via the Internet.
- Hosting Multimedia: for hosting multimedia information (e.g. audio, video, images, text).
- Video on Demand Service: for streaming a video to a customer on demand.
- Audio on Demand Service: offers the customer the possibility to download audio streams.
- On-line Publishing: for on-line publishing of (multimedia) content on, e.g., the Internet.
- Digital Library: for the disclosure of digitised collections of books, articles and other published media.
- Tele-Education: incorporates all telecom-based educational services, ranging from one-way educational material broadcast to customers, to electronic classrooms displayed with VR technology.
- Mobile Information Service: for providing the customer with information on restaurants, shops, offers, etc. located close to the customer, based on the customer's interests and current position.
- Fleet Management Service: for providing the customer with information on the position of people/vehicles.
- Computer Supported Co-operative Work (CSCW): supports people separated by long physical distances in working together. The service provides the customer with all necessary help for communication and effective work, e.g. video/audio conferencing, sharing of documents/whiteboards, e-mail facilities, a common document database, etc.
- Monitoring-related services: for giving key customers the possibility to monitor, and partially control, their own use of telecom services.

4.3 The telecommunication applications/database technologies matrix
Having defined the relevant telecommunication applications in the previous section, we now map these applications onto the database technologies discussed in chapter 3. In the matrix below, the degree of support a database technology provides to an application is indicated by the following grades:
- Required (req.): applying Technology y to realise Application x is required; without it, Application x is difficult to realise.
- Helpful: the usage of Technology y is helpful in realising Application x; other technologies may offer the same capabilities.
- Optional: the technology is not directly needed to realise Application x; the decision to use it depends on expected external influences like traffic, number of users, workload, the existing communication environment, legacy systems to be integrated, etc.
- None: no influence, not applicable.
The matrix gives a first impression of which database technologies are relevant for which applications. It is the result of combining detailed information given in [4].

Application(2)     | Parallel DB | Retrieval & Manipulation | TP Monitors | Backup & Recovery | Performability | Web DB   | Multimedia DB | Data warehousing | Benchmarking
ERP                | optional    | req.                     | helpful     | helpful           | helpful        | optional | none          | req.             | helpful
Number portability | helpful     | req.                     | helpful     | req.              | helpful        | optional | none          | helpful          | helpful
Session mngm       | helpful     | helpful                  | req.        | req.              | req.           | optional | req.          | optional         | helpful
SOE                | optional    | helpful                  | req.        | req.              | req.           | helpful  | none          | helpful          | helpful
Home Location      | optional    | req.                     | req.        | req.              | req.           | optional | none          | helpful          | helpful
Billing            | optional    | helpful                  | req.        | req.              | req.           | optional | none          | none             | helpful
Customer Regr      | optional    | req.                     | req.        | req.              | req.           | optional | none          | none             | helpful
Decision support   | req.        | helpful                  | req.        | helpful           | req.           | req.     | none          | req.             | helpful
E-comm.            | optional    | req.                     | req.        | req.              | req.           | req.     | helpful       | helpful          | helpful
Fleet mngm         | optional    | helpful                  | req.        | req.              | req.           | helpful  | none          | helpful          | helpful
Tele-education     | none        | helpful                  | helpful     | req.              | req.           | req.     | helpful       | helpful          | req.
Mobile information | optional    | helpful                  | req.        | req.              | req.           | optional | none          | helpful          | helpful
Monitoring         | optional    | req.                     | req.        | helpful           | req.           | helpful  | helpful       | helpful          | helpful
Search Engines     | optional    | helpful                  | req.        | helpful           | req.           | req.     | helpful       | none             | req.
Digital Libraries  | req.        | helpful                  | req.        | helpful           | req.           | optional | helpful       | helpful          | helpful
AoD                | req.        | helpful                  | helpful     | helpful           | req.           | optional | req.          | none             | req.
VoD                | req.        | helpful                  | helpful     | helpful           | req.           | optional | req.          | none             | req.
Hosting MM         | req.        | helpful                  | helpful     | helpful           | req.           | optional | req.          | none             | req.
Online Publ        | optional    | helpful                  | helpful     | helpful           | req.           | req.     | helpful       | helpful          | helpful
CSCW               | none        | helpful                  | helpful     | helpful           | req.           | req.     | helpful       | none             | helpful

Mapping of telecommunication applications and database technologies
(2) The rows ERP to Decision support are Service Management applications; the remaining rows are Service Provisioning applications.

5 General analysis and recommendations
In this main report we have summarised the developments in database technologies for the construction of very large databases. Further details can be found in the following parts of the Deliverable. For conclusions on individual technologies we refer to the analysis sections in chapter 3.
Here we recall only a few conclusions on those technologies that we feel are currently most relevant. The main drivers for these developments nowadays are low-cost database platforms, data warehouses, Web applications, and the issue of data integration.
As far as hardware platform support for very large databases is concerned, we see the following situation. For very large operational databases, i.e. databases that require heavy updating, mainframe technology (mostly MPP, Massively Parallel Processor, architectures) is by far the most dominant. For data warehouses, on the other hand, which mostly support retrieval, we see a strong position for high-end Unix SMP (Symmetric Multi-Processor) architectures. The big question with respect to the future concerns the role of Windows NT on Intel. Currently these technologies play no role in very large databases, but this may change in the coming years. There are two main streams with respect to NT and Intel: on the one hand NUMA (Non-Uniform Memory Architecture) with Intel processors, and on the other hand clustered Intel machines. NUMA is more mature and supports major databases like Oracle and Informix; however, NUMA is still based on Unix, although suppliers are working on NT implementations. Database technology supporting NT clusters is not really available yet, with the exception of IBM DB2. This area will be closely followed by the Project, and actual experiments may be planned to assess this technology.
Multimedia databases and Web-related database technology are developing very fast. All major database vendors support Web connectivity nowadays, and there is a strong focus on database-driven Web sites and E-commerce servers for the Web. The support for multimedia data, however, is rather rudimentary: although vendors like Oracle, Informix and IBM have made a lot of noise about Universal Servers that support multimedia data, the proposed extensible architectures turned out to be relatively closed and unstable.
Current practice is still mainly to handle multimedia data outside the database.
Data warehouse technology is one of the most dynamic areas nowadays; all database vendors and mainframe vendors are active in it. One has to be very careful here: a data warehouse is not simply a large database. There is a lot of additional technology for data extraction, metadata management and architectures. All major vendors have their own methodology, and care has to be taken not to be locked in. A rather new development is that of operational data stores: data warehouses with limited update capabilities. Especially for the propagation of these updates back to the originating databases, no stable solutions exist, so great care has to be taken when embarking on operational data stores.
Finally, as telecommunication services are becoming more and more data intensive, the role of database technology will increase. Decisions with respect to database technology therefore become crucial elements in maintaining control over the data management around those services, and in maintaining a strong, flexible and competitive position.

References
[1] EURESCOM P817, Deliverable 1, Volume 2, Annex 1 - Architectural and Performance Issues, September 1998
[2] EURESCOM P817, Deliverable 1, Volume 3, Annex 2 - Data Manipulation and Management Issues, September 1998
[3] EURESCOM P817, Deliverable 1, Volume 4, Annex 3 - Advanced Database Technologies, September 1998
[4] EURESCOM P817, Deliverable 1, Volume 5, Annex 4 - Database Technologies in Telecommunication Applications, September 1998