DATABASE TRANSFORMATION

November 2016

ABSTRACT

Continuous IT budget reductions, high database license and maintenance prices, the upcoming end of SAP support for ORACLE, new Hadoop and in-memory database technologies, and the increasing requirements for unstructured data and real-time big data analysis are pushing CIOs across all business verticals to undertake a complete transformation of their database architecture. We discuss here what this new architecture could look like and which steps are necessary to walk this route safely and master this new challenge.

Overview

Over the last twenty years ORACLE has managed to remain the #1 database solution vendor on the market. The main reason was the great stability and transaction safety of its product. Other factors, such as alliances with key application and hardware vendors, multiple OS support, a good training concept, the large number of specialists, and aggressive sales and acquisition strategies, have also helped to preserve its lead in this kind of technology.

During this period ORACLE was very careful about enhancing and upgrading its core technology. The main features of the core database system remained almost unchanged over the years, and most new technologies were added as new software layers (e.g. RAC, queuing, partitioning, Data Guard, spatial, OLAP, In-Memory). This strategy benefited robustness and backwards compatibility, and it also helped the sales cycle by raising the price of the database through the addition of independent but very critical features.

Thanks to this strategy, the quality of the core software and its good reputation, CIOs around the world did not hesitate to buy the product and remained good customers. A vendor change would have been too risky, no cheaper and reliable alternatives were on the market, and IT budgets were generous.
However, during the last five years the situation has changed, and CIOs are, or should start, thinking about a change. We list here the reasons for a change, the factors to be considered, key technologies that will help on the way, an example of a state-of-the-art DB architecture, a recommended methodology and a business case example.

7 reasons for a change

1 – Database license costs

In recent years server costs have fallen dramatically while server performance has increased even more. The main factor behind the cost reduction was hardware standardisation, but the acceptance of open source operating systems (Linux), open source web technology (Apache) and open source development tools (e.g. Java EE, Eclipse or Git) has also contributed to this trend. Database license costs, on the other hand, did not decrease and are now one of the top costs of the IT infrastructure. These costs increase even further once companies start implementing spatial or in-memory data analytics features on top of the existing database architecture.

2 – SAP database and mobile support

SAP has announced that it will support ORACLE only until 2025 (Source: SAP Roadmap). Even now SAP HANA is mandatory for fact sheets, analytics apps and most of the other FIORI apps. If you want to transform your IT to meet the new mobile requirements before 2025, you will need to migrate your SAP database to SAP HANA sooner.

3 – Open source maturity

The open source movement has led to a number of alternatives to large, complex and expensive relational database management systems (RDBMSs) for addressing most enterprise data management problems. Open source RDBMSs (OSRDBMSs) have matured significantly and can now be used to replace commercial RDBMSs. Gartner already stated in its April 2015 report: “The most demonstrable benefit of OSRDBMs given their increasing suitability from a technology perspective, is the TCO of these products.
When skills were at a minimum, management tools were few and the software was relatively immature the TCO was not necessarily lower than those of commercial vendor offerings. That has changed to the point where we now believe that the cost of managing OSDBMSs and the availability of skills are now close to parity with those of the commercial DBMS offerings.”

A leading example of this development is EDB Postgres, which builds on the open source RDBMS PostgreSQL. PostgreSQL grew out of the pioneering POSTGRES project developed by Dr. Michael Stonebraker and his team at the University of California at Berkeley. Thanks to a very active open source community, this RDBMS has continued to evolve aggressively to meet the needs of business users for both analytics and transaction support.

[Figure: RDBMS Maturity Evaluation, 2009 and 2015 (Source: Gartner)]

4 – Unstructured data

Unstructured data is growing significantly faster than structured data. According to Gartner, 80% of business is conducted on unstructured information. As a result, enterprise expenditure on filers is growing, and IT executives know that action is required to keep this expenditure from growing out of control.

RDBMS databases, however, were not originally designed to process non-relational data. Additional software layers and features were added to commercial RDBMSs such as ORACLE in order to handle this kind of data, but even with these enhancements their performance on such data is poor, and the processing of unstructured data slows down the whole database system.

New data storage and data processing concepts such as Hadoop, Kafka, Spark or Storm, which grew out of work at large internet companies and research groups, were donated to open source communities, where they were enhanced. They now make it easy to reliably process unbounded quantities of unstructured data without paying any licence fees.
5 – Big Data

According to Wikipedia, “Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate to deal with them. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying, updating and information privacy.” In 2012 Gartner defined Big Data as "high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization."

This data can be either structured or unstructured. As explained above, unstructured data cannot be processed efficiently on RDBMS systems. But classical RDBMSs have difficulties even with structured big data: the work may require "massively parallel software running on tens, hundreds, or even thousands of servers" (Jacobs, A., 6 July 2009, The Pathologies of Big Data, ACM Queue).

6 – Real time analysis

With the increasing amount of available real-time data, new business requirements and new business cases for this data are being developed across all industries. In order to monetize such new concepts, the data also has to be analysed in real time. Wikipedia explains the term real-time analytics, also called real-time business intelligence, as follows: “Real-time business intelligence (RTBI) is the process of delivering business intelligence (BI) or information about [business operations] as they occur. Real time means near to zero latency and access to information whenever it is required.”

Classical transactional RDBMSs such as ORACLE are not fast enough at processing this data for analysis. The main reason is that transactional systems have to guarantee transactional integrity; to do so they are optimized to store information permanently, and this is done on the hard drive (HD).
Saving information on HDs takes much longer than processing the data only in RAM, which is why new in-memory databases were developed for analysis purposes. Such databases perform most operations in RAM and store only a small amount of data on the hard drive. ORACLE now offers new in-memory DB features. These features still do not perform as well as the products of newer, innovative in-memory database companies, and the combined price of the regular ORACLE database and the in-memory features increases the database TCO significantly.

7 – Standard hardware and operating systems

The first operating systems were developed for mainframes. These operating systems were extremely proprietary and could only work on specific hardware. Both hardware and OS were very expensive, and only a few companies could afford them.

In the mid-1970s, with the introduction of microcomputers, new operating systems were developed. Both hardware and OS were considerably less expensive than mainframes, and hardware and OS started to be developed independently of each other. However, such systems were not reliable or fast enough to compete with mainframes as commercial servers.

With the release of Intel’s 32-bit architecture and multitasking operating systems for microcomputers in the mid-1980s, the gap between microcomputers and mainframes started to narrow. The Linux kernel, which originated in 1991, was a milestone on this path. In the mid-1990s organizations such as NASA started to replace their increasingly expensive mainframes with clusters of inexpensive commodity computers running Linux. Nowadays Linux is the leading operating system on servers, and more than 95% of servers run on Intel’s microarchitectures.

Nevertheless, a large number of traditional companies such as banks, telcos or insurers still use obsolete hardware and operating systems for some of their core applications.
The main reason is that these applications were developed over decades, and a migration of data and logic to standard software and hardware was considered too risky. Today this lack of innovation is both expensive and dangerous for the company and its CIO, as hardware, software and staff are getting older and more difficult to maintain. Additionally, this obsolete infrastructure slows down the business innovation process, making these industries a good target for market disrupters such as Google, Amazon or Apple.

In order to retire the obsolete hardware and operating systems, the data has to be migrated to a new database architecture. The target of such a step should be a state-of-the-art, future-oriented database system. Open source RDBM systems are to DB what Linux was to OS and should therefore be the preferred migration target.

6 Factors to be considered

1 – Limited IT Budget

According to Gartner, IT spending worldwide has declined over the past two years. Gartner estimates data center costs at more than 40% of the total budget, which makes them a prime target for cost-saving programs. Any change in this area should therefore have a clear business case, a good return on investment and, if possible, a break-even of less than two years.

2 – IT Safety

Databases are a critical part of the IT infrastructure. If the database system fails, most business processes will stop working. In addition, database bugs may cause data corruption, which can have a major economic impact on any company. Reliability and robustness are therefore still the most important aspects of any database infrastructure and should be the main requirements of the new infrastructure.

3 – IT availability and continuity

Nowadays business processes run 24/7, and any disruption has a direct monetary impact. Database downtimes, even during the database migration, should be kept to a minimum and, if possible, avoided.
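The break-even target named under factor 1 (Limited IT Budget) can be checked with simple arithmetic. The following Python sketch is only an illustration under stated assumptions: it assumes a one-off transformation cost and a constant monthly saving, and the function names and all figures are hypothetical, not taken from any real project or from this paper's business case.

```python
def break_even_months(one_off_cost: float, monthly_saving: float) -> float:
    """Months until cumulative savings cover the one-off transformation cost.

    Assumes a constant monthly saving (a simplification for illustration).
    """
    if monthly_saving <= 0:
        raise ValueError("monthly_saving must be positive for a break-even to exist")
    return one_off_cost / monthly_saving


def meets_target(one_off_cost: float, monthly_saving: float,
                 target_months: float = 24) -> bool:
    """True if the break-even is reached in less than target_months (two years by default)."""
    return break_even_months(one_off_cost, monthly_saving) < target_months


if __name__ == "__main__":
    # Hypothetical figures: 600,000 one-off cost, 30,000 saved per month.
    months = break_even_months(600_000, 30_000)
    print(months)                         # 20.0
    print(meets_target(600_000, 30_000))  # True
```

A real business case would also include recurring costs of the new platform and the parallel operation of old and new systems during the migration, which this sketch leaves out.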
4 – Staff

The shortage of IT experts on the market is a main concern of any CIO. At the same time, missing or obsolete IT skills are not valid grounds for lay-offs in most European countries. Therefore, any database transformation process in Europe should make sure that the current staff is trained in the new technology. The technology should be intuitive, easy to use and well documented, and its interfaces should be similar to the existing ones.

5 – Legacy systems

Because ORACLE has been the leading database system over the last decades, most of the non-standard applications that companies built from scratch (legacy) are based on ORACLE databases and use the most popular non-ANSI-standard ORACLE features, such as PL/SQL, hints, partitioning, SQL*Loader and dblinks. To migrate such databases easily, the new database architecture should support these features.

6 – Cloud

After the roll-out of the microcomputer architecture and Linux as OS, the next step in hardware standardisation and data center cost reduction is server virtualization. This allows the CIO to move the IT easily to newer, faster and more cost-effective data centers. Any new database architecture should therefore be designed to run on virtual machines in a cloud.

6 Technology assets

1 – SAP/HANA

Since SAP is the leading ERP system, a migration of SAP-based applications to SAP's new state-of-the-art SAP/HANA database should always be considered. The main arguments for SAP/HANA are its embedded SAP application support, its in-memory features and its unique support of FIORI apps, which enhance the mobile use of SAP software. Open source RDBMSs are not well suited for SAP applications because SAP does not support them, and using them would put the SAP applications at risk. ORACLE, on the other hand, will only be supported until 2025. Nevertheless, SAP/HANA is an expensive database system and is not suited for reducing the costs of non-SAP applications.
2 – EnterpriseDB

Many enterprises are using open source RDBMSs to reduce costs. Because these RDBMSs are often easier to administer and more flexible than the alternatives, they yield staff time savings and greater operational flexibility. Such enterprises have not relaxed their operational requirements, however. These open source RDBMSs not only must meet the same standards of reliability, scalability and manageability as the RDBMSs they replace but, in many cases, must exceed them.

The EDB Postgres Platform features the full range of capabilities one would expect of an enterprise-class RDBMS, building on PostgreSQL and adding greater performance, security, database administrator (DBA) and developer productivity features, and compatibility with traditional enterprise RDBMSs. The EDB Postgres Platform can be deployed to a wide range of infrastructure options, from virtualized and container environments to public, private and hybrid clouds. Professional services, training, 24 x 7 support and Remote DBA round out the platform, ensuring enterprise customer success.

According to Wikipedia, Gartner positioned EnterpriseDB in the Leaders Quadrant of its Magic Quadrant for Operational Database Management Systems in October 2014 and again in September 2015. EnterpriseDB was recognized in the Challengers Quadrant of the same Magic Quadrant in October 2016.

Thanks to its ORACLE compatibility features (e.g. data structures, syntax, semantics, PL/SQL, functions, packages, utilities and replication services), EDB is a perfect target RDBMS for cost reduction of non-SAP transactional applications running on ORACLE.

3 – Exasol

EXASOL is an analytic, in-memory, column-based, compressed, massively parallel, highly scalable, tuning-free database that also includes support for Hadoop HDFS formats.
EXASOL is the only German database vendor besides SAP to be acknowledged by Gartner in its "Magic Quadrant for Data Warehouse Database Management Systems". According to the TPC-H benchmark, Exasol is the #1 ad-hoc decision support (BI) database system and has the best price-performance ratio for database analytics.

EXASOL is a perfect choice for applications requiring real-time data analytics and for data-driven businesses.

4 – Hadoop

Apache Hadoop is an open source software framework used for the distributed storage of very large data sets. The core of Apache Hadoop consists of a storage part, known as the Hadoop Distributed File System (HDFS). Hadoop splits files into large blocks and distributes them across the nodes of a cluster. Hadoop is the ideal solution for the storage and management of unstructured data. The major cloud implementations of Hadoop platforms are Microsoft Azure, Amazon Web Services, Google Cloud Platform and CenturyLink Cloud services. Apache Hadoop is therefore the perfect complement to an RDBMS.

5 – Pentaho Data Integration

The wide range of information gathered by a business is rarely stored in a single database or format. Data integration is the process by which information from multiple databases is consolidated. With an intuitive, graphical, drag-and-drop design environment and a proven, scalable, standards-based architecture, Pentaho Data Integration is increasingly chosen by organizations over traditional, proprietary ETL or data integration tools.

Pentaho Data Integration (PDI) is a powerful ETL application. Thanks to its visual interface, you can extract information from any data source, transform it and deliver it to a target without writing a single line of code. It supports deployment on single-node computers as well as on a cloud or cluster. PDI is written in Java and runs in almost any environment.
PDI is a very helpful data migration tool: it allows you to design and create all data migration processes visually and to schedule and run them automatically. The community version of PDI is open source and free of charge. Gartner recognized Pentaho as a Visionary in the February 2016 Magic Quadrant for Business Intelligence and Analytics Platforms.

6 – Shareplex

As explained above, database downtime during a migration should be reduced as much as possible. A key technology here is database replication. It uses change data capture (CDC) methods, which determine and track the data that has changed at the source database so that action can be taken on the changed data. With this kind of software it is possible to start a data migration and, while the migration is running, track data changes and propagate them to the target database. In this way you create an identical copy of a database that receives all changes made to the original in real time. Once the replication is complete and stable, you can switch the application from the source database to the target database with virtually no downtime.

Quest’s Shareplex offers database connectors to ORACLE, SAP/HANA and EDB and is therefore a perfect tool for a database transformation with extremely short downtimes.

Recommended architecture

A typical current architecture in any industry will usually be similar to the figure below:

[Figure: typical current architecture]

The recommended target architecture would be a mixture of the technologies explained above:

[Figure: recommended target architecture]

Recommended Methodology

1 – Holistic approach

If an organization considers a database transformation, the database infrastructure should be considered and planned as a whole in order to achieve optimal savings and performance results. Partial or local database transformations often do not achieve good results in terms of license or maintenance savings.
As an example, if the organization migrates only the SAP environment to SAP/HANA but still uses ORACLE for the rest of the transactional applications and the DWH, the TCO may increase instead of decreasing. Likewise, an isolated transformation of the DWH to an in-memory database will increase performance but will not significantly reduce costs.

2 – Professional support

Mastering this complex challenge requires database transformation experts who have good know-how and experience with several RDBMS systems, understand both the old and the new architecture, and know ETL and replication technologies.

3 – Good Assessment

The first step towards a transformation should be a good assessment of the current infrastructure. This assessment can be performed in four steps:

Collect information: number and size of databases, used features, available environments and licenses.
Design new architecture: size, hardware, software and database features.
Calculate savings: hardware savings, OS savings and DB savings.
Define the first transformation as a POC: select the system, define hardware, define OS and DB, select ETL and replication technology, and calculate price and schedule.

4 – Fixed price offers

Experienced professional partners should be able to make a fixed price offer for both a POC and the global database transformation once they have collected all relevant data.

5 – 1st Transformation as a POC

In order to test the database transformation, a first database system should be selected for a POC. This POC can then be conducted in four steps:

Document: document storage, DB software, DB features, backup and deployment.
Implementation: implement DB installation packages. Convert DB software. Implement missing features. Implement ETL processes using appropriate technology (Pentaho and Shareplex).
Test: test installation, data transfer, backup and recovery, and the application layer.
Deployment: install hardware and software. Connect old and new database. Transfer data and redirect the application.

6 – Complete transformation as a cycle

Once a POC has been conducted successfully, the whole transformation can be performed by repeating the same steps in a cycle. Additionally, the old hardware has to be retired and the licence contracts cancelled.

Business Case Example

Based on the architecture example defined above, a possible business case for the database transformation of an SAP ERP system, two transactional applications (e.g. CRM and E-Shop) and a conventional DWH, all using redundant ORACLE database servers, is calculated below:

[Figure: business case calculation]

PROVENTA AG
Untermainkai 29
60329 Frankfurt am Main
www.proventa.de
069 - 23 25 50

Diego Calvo de Nó (Member of the Executive Board)
+49 160 478 11 69
[email protected]