DATABASE
TRANSFORMATION
November 2016
ABSTRACT
Continuous IT budget reductions, high
database license and maintenance prices, the
upcoming end of SAP support for ORACLE,
new Hadoop and in-memory database
technologies and the increasing requirements
for unstructured data and real-time big data
analysis are pushing CIOs across all business
verticals to undertake a complete
transformation of their database
architecture. We discuss here what this new
architecture could look like and which steps
are necessary in order to safely walk this
route and master this new challenge.
Overview
Over the last twenty years ORACLE has managed to remain the #1 database solution vendor on the
market. The main reason for this was the great stability and transaction safety of their product.
Other factors, such as alliances with key application and hardware vendors, multiple OS
support, a good training concept, a large pool of specialists, and aggressive sales and
acquisition strategies, have also helped to preserve their lead in this technology.
During this period ORACLE was very conservative in enhancing and upgrading its core
technology. The main features of the core database system remained almost unchanged over the
years, and most new technologies were added as new software layers (e.g. RAC, queuing,
partitioning, Data Guard, Spatial, OLAP, In-Memory). This strategy benefited
robustness and backwards compatibility and also helped the sales cycle by increasing the price of
the database with the addition of independent but very critical features.
Thanks to this strategy, the quality of the core software and their good reputation, CIOs around the
world did not hesitate to buy the product and remained good customers. A vendor
change would have been too risky, no cheaper yet reliable alternatives were on the market, and
IT budgets were generous.
However, during the last five years the situation has changed, and CIOs are, or should start,
thinking about a change. We have listed here the reasons for a change, the factors to be considered, a
list of key technologies that will help on the way, an example of a state-of-the-art DB architecture, a
recommended methodology and a business case example.
7 reasons for a change
1 – Database license costs
In recent years, server costs have been reduced dramatically while performance has increased
even more. The main factor in this cost reduction was hardware standardisation, but the
acceptance of open source operating systems (Linux), open source Web technology (Apache) and
open source development tools (e.g. Java EE, Eclipse, or Git) has also contributed to this trend.
Database license costs, on the other hand, did not decrease and are now one of the top costs of the IT
infrastructure. These costs even increase once companies start implementing spatial or in-memory data analytics features on top of the existing database architecture.
2 – SAP database and mobile support
SAP has communicated that it will stop supporting ORACLE by 2025 (source: SAP roadmap). Even
now, SAP HANA is mandatory for fact sheets, analytics apps and most of the other FIORI apps.
If you want to transform your IT according to the new mobile requirements before 2025, you will
need to migrate your SAP database to SAP HANA sooner.
3 – Open source maturity
The open source movement has led to a number of alternatives to large, complex, and expensive
relational database management systems (RDBMSs) for addressing most enterprise data
management problems. Open-Source RDBMSs (OSRDBMSs) have matured significantly and can
now be used to replace commercial RDBMSs. Gartner stated in its April 2015 report:
“The most demonstrable benefit of OSRDBMs given their increasing suitability from a technology
perspective, is the TCO of these products. When skills were at a minimum, management tools were
few and the software was relatively immature the TCO was not necessarily lower than those of
commercial vendor offerings. That has changed to the point where we now believe that the cost of
managing OSDBMSs and the availability of skills are now close to parity with those of the
commercial DBMS offerings.” A leading example of this development is EDB Postgres, an open
source variant of the pioneering RDBMS project PostgreSQL, which was originally developed by
Dr. Michael Stonebraker and his team at the University of California at Berkeley. Thanks to a very
active open source community, this RDBMS has continued to evolve aggressively to meet the
needs of business users for both analytics and transaction support.
[Figure: RDBMS Maturity Evaluation, 2009 vs. 2015 (Source: Gartner)]
4 – Unstructured data
Unstructured data is growing significantly faster than structured data. According to Gartner, 80% of
business is conducted on unstructured information. As a result, enterprise expenditure on filers is
growing, and IT executives know that action is required to ensure that this expenditure does not
grow out of control. RDBMSs, on the other hand, were not originally designed to process
non-relational data. Some additional software layers and features were added to commercial
RDBMSs like ORACLE in order to handle this kind of data, but even with these enhancements the
performance of such databases on this kind of data is poor, and the processing of unstructured
data slows down the whole database system. New data storage and processing frameworks such as
Hadoop, Kafka, Spark or Storm, originally developed at large technology companies and universities,
were donated to open source communities, where they were enhanced; they now make it possible to
reliably process unbounded quantities of unstructured data without paying any licence fees.
5 – Big Data
According to Wikipedia, “Big data is a term for data sets that are so large or complex that
traditional data processing applications are inadequate to deal with them. Challenges include
analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying,
updating and information privacy.” In 2012 Gartner defined Big Data as "high volume, high
velocity, and/or high variety information assets that require new forms of processing to enable
enhanced decision making, insight discovery and process optimization." This data can be either
structured or unstructured. Unstructured data, as explained above, cannot be processed efficiently
on RDBMSs. But classical RDBMSs even have difficulties handling structured big data: the
work may require "massively parallel software running on tens, hundreds, or even thousands of
servers" (Jacobs, A., "The Pathologies of Big Data", ACM Queue, 6 July 2009).
6 – Real time analysis
With the increasing amount of available real-time data, new business requirements and new
business cases for this data are being developed across all industries. In order to monetize such
new concepts, the data also has to be analysed in real time. Wikipedia explains the term real-time
analytics, also called real-time business intelligence, as follows: “Real-time business
intelligence (RTBI) is the process of delivering business intelligence (BI) or information about
business operations as they occur. Real time means near to zero latency and access to information
whenever it is required.”
Classical transactional RDBMSs such as ORACLE are not fast enough at processing this data for analysis. The
main reason is that transactional systems have to assure transactional integrity; in order
to do so, they are optimized to store information permanently, and this is done on the hard drive
(HD). Saving information on HDs takes much longer than processing data purely in RAM, which
is why new in-memory databases were developed for analysis purposes. Such databases perform most
operations in RAM and store only a small amount of data on the hard drive.
ORACLE now offers in-memory DB features of its own. These features are still not as performant as the
products of newer, innovative in-memory database companies, and the combined price of the
regular ORACLE database plus the in-memory features increases the database TCO significantly.
7 – Standard hardware and operating systems
The first operating systems were developed for mainframes. These operating systems were
extremely proprietary and could only run on specific hardware. Both hardware and OS were very
expensive, and only a few companies could afford them. In the mid-1970s, with the introduction of
microcomputers, new operating systems were developed. Both hardware and OS were considerably
less expensive than mainframes, and hardware and OS started being developed independently from
each other. However, such systems were not reliable or performant enough to compete with
mainframes as commercial servers. With the release of Intel’s 32-bit architecture and multitasking
OSs for microcomputers in the mid-1980s, the gap between microcomputers and mainframes started
to close. The Linux kernel, which originated in 1991, was a milestone on this path. In the mid-1990s,
organizations such as NASA started to replace their increasingly expensive mainframes with clusters
of inexpensive commodity computers running Linux. Nowadays Linux is the leading operating
system on servers, and more than 95% of servers run on Intel microarchitectures. Nevertheless, a
large number of traditional companies such as banks, telcos and insurers still use obsolete hardware
and OSs for some of their core applications. The main reason is that these applications were
developed over decades, and a migration of data and logic to standard software and hardware
was considered too risky. Today, this lack of innovation is both expensive and dangerous for
the company and its CIO, as hardware, software and staff are getting older and harder to maintain.
Additionally, this obsolete infrastructure is slowing down the business innovation process, making
these branches a good target for market disrupters such as Google, Amazon or Apple.
In order to retire the obsolete hardware and operating systems, the data has to be migrated to a
new database architecture. Such a step should target a state-of-the-art, future-oriented
database system. Open source RDBMSs are to databases what Linux was to operating systems and
should therefore be the preferred migration target.
6 Factors to be considered
1 – Limited IT Budget
According to Gartner, IT spending worldwide has declined over the past two years. Gartner estimates
data center costs at above 40% of the total IT budget; they are therefore a prime target for
cost-savings programs. Any change in this area should have a clear business case, a good
return on investment and, if possible, a break-even of less than two years.
2 – IT Safety
Databases are a critical part of the IT infrastructure. If the database system fails, most
business processes will stop working. Moreover, database bugs may produce data
corruption, which would have a major economic impact on any company. Therefore, reliability and
robustness are still the most important aspects of any database infrastructure and should be the
main requirements of the new one.
3 – IT availability and continuity
Nowadays, business processes run 24/7 and any disruption has a direct monetary impact.
Database downtimes, even during the database migration, should be kept to a minimum and, if
possible, avoided entirely.
4 – Staff
The shortage of IT experts on the market is a main concern of any CIO. At the same time, missing
or obsolete IT skills are not valid grounds for layoffs in most European countries. Therefore, any
database transformation process in Europe should make sure that the existing staff is trained
in the new technology; the technology should be intuitive, easy to use and well documented, and
its interfaces should be similar to the existing ones.
5 – Legacy systems
ORACLE having been the leading database system over the last decades, most non-standard
applications created from scratch at companies (legacy applications) are based on ORACLE databases and use
the most popular non-ANSI ORACLE features, such as PL/SQL, hints, partitioning, SQL*Loader
and dblinks. In order to migrate such databases easily, the new database architecture should
support these features.
6 – Cloud
The next step in hardware standardisation and data center cost reduction, after the roll-out of
the microcomputer architecture and Linux as OS, is server virtualization. It allows the CIO to
easily move the IT to newer, faster and more cost-effective data centers.
Any new database architecture should therefore be designed to run on virtual machines in a cloud.
6 Technology assets
1 – SAP/HANA
SAP being the leading ERP system, a migration of SAP-based applications to SAP's new state-of-the-art
SAP HANA database should always be considered. The main arguments for SAP HANA are its
embedded SAP application support, its in-memory features and its unique support of FIORI apps,
which enhance the mobile use of SAP software. Open source RDBMSs are not well suited for SAP
applications: they are not supported, and using them would put the SAP applications at
risk. ORACLE, on the other hand, will only be supported until 2025. Nevertheless, SAP HANA is an
expensive database system and not suited for cost reduction of non-SAP applications.
2 – EnterpriseDB
Many enterprises are using open source RDBMSs to reduce costs. Because these RDBMSs are often
easier to administer and more flexible than alternatives, they yield staff time savings and greater
operational flexibility. Such enterprises have not relaxed their operational requirements, however.
These open source RDBMSs not only must meet the same standards of reliability, scalability, and
manageability as the RDBMSs they replace but also, in many cases, must exceed them.
The EDB Postgres Platform features the full range of capabilities one would expect of an enterprise-class RDBMS, building on PostgreSQL and adding greater performance, security, database
administrator (DBA) and developer productivity features, and compatibility with traditional
enterprise RDBMSs. The EDB Postgres Platform can be deployed to a wide range of infrastructure
options, from virtualized and container environments to public, private and hybrid clouds.
Professional services, training, 24x7 support and Remote DBA round out the platform, ensuring
enterprise customer success.
According to Wikipedia, Gartner positioned EnterpriseDB in the Leaders Quadrant of its Magic
Quadrant for Operational Database Management Systems in October 2014 and again in September
2015, and in the Challengers Quadrant of the October 2016 edition.
Thanks to its ORACLE compatibility features (e.g. data structures, syntax, semantics, PL/SQL,
functions, packages, utilities and replication services), EDB is a perfect target RDBMS for cost
reduction of non-SAP transactional applications running on ORACLE.
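To make this compatibility concrete, here is a minimal, hedged sketch (not an official EDB example) that connects to an EDB Postgres instance through the standard PostgreSQL JDBC driver and runs a query using the Oracle-style DUAL table and SYSDATE function; host, port, database name and credentials are placeholders that must be adapted.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class EdbCompatibilityDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; port 5444 is the EDB Postgres default.
        String url = "jdbc:postgresql://edb-host:5444/edb";
        try (Connection conn = DriverManager.getConnection(url, "enterprisedb", "secret");
             Statement stmt = conn.createStatement();
             // Oracle-style syntax (DUAL, SYSDATE) accepted by EDB's compatibility mode
             ResultSet rs = stmt.executeQuery("SELECT SYSDATE FROM dual")) {
            while (rs.next()) {
                System.out.println("Server time: " + rs.getTimestamp(1));
            }
        }
    }
}

Apart from the JDBC URL, the SQL itself would run unchanged against ORACLE, which is exactly what makes EDB attractive as a migration target.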
3 – Exasol
EXASOL is an analytic in-memory, column-based, compressed, massively parallel, highly scalable, tuning-free database that also includes support for Hadoop HDFS formats. The high-speed database is acknowledged by Gartner in its "Magic Quadrant for Data Warehouse Database
Management Systems" as the only German database vendor besides SAP.
According to the TPC-H benchmark, Exasol is the #1 ad-hoc decision support (BI) database system
and has the best price-performance ratio for database analytics.
EXASOL is a perfect choice for applications requiring real-time data analytics and for data-driven
businesses.
4 – Hadoop
Apache Hadoop is an open-source software framework used for distributed storage of very large
data sets. The core of Apache Hadoop consists of a storage part, known as the Hadoop Distributed
File System (HDFS), and a processing part based on the MapReduce programming model. Hadoop
splits files into large blocks and distributes them across the nodes in a cluster.
Hadoop is the ideal solution for the storage and management of unstructured data. The major cloud
implementations of Hadoop platforms are Microsoft Azure, Amazon Web Services, Google Cloud
Platform and CenturyLink Cloud. Apache Hadoop is therefore the perfect complement to
an RDBMS.
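To illustrate the storage model, the following is a minimal sketch (assumptions: a reachable NameNode at namenode.example.com:8020 and the Hadoop client libraries on the classpath) that writes a file into HDFS through Hadoop's Java FileSystem API and then reads back the block size and replication factor that HDFS applied to it.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder NameNode address; normally taken from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");
        FileSystem fs = FileSystem.get(conf);

        // Write unstructured log data; HDFS transparently splits the file into
        // blocks and replicates each block across DataNodes in the cluster.
        Path file = new Path("/data/raw/clickstream-2016-11.log");
        try (FSDataOutputStream out = fs.create(file)) {
            out.write("2016-11-01T00:00:01 user=42 action=login\n".getBytes("UTF-8"));
        }

        // Inspect how HDFS stored the file.
        FileStatus status = fs.getFileStatus(file);
        System.out.println("Block size:  " + status.getBlockSize());
        System.out.println("Replication: " + status.getReplication());
        fs.close();
    }
}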
5 – Pentaho Data Integration
The wide range of information gathered by a business is rarely stored in a single database or format.
Data integration is the process by which information from multiple databases is consolidated. With
an intuitive, graphical, drag-and-drop design environment and a proven, scalable, standards-based
architecture, Pentaho Data Integration is increasingly the choice of organizations over traditional,
proprietary ETL or data integration tools.
Pentaho Data Integration (PDI) is a powerful ETL application. Thanks to its visual interface, you can
extract information from any data source, prepare and transform it, and deliver it to a target
without writing a single line of code. It supports deployment on single-node computers as well as
on a cloud or cluster. PDI is written in Java and runs in almost any environment.
PDI is a very helpful data migration tool: it allows you to design and create all data migration
processes visually, and to schedule and run them automatically. The community version of PDI is open
source and free of charge.
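Transformations are normally designed visually in PDI's Spoon designer and scheduled through its own tools, but they can also be embedded in an application. The sketch below (the .ktr path is a placeholder, and the class names reflect the Kettle API of PDI 5/6-era releases) loads and runs a saved transformation:

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class RunMigrationTransformation {
    public static void main(String[] args) throws Exception {
        // Initialize the Kettle runtime (plugins, logging, environment).
        KettleEnvironment.init();

        // Placeholder path to a transformation designed visually in Spoon.
        TransMeta transMeta = new TransMeta("/etl/migrate_customers.ktr");
        Trans trans = new Trans(transMeta);

        // Execute the transformation and wait for all steps to finish.
        trans.execute(null);
        trans.waitUntilFinished();

        if (trans.getErrors() > 0) {
            throw new RuntimeException("Transformation finished with errors");
        }
    }
}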
Gartner recognized Pentaho as a Visionary in its February 2016 Magic Quadrant for Business
Intelligence and Analytics Platforms.
6 – Shareplex
As explained above, the database downtime during a migration should be reduced as much as
possible. A key technology here is database replication. Such technology uses change data capture (CDC)
methods that determine (and track) the data that has changed at the source database so that
action can be taken using the changed data. With this kind of software, it is possible to start a data
migration and, during the migration process, track data changes and propagate them to the
target database. This way, you can create an identical copy of a database that follows all
changes of the original in real time. Once the replication is complete and stable, you can
switch the application from the source database to the target database with virtually no
downtime.
Quest’s Shareplex offers database connectors for ORACLE, SAP HANA and EDB and is therefore a
perfect tool for a database transformation with extremely short downtimes.
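Conceptually, CDC-based replication reduces to a loop that reads committed changes from the source and applies them to the target in commit order. The deliberately naive sketch below illustrates only the idea: it polls a hypothetical, trigger-filled change_log table (not a real Shareplex interface; Shareplex reads the database redo logs instead and handles ordering, conflicts and recovery) and replays each captured statement on the target.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class NaiveCdcReplicator {
    public static void main(String[] args) throws Exception {
        // Placeholder connection strings for the source (ORACLE) and target (EDB).
        Connection source = DriverManager.getConnection(
                "jdbc:oracle:thin:@src-host:1521/ORCL", "app", "secret");
        Connection target = DriverManager.getConnection(
                "jdbc:postgresql://tgt-host:5444/edb", "app", "secret");

        long lastAppliedId = 0; // a real tool persists this high-water mark

        // Poll the hypothetical change_log table (id, sql_text) for new changes.
        PreparedStatement poll = source.prepareStatement(
                "SELECT id, sql_text FROM change_log WHERE id > ? ORDER BY id");
        while (true) {
            poll.setLong(1, lastAppliedId);
            try (ResultSet rs = poll.executeQuery()) {
                while (rs.next()) {
                    // Replay each captured change on the target in commit order.
                    try (Statement apply = target.createStatement()) {
                        apply.execute(rs.getString("sql_text"));
                    }
                    lastAppliedId = rs.getLong("id");
                }
            }
            Thread.sleep(1000); // simple polling interval
        }
    }
}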
Recommended architecture
Usually the current architecture in any industry will be similar to the figure below:
[Figure: typical current database architecture]
The recommended target architecture would be a mixture of the technologies explained above:
[Figure: recommended target database architecture]
Recommended Methodology
1 – Holistic approach
If an organization considers a database transformation, the database infrastructure should be
considered and planned as a whole in order to achieve optimal savings and performance.
Partial or local database transformations often do not achieve good results in terms of license or
maintenance savings. For example, if the organization migrates only the SAP environment to
SAP HANA but still uses ORACLE for the rest of the transactional applications and the DWH, the
TCO may increase instead of decreasing. On the other hand, an isolated transformation
of the DWH to an in-memory database will increase performance but will not significantly
reduce costs.
2 – Professional support
In order to master this complex challenge, database transformation experts are needed who have
good know-how and experience with several RDBMSs, an understanding of both the old and the new
architecture, and command of ETL and replication technologies.
3 – Good assessment
The first step towards a transformation should be a good assessment of the current infrastructure.
This assessment can be performed in four steps:

• Collect information
Number and size of databases, used features, available environments and licenses (a minimal
sketch for this step follows the list).

• Design the new architecture
Size, hardware, software and database features.

• Calculate savings
Hardware savings, OS savings and DB savings.

• Define the first transformation as a POC
Select the system, define hardware, OS and DB, select the ETL and replication
technology, and calculate price and schedule.
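As a starting point for the "collect information" step, here is a hedged sketch that queries two standard ORACLE dictionary views: dba_data_files for the total database size and dba_feature_usage_statistics for the features actually in use (both relevant to license planning). The connection details are placeholders, and the account needs the corresponding DBA privileges.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class OracleInventory {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details for the ORACLE instance being assessed.
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@db-host:1521/ORCL", "assessor", "secret");
        try (Statement stmt = conn.createStatement()) {
            // Total size of all data files in GB.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT ROUND(SUM(bytes) / 1073741824, 1) AS gb FROM dba_data_files")) {
                if (rs.next()) {
                    System.out.println("Database size (GB): " + rs.getDouble("gb"));
                }
            }
            // Features that have actually been used at least once.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT name FROM dba_feature_usage_statistics WHERE detected_usages > 0")) {
                while (rs.next()) {
                    System.out.println("Used feature: " + rs.getString("name"));
                }
            }
        } finally {
            conn.close();
        }
    }
}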
4 – Fixed price offers
Professional, experienced partners should be able to make a fixed-price offer for both a POC and
the global database transformation once they have collected all relevant data.
5 – 1st transformation as a POC
In order to test the database transformation, a first database system should be selected for a
POC. The POC can then be conducted in four steps:

• Document
Document storage, DB software, DB features, backup and deployment.

• Implementation
Implement DB installation packages. Convert DB software. Implement missing features.
Implement ETL processes using the appropriate technology (Pentaho and Shareplex).

• Test
Test installation, data transfer, backup and recovery, and the application layer.

• Deployment
Install hardware and software. Connect the old and the new database. Transfer data and redirect
the application.
6 – Complete transformation as a cycle
Once the POC has been conducted successfully, the whole transformation can be performed by
repeating the same steps in a cycle. Additionally, the old hardware has to be retired and the
licence contracts cancelled.
Business Case Example
Based on the architecture example defined above, a possible business case for the database
transformation of an SAP ERP system, two transactional applications (e.g. CRM and e-shop) and a
conventional DWH, all using redundant ORACLE database servers, is calculated below:
[Table: business case calculation]
PROVENTA AG
Untermainkai 29
60329 Frankfurt am Main
www.proventa.de
069 - 23 25 50
Diego Calvo de Nó
(Member of the Executive Board)
+49 160 478 11 69
[email protected]