IBM DB2 Top Ten Technologies Transcript
Sid Misra and Irshad Raihan, Presenters
Slide 1
Welcome to the IBM® DB2® Webcast Series for Oracle Professionals. Running
throughout 2011, this series is designed to help you learn about DB2 in a way that is
fast and fits easily with your schedule. This Webcast is the first in the series and is an
overview of the 10 technologies that you should consider in DB2. You will be hearing
more about these features, in more depth, in the upcoming Webcast sessions. I am
Sid Misra of the DB2 product marketing team and I have Irshad Raihan, also in the
DB2 marketing team, online with me today. Irshad and I will be presenting this
Webcast to you. So thanks for your attention and let's get started.
Slide 2
So the first technology which I’ll be talking about today is DB2 performance.
Slide 3
So, IBM DB2 was specifically designed to operate with great efficiency and as a
result, has a long and strong track record of leadership in database benchmarks.
Now this chart represents the number of days of performance leadership of three
important industry performance benchmarks. And this is for over seven years. And
as you can see on the chart, DB2 has been the leader for more than twice as long as
the Oracle database for the TPC-C benchmark -- which is for transactional
workloads.
Now for the SAP 3-Tier SD benchmark, DB2 again has a dominant position and leads
Oracle by over ten times in terms of days of leadership. As you can see on the chart,
DB2 has similar leadership in the SPECj benchmark as well.
So you might wonder why days of leadership are important in benchmarks. Well
industry benchmarks are essentially like a game of leap frog. And vendors like
Oracle, IBM, Microsoft are continuously optimizing to outperform one another and
the company which has been in the lead position varies from time to time. So a
more telling statistic of benchmarks is days of leadership -- which vendor has been
at the top for the longest. And as you can see, DB2 has significantly dominated the
industry over time in performance benchmarks.
Slide 4
Now Chart 4 -- let’s dig a bit deeper into the TPC-C performance here. Now of all the
performance benchmarks, TPC-C is the most prominent and the most important
transaction processing benchmark in the industry. Well here is some data from the
TPC-C benchmark. Compare the amount of throughput per core for Oracle/Sun’s
best TPC-C results with HP Itanium2’s best results. And then compare it with the
next generation of IBM Power Systems™. Now on the chart, you can see Power 5,
Power 6, and Power 7 chip sets. IBM provides far better throughput per core than
Oracle/Sun and HP Itanium. So why is this important in benchmarks?
So this is critical because software is licensed by CPU core. So if you need fewer
CPU cores for your software, it reduces your initial acquisition cost and it can also
reduce your ongoing software support and maintenance costs. And having fewer
CPUs also reduces the complexity of your IT environment and helps you manage the
future growth of your IT environment more efficiently.
Slide 5
So the second technology which I want to cover today is DB2 pureScale®. IBM
introduced DB2 pureScale to address the increasing business need for scale-out
efficiency. This technology is actually based on System z®, which has been a gold
standard for high availability and scalability for many years. DB2 pureScale provides
you unlimited capacity, continuous availability, and application transparency to
reduce risk and cost as your business grows.
Slide 6
On the next slide, this is actually something from the public record. It shows the
results of some tests that were conducted by Dell -- analyzing the scalability of
Oracle RAC. Now on the chart, the X-axis represents the actual nodes
on the cluster and the Y-axis shows the effective nodes. The green line on the graph
shows near-linear performance, while the red line shows the effective performance.
So as you can see on the graph, [when] adding nodes to your Oracle RAC, the
performance doesn’t climb linearly. For example at four nodes, the system performs
as if the cluster contains less than two nodes. Similarly at eight nodes, the system
performs as if the cluster contains less than three nodes. So with Oracle RAC, you’re
spending all this money to build your system -- but by adding nodes, you’re not
getting the same value in terms of increase in scalability and performance.
Slide 7
The next chart compares the scalability of Oracle RAC to DB2 pureScale. Now as you
can see, DB2 pureScale has near-linear scale-out efficiency. DB2 pureScale came to
the market a couple of years ago and it’s similar to Oracle RAC -- it uses similar
architecture and a similar high level approach. However, once you get below that
highest level description, it takes a very different approach. So Oracle RAC
distributes the management across the nodes, whereas DB2 pureScale uses
centralized management. And this, together with DB2 pureScale’s global caching,
really has made a huge difference in both the performance and reliability of DB2
pureScale. Now for reference, I showed you in the previous chart that Oracle RAC
had an effective throughput of 1.69 nodes for a 4 node system. Now as we can see
on this chart, DB2 pureScale has an effective throughput of 3.9 nodes for a 4 node
system. So that’s correct -- DB2 pureScale gives you more than double the
throughput. And this difference gets even bigger when you add more nodes. So at
eight nodes, Oracle gives you 2.44, whereas DB2 pureScale gives you 7.54 effective
nodes. So that is three times more throughput. So Oracle RAC’s inefficient scaling is
not only a waste of your precious dollars on hardware but also on software because,
as I mentioned, software running on the boxes is priced by processor core. So
essentially you are paying more for your software even though your hardware is not
scaling to the level you desire.
Slide 8
So the third technology which I’ll be covering today is deep compression.
Slide 9
Storage costs continue to be a concern for customers. Data has been growing
exponentially and with that, the storage cost has been also growing exponentially for
customers. And DB2 9.7 has introduced a deep compression technology which can
help you deliver storage savings. DB2 supports index, temporary table, and XML
compression. For index compression, depending upon the type of index and the data
distribution within that index, DB2 will automatically choose from a set of
compression algorithms that will provide an optimal compression for that index.
Similarly, DB2 has automatic compression for temporary tables as well. And this can
mean huge savings, especially in warehouse applications, where large sorts and large
intermediate results can consume a significant amount of storage space in temporary
tables. DB2 also applies intelligent compression techniques to XML to further reduce
your storage costs. Now 9.7 compresses XML both when it is “inlined” and when
it is contained in its XDA object. Now a lot of our customers have experienced
tremendous savings with compression. With data and index compression alone,
we’ve seen customers saving around 68% to 70% in storage. And we have also
seen a significant improvement in performance as well. Now if you add XML
compression to this, the savings are even higher -- they are up to 75%. And
remember this includes only the online database storage needs. But when you
consider backup and recovery databases, savings with compression are even higher
than that.
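To make this concrete, here is a rough sketch of how compression is enabled through DDL in DB2 9.7 -- the schema, table, and index names below are invented for illustration:

```sql
-- Hypothetical table; COMPRESS YES enables row compression
-- (in 9.7 this also covers inlined XML data)
CREATE TABLE sales.orders (
    order_id   INTEGER NOT NULL,
    order_doc  XML,
    PRIMARY KEY (order_id)
) COMPRESS YES;

-- Index compression can be requested explicitly on an index
CREATE INDEX sales.orders_ix ON sales.orders (order_id) COMPRESS YES;

-- For an existing table, enable compression, then rebuild the
-- compression dictionary (REORG is run via the command line
-- processor or the ADMIN_CMD procedure):
ALTER TABLE sales.orders COMPRESS YES;
REORG TABLE sales.orders;
```

This is only a sketch of the general flow; actual clauses and defaults should be checked against the DB2 9.7 documentation for your platform.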
Slide 10
So the next chart, Chart 10, is from a study which is conducted by Triton Consulting.
So what they did was, they analyzed the complexity of performing some routine
tasks on both Oracle 11g as well as DB2 9.7. And for the purpose of the study, they
got Oracle database DBAs with over 10 years of experience, which is quite an
experienced DBA. And they got the equivalent DBA on the DB2 side. And then they
created a methodology of assessing the complexity of several routine DBA tasks.
Now this chart shows the results of the complexity analysis for data and index
compression. The first graph shows that data compression is 50% less complex with
DB2 9.7 relative to Oracle 11g. And so in other words, it takes 50% less time to
configure data compression with DB2 9.7.
And with regards to index compression, DB2 is 100% less complex than the Oracle
database. And the reason here is that with DB2, index compression and data
compression are part of the same task flow. So essentially, you need no separate
steps for configuring index compression with DB2. So you can see that deep
compression is a very powerful technology in DB2 9.7 and it’s also very simple to
implement.
Slide 11
So the fourth technology which I’ll be covering today is autonomics and
administration.
Slide 12
In recent years, DB2 has focused on making it easier for you to administer your
database system by automating some routine tasks. You not only get optimized
performance but more importantly, it frees up DBA time to work on more high value
tasks. Now with DB2 9.7, DB2 automates database maintenance, so tasks like
runstats, database restore, and backup utilities can all be automated. And you have
a user-friendly wizard that helps walk you through the process.
DB2 also heals itself. So DB2 has a facility called the Health Center. And the Health
Center allows you to set up thresholds for various database warnings and alarms.
You can configure DB2 to use health center recommendations to respond to these
warning and alarm situations.
DB2 can also tune itself and this is a very powerful feature. DB2 has self-tuning
memory manager, STMM, that actually simplifies the task of memory configuration
by automatically setting up values for several memory configuration parameters. So
STMM is very easy to configure. It is actually a two-step process if all the default
values are used. Now in comparison, undertaking memory tuning in the Oracle
database environment is complex and it involves multiple operating system checks …
and several memory parameters that sometimes require a full database restart. So
whether you’re a junior or a senior DBA, STMM can save you hours or even days of
tuning time for sure. So it is a very powerful feature.
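As a sketch of that two-step process (the database name mydb is hypothetical), from the DB2 command line processor it looks roughly like this:

```sql
-- Step 1: enable the self-tuning memory manager for the database
UPDATE DB CFG FOR mydb USING SELF_TUNING_MEM ON;

-- Step 2: set the memory areas you want STMM to manage to AUTOMATIC,
-- for example the overall database memory and the package cache
UPDATE DB CFG FOR mydb USING DATABASE_MEMORY AUTOMATIC;
UPDATE DB CFG FOR mydb USING PCKCACHESZ AUTOMATIC;
```

With the defaults in place, STMM then grows and shrinks these areas on its own as the workload changes; exact parameter names and defaults should be verified against the DB2 9.7 configuration reference.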
Slide 13
So the next chart here, Chart 13 is again from the Triton study which I talked about
earlier. And now this shows the results of complexity analysis of six routine DBA
tasks … they analyzed installation; data compression and index compression which
we talked about earlier; backup and recovery; automatic memory management;
and authorization. Based on the results of the study, DB2 has a clear and
overwhelming advantage on all six routine DBA tasks that were evaluated.
And as an example, let’s talk about auto memory management which we talked
about earlier in the previous slide. So configuring auto memory management on DB2
is 90% lower in complexity than in the Oracle database. So the report actually
shows that automatic memory management tasks in the Oracle database for a
specified environment, could take 100 minutes of DBA interaction time to complete.
And in contrast, the same automatic memory management tasks for DB2 would take
just 10 minutes. So that’s huge saving of time for a DBA.
So DB2’s simplicity, relative to Oracle database, translates into less DBA time spent
on these routine DBA tasks … also less time in training your new staff … and lower
risk of errors that can impact the quality of service. So again, the autonomics and
the administrative features have definitely helped the DBAs a lot by making their
lives so much easier, by saving them time and letting them focus on more high value
tasks.
Slide 14
So the next technology which I want to talk about is the SQL compatibility
technology.
-4-
Slide 15
So the SQL compatibility technology in DB2 9.7 has caused a paradigm
shift in the world of database migration -- by allowing customers to migrate
to DB2 from Oracle or Sybase databases, in a matter of days or weeks, rather than
months. The DB2 migration technology is a game-changer -- not only because it
reduces the time and the cost of the actual migration, but also because it reduces
the training and the development cost. So since DB2 has native support for Oracle
PL/SQL syntax, applications built to run on Oracle databases require few or no
changes when run on DB2. What this also means is that DBAs can continue to use
their PL/SQL skills even after migration. And this chart actually also shows a pretty
interesting and telling quote from a notable Oracle expert, who has worked with
Oracle for over 15 years. He calls the compatibility between Oracle and DB2 “freaky” …
and at a level which he has not seen in the enterprise database world. So this is a
strong statement on the SQL compatibility technology.
Slide 16
Now as I mentioned earlier, DB2 supports PL/SQL natively. So with DB2 9.7, starting
all the way from PL/SQL applications … to the Oracle SQL dialect … to the concurrency
model … to data types … built-in functions … built-in packages … SQL*Plus scripts …
Oracle JDBC extensions … and online schema changes -- all are supported natively by
DB2.
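As a hedged illustration of what that native support means in practice (the table and procedure names are invented): you switch on the compatibility features with a registry variable before creating the database, and a PL/SQL routine written for Oracle can then be created on DB2 essentially unchanged:

```sql
-- Enable the Oracle compatibility features before creating the database
-- (run from the command line):
--   db2set DB2_COMPATIBILITY_VECTOR=ORA
--   db2stop
--   db2start

-- A typical PL/SQL procedure, as written for Oracle, running on DB2 9.7:
CREATE OR REPLACE PROCEDURE raise_salary(p_empno IN NUMBER,
                                         p_pct   IN NUMBER)
IS
BEGIN
    -- Ordinary PL/SQL statements work as-is
    UPDATE emp SET sal = sal * (1 + p_pct / 100)
    WHERE  empno = p_empno;
    COMMIT;
EXCEPTION
    WHEN OTHERS THEN
        ROLLBACK;
END;
/
```

This is a sketch of the mechanism rather than a migration recipe; the registry variable and the exact scope of PL/SQL support are described in the DB2 9.7 SQL compatibility documentation.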
This chart also has an interesting graph on the right-hand side. And this shows the
results of the DB2 early access program. Before we went to market with this
capability, we had a very strong beta testing period -- and it was a very long period
as well. It lasted for a year and we had several hundred companies which
participated in this beta program, ranging from different industries that had different
solutions, application sizes, from different parts of the world. With all these
companies, we worked very closely to do a deep analysis of the code. And you can
see the results of these analyses on the chart. What we really wanted to measure
here was, how much of their Oracle PL/SQL code worked on DB2 out-of-the-box …
and how much of it needed tweaking. So what we found was that the lowest amount
of code that was supported out-of-the-box was 90% -- and it varied up to 100%. So
what we did was, we took an average of the 750K lines of code that we analyzed and
we found that 98% of that was supported out-of-the-box. So that’s huge and we’ve
got some great endorsements from a number of customers who worked in the beta
program and have actually migrated to DB2. And one such quote is on the chart
where the customer calls the compatibility, “amazing”.
So at this point I’ll turn it over to Irshad to take you through the remaining five
powerful and exciting technologies in DB2 9.7. Irshad, over to you.
Thanks Sid. Hi everyone, this is Irshad Raihan, I work for the DB2 product
marketing group within IBM’s information management business. And as Sid pointed
out, I will be talking about the remaining five technologies. This is part of a
developerWorks® article that is soon to be published and this will cover the top 10
DB2 technologies. They will be in a slightly different order. The way we ordered the
technologies today was that we covered first five technologies that are getting a lot
of press coverage as well as coverage from our customers. We are hearing a lot of
great things around all the technologies that Sid just talked about from performance,
scalability, compression, autonomics and rounded out nicely by SQL compatibility,
which is causing quite a stir in the market. DBAs and other practitioners are looking
to this technology as a savior … because they have been trapped with their current
vendor with no way out … because migration, in the past, has been a huge cost
and risk, and it hasn’t always justified the move.
Whereas now with SQL compatibility, which is available for both Oracle and Sybase
customers, it has changed the game quite a bit.
So, the next five technologies that I will be talking about are, as I said, probably
not getting as much publicity, but they are equally important. I will start with
HADR, High Availability Disaster Recovery.
Slide 17
Let me show you a picture quickly, this is a picture of the greenest data center in
North America and it's also very highly available.
Slide 18
Now with high availability and disaster recovery, really there are two thoughts here.
There is the thought around making your data more available: making sure that
when there is a transaction coming from the middle tier of your three-tier architecture,
it is properly committed, and then some sort of confirmation goes back to
the middle tier that the transaction was properly committed. And,
you know, things go wrong -- there are natural disasters and there are other
things that could happen that can bring down your system. And therefore, you want
to make sure that there is transactional integrity across your three tiers. And at
other times, there are planned outages. So, this is a particularly important point for a lot
of DBAs because there are a lot of things that you can do in DB2 while the system is
still up that you cannot do with other databases.
So, one of those is real-time schema changes, things like stored procedures that
allow online movement of tables for instance … you can move tables online to a
different table space … you can freely change columns used in views and other
objects. You can even change certain data types within a column and that’s almost
unheard of. For those of you who have tried this before, it usually requires bringing
down the database, at least for a few minutes. But it's something that can be done on
the fly in DB2. For instance, if you were to change between compatible types such as
integer to varchar or character to decimal, those are allowed in DB2 while your
database is still running. There are also unplanned outages -- stuff happens and you
want to make sure that all your inflight transactional data is captured as well as
anything that needs to be rolled back onto your database, that was persistent,
happens correctly as well.
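One illustration of those online operations: the online table move mentioned above is done through the ADMIN_MOVE_TABLE stored procedure, which keeps the table readable and writable while it is copied. A sketch (the schema, table, and table space names are hypothetical):

```sql
-- Move APP.CUSTOMERS to new table spaces while it stays online;
-- the 'MOVE' operation runs the init/copy/replay/swap phases in one call
CALL SYSPROC.ADMIN_MOVE_TABLE(
    'APP', 'CUSTOMERS',   -- source schema and table
    'TS_DATA_NEW',        -- target data table space
    'TS_IDX_NEW',         -- target index table space
    'TS_LOB_NEW',         -- target long/LOB table space
    '', '', '', '',       -- optional MDC, partitioning, key, column definition
    '',                   -- options
    'MOVE'                -- perform all phases
);
```

The parameter list above follows the general shape of the 9.7 procedure; consult the ADMIN_MOVE_TABLE reference for the exact signature and options before using it.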
So, one of the features I want to talk about here is something known as autonomous
transactions and what that really is -- is think of a nested transaction, right … think
of a trigger on a table that is part of a transaction … that in turn calls another
transaction within it. And the nested transaction is an autonomous transaction.
Let's take an example of it. Say the inner transaction stores the authorization ID of the
person who is accessing that table into an audit field. So if your outer transaction
fails and is rolled back, the fact that the inner transaction is completed is not lost
because you want it to be able to record the fact that so-and-so’s authorization ID
viewed this table and this amount of data from this time to this time, right? You
don't want that fact to be lost. So things like autonomous transactions are
supported in DB2.
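A minimal sketch of that audit scenario (the table and procedure names are made up), using the AUTONOMOUS keyword DB2 9.7 supports on SQL procedures:

```sql
-- The audit procedure commits independently of the caller's transaction
CREATE OR REPLACE PROCEDURE log_access(IN p_user VARCHAR(128))
AUTONOMOUS
LANGUAGE SQL
BEGIN
    INSERT INTO audit_log (who, accessed_at)
        VALUES (p_user, CURRENT TIMESTAMP);
END;

-- Even if the caller rolls back, the audit row persists:
CALL log_access(SESSION_USER);
ROLLBACK;   -- the INSERT into audit_log is NOT undone
```

In Oracle compatibility mode, the equivalent PRAGMA AUTONOMOUS_TRANSACTION style is also recognized; this sketch shows the idea rather than the full syntax options.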
The other aspect that has been added relatively recently, and I think that this is
another game-changer, is read on standby. So typically, the way you would
configure your HADR environment is you would have a primary server that
essentially processes all the requests coming in and then you would have a standby
server that essentially just hangs around. It's not processing any work and it's just
waiting for the primary server to fail, right? And it takes over as soon as that
happens within a few seconds, including things like inflight transactions, which is
great. But in an environment where you are looking to cut costs, you are looking to
increase utilization of all your resources, and you are looking to do more with your
infrastructure, you don't really want servers to be hanging out there twiddling their
thumbs. Instead what you are able to do now with read on standby, which is a
relatively new feature in DB2, is that you are able to run read only workloads on the
standby servers. And what this means is that you can run things like business
intelligence workloads or reports, things that can very quickly be sent to the back
burner in case the secondary server needs to take over from the primary server if
there is some kind of an unplanned outage. You can quickly put those to the back
burner and make sure that your actual workload, your real-time workload, is being
processed by the secondary server, and then get back to it (read on standby) when
the primary is back up.
Slide 19
The other great thing about the way HADR works, I will talk to you about it on the
next slide here, is if you look at the way it is set up -- you have two HADR servers on
the top in the yellow shapes there, and then each of those have their own sets of
logs. And you can set up, in this case, that the one on the left is the primary server
and the one on the right is the secondary server. And if the primary goes down, as I
mentioned, the secondary takes over and whenever the primary gets back up, it
essentially becomes the secondary because what used to be the secondary has been
storing inflight transactions and it has all the updated logs. So instead of handing it
back over to the primary, it essentially takes over as the primary.
Another aspect of HADR that I want to mention here, that I think is a differentiator
compared with the way the competition does it, is synchronous versus near-synchronous
versus asynchronous mode. And essentially what this is, is three different
modes in which your transactions are committed to the database. And there is a
little bit of a trade-off here and I will talk about that as soon as I explain those.
So in the first case, let's talk about a commit transaction -- a commit request that is
sent in. In the asynchronous case, as soon as the primary
HADR server receives the request, it sends the information over to the secondary
server, and as soon as it sends that information over the TCP/IP connection, it sends
back a confirmation to the user that the commit has succeeded. In the near-synchronous
case, when you have a commit request come in, the primary server
-- just as in the first case -- sends the data that has to be stored into
the log to the secondary HADR server. And once it has been received in the memory
of the secondary HADR server, only then is a confirmation sent to the user. So as
you can see, there is a little bit more of a guarantee here, right? Because in the first
case, your TCP/IP connection … it may be a faulty connection and that sent/receive
handshake may not take place properly … and so your secondary server may need
another try to actually replicate the information that exists on the primary server.
So you have a little bit of a gap there. In the near-synchronous case, there is a little bit
more of a guarantee. And in the third case, which is the synchronous case, only
when the data is actually written to the log on the secondary server is a confirmation
sent back to the user. So, again, there is a little bit of a trade-off here. It will
depend on your cost characteristics as well as your SLAs, the way they are defined in
terms of high availability as well as performance. So something for you to think
about … but I wanted to mention that you have a lot of ways in which you can
configure HADR, which are superior to the competition. There is also the opportunity
to use a combination of HADR for high availability and pureScale
for scaling up your application, as Sid pointed out. I think it will give you a
competitive advantage. And I don't need to tell you about the importance of HADR …
I will be preaching to the choir … because you know better than I that downtime is
expensive, not just in the amount of business that is lost. You know, there are
multiple examples of Web sites that have been down for a few hours, right, but it is
not the business itself, which is of course significant, but more importantly, it's the
loss in brand equity and it's the erosion of faith that your customers have in your
systems. And that can be a much bigger factor, so this is something that is
important and that needs to be given a lot of time and thought.
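For reference, the mode just described is a single database configuration parameter; a rough sketch (the database name mydb is hypothetical) from the command line processor:

```sql
-- Choose the commit-guarantee / performance trade-off described above:
--   ASYNC     - confirm once the log data has been sent to the standby
--   NEARSYNC  - confirm once it is received in the standby's memory
--   SYNC      - confirm only once it is written to the standby's log on disk
UPDATE DB CFG FOR mydb USING HADR_SYNCMODE NEARSYNC;
```

The parameter name and allowed values should be verified against the HADR configuration reference for your DB2 release.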
Slide 20
Okay. So let's talk about the next technology here, which is data security and
privacy. Again this is a very important topic and there are lots of different aspects to
it. And more often than not, when finger pointing starts, everyone starts pointing at
the DBA and the database practitioners regarding data security. When actually,
there are hundreds of flaws in the system -- in just the way that technology is built,
you have multiple attack points especially with the number of mobile devices out
there today. There are hacks that happen at the Web site level all the time. We will
talk about something called SQL injection in a little bit and talk about how that's
causing a lot of headaches for IT administrators as well as database administrators.
Slide 21
So, when you think about security, there are many questions that can be answered
by DB2. There is a lot of instrumentation that has gone into the product that makes
it very easy for you to, not only secure the boundaries of your database real time,
but also be able to do almost a forensic analysis because there are fingerprints that
hackers leave that will enable you to be able to track them down. So starting with
who is accessing your data, DB2 has the ability to track all your connections and
authorizations as well as being able to track what was changed. So [this is] the
actual statement text that was changed down to the DDL that was actually processed
in addition to where the request came in from, so there are application IDs and, of
course, the TCP/IP address of the originating request, which many times can help you
track down
the perpetrator. Then there is also the ability to track when a certain event
happened, I gave you the example of an autonomous transaction that tracks down to
the point of who accessed what data, when … and that can give you a very good
idea of where the problem might be. You also have the option of reporting how a
certain database action was processed -- whether a certain person had the
necessary rights to process that transaction. Which brings us to the why …
With a label-based access control technology … it's been around a while but there are
all sorts of new enhancements that have gone into this release of DB2 and in the
next release as well … and essentially what it lets you do is … let a security
administrator create security policies that are based on labels. So it is very easy to
define the label, then define the roles and then just map those two -- instead of
having to grant individual users access. The other thing that is important here is
around compliance. A security policy, really what it does is, it describes the criteria
that we use to decide who has access to what data. One advantage of doing this is
that you are able to protect sensitive data and are able to separate duties. So your
DBAs can access your table schemas, they can look at the topology of your
database, partitions and all of that; but, they are not able to look at your data. And
there are a lot of compliance rules that dictate that, so DB2 is able to do that for you
automatically.
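A rough sketch of that label flow (the component, policy, label, table, and role names are all invented for illustration):

```sql
-- 1. A security administrator defines a component, a policy, and labels
CREATE SECURITY LABEL COMPONENT level
    ARRAY ['TOP_SECRET', 'SECRET', 'PUBLIC'];
CREATE SECURITY POLICY data_policy
    COMPONENTS level WITH DB2LBACRULES;
CREATE SECURITY LABEL data_policy.secret COMPONENT level 'SECRET';

-- 2. Protect a table: add a row-label column and attach the policy
ALTER TABLE hr.salaries
    ADD COLUMN row_label DB2SECURITYLABEL
    ADD SECURITY POLICY data_policy;

-- 3. Grant the label to a role instead of to individual users
GRANT SECURITY LABEL data_policy.secret TO ROLE analysts FOR ALL ACCESS;
```

This sketches the define-label, define-role, map-the-two pattern described above; the precise LBAC syntax and rule sets are spelled out in the DB2 security documentation.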
Slide 22
On the next chart, Chart 22, I want to talk a little bit about the actual tools that can help
you do that. So there is … I talked about LBAC, which is a feature inside of DB2, but
there is also a plethora of tools that IBM offers that can help you through the life
cycle of your information. And that's the key word there. It is really the life cycle.
We don’t think security is something that happens only after deployment. It really
starts all the way from development, [goes] through your deployment stage and it's
an ongoing process. So if you are looking for tools to define policies, as well as metrics
to understand how secure your data is, you might want to read up on InfoSphere®
Guardium® or Data Architect or Discovery. There are multiple offerings in each of
those areas. There are workshops. IBM is also running a series of information
governance events that you might be interested in signing up for. These are free
events and you can learn much more about how each of these tools fits in to the
bigger picture, how they can help you set up much more robust information
governance around your organization.
You also have tools that can protect data across the enterprise from unauthorized
use such as Data Redaction as well as Data Privacy and Encryption Expert. And finally
you have a tool that can assess vulnerabilities and validate compliance automatically.
This is important because a lot of companies are required to run periodic internal
audits in addition to external audits. There are industries, such as finance and
healthcare, that have all sorts of regulatory rules around the way that data needs to
be accessed … there are privacy issues and, you can only imagine, with the explosion
of data, these are only going to get more pronounced. Therefore if you have tools
that can help you do those audits and comply with those audits automatically, that
not only reduces the time and effort required to perform the audit, but it's actually
making your environment a lot safer … your database environment a lot safer.
Slide 23
Okay, so I want to move on to the next topic here, which is one of my favorites –
XML. IBM and DB2 have endorsed XML for a long time now -- even well before there
were industry standards.
Slide 24
Today you have XML, and every industry has its own set of guidelines around it … not
just guidelines … but also standards, such as ACORD for the insurance industry,
that dictate the way that XML needs to be processed within a database. And about
four or five years ago, DB2 introduced XML … well we had the ability to process XML,
but what we did four or five years ago was to announce pureXML®, which was
essentially a breakthrough technology because no one even today does it the way
that DB2 does it. And you see on the picture in the bottom left on slide 24, you have
your XML tree on the left and you have your relational database, RDB, tables on the
right. And those two are married together inside of DB2. In the picture on the right,
you will see there is a single query that uses information that accesses this table
inside of your relational storage and in the same query, you have information that's
being queried out of your XML tree. This is significant because as the picture here
shows, DB2 swallows XML whole. I think it was Information Week that carried this
article that made quite a stir -- not just because of the scary picture and, by the
way, the reason that there was a snake on the cover was DB2’s code names were all
snake-based and that particular release was Viper. So it's probably not biologically
correct, because vipers aren’t constrictors, but you get the point that DB2
swallows XML whole. DB2 has the ability to handle XML natively.
This is very significant for insurance companies, for government companies -- think
about the tax forms . . . the tax season and every year, there are small changes to
the tax code and so tax forms change constantly. When you have to represent
that information as relational columns instead of as XML, it's a
huge pain. And it's an even bigger pain if you have an industry standard that
dictates that you have to supply certain forms as XML, because there are mandates
around online availability. There are all sorts of government forms, for
instance that have to be available online. And the easiest way to do that is to store
them as XML.
Now if you don’t have a capability to store XML as XML on your database, what you
have to do is something known as shredding. Essentially what happens with
shredding is an XML tree -- the one that you see in the figure on the bottom
left -- inside of that figure, you see the XML DOM tree and each node of that tree is
taken and shredded into multiple relational tables. So it's represented through
multiple relational tables because the one thing that relational tables lack is the
ability to show relationships … even though, they do show relationships in a different
way, but not a relationship between the nodes. It's not easy to show hierarchy –
those are the kinds of relationships I was referring to. It's not easy to show that so-and-so nodes belong to so-and-so parents and have so-and-so children nodes, etc.
And to be able to capture all of that information in relational tables takes a lot of
time and effort. Therefore, companies that were and still are storing XML as
relational tables have to incur all this overhead every time they take that XML and
parse it into relational tables. And [when] they make a change … essentially the
opposite has to be done. Translation has to be done and each of these tables, that
the XML DOM tree has, has to be reconstructed on the fly and there is a lot of
overhead going on there. And finally, as the data in those relational tables is stored
as large objects or LOBs or BLOBs, that again just increases your storage. The way
that DB2 does XML is we store it as XML. That’s number one -- it's not being stored
as a LOB, it's being stored as XML and it's very easy to insert as well as to make any
changes because it's just XML -- it's just so much easier, so much faster -- there is
no overhead there. The other thing I want to say here [about] the DB2 approach to
XML is the use of the Simple API for XML (SAX) and it is a parse once technology.
This is again very important like I said, you parse the XML once and you store it in
the DOM-like tree structure and now you have the performance and flexibility to get
to the data and modify quickly. If you are a DBA and you have tried to add a column
to your database -- sounds simple enough -- but the amount of work that goes into
adding a column … running all the unit tests … running all the system tests … making
sure that you haven't broken anything else in the system … and then finally making
the changes to your applications takes a lot of work. With XML, it is as simple as
adding a node to the tree. So there are tremendous advantages for you to store
your data as XML -- and if your industry requires data to be stored as XML, there is even more advantage.
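The contrast described above -- shredding a document into relational rows versus keeping the tree and just adding a node -- can be sketched as follows. The document, tag names, and the `shred` helper are invented for illustration; a real shredder maps nodes into purpose-built tables rather than one generic node table.

```python
# Sketch: shredding an XML tree into rows vs. editing the tree natively.
# Hypothetical document; not DB2's actual storage format.
import xml.etree.ElementTree as ET

doc = "<taxform><name>J. Doe</name><income>50000</income></taxform>"

# Shredding: every node becomes a (node_id, parent_id, tag, text) row.
# Any schema change means re-shredding, and reads must rebuild the tree.
def shred(xml_text):
    rows, counter = [], [0]
    def walk(node, parent_id):
        counter[0] += 1
        node_id = counter[0]
        rows.append((node_id, parent_id, node.tag, (node.text or "").strip()))
        for child in node:
            walk(child, node_id)
    walk(ET.fromstring(xml_text), None)
    return rows

print(len(shred(doc)))  # 3 rows just for this 3-node tree

# Native storage: the document is parsed once, and a new tax-code field is
# simply a new node -- no ALTER TABLE, no re-shredding, no reconstruction.
tree = ET.fromstring(doc)
ET.SubElement(tree, "deduction").text = "1200"
print(tree.find("deduction").text)  # 1200
```

Even in this toy form, the asymmetry is visible: the shredded representation multiplies rows and loses the hierarchy into parent-id bookkeeping, while the native tree absorbs a schema change in one line.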
Slide 25
One other thing I want to talk around XML and this is a figure that I stole from a
flash book that is out there. This is a book on DB2’s nine features written by a
bunch of authors from our Toronto lab and this particular figure talks about how XML
works with scalability. Some of you might wonder, “well this is great news around
XML but my data doesn’t exist in a single partition -- my data is spread across
multiple database partitions” -- which is the right thing to do if you want scalability
and you want to be able to use features such as DPF from DB2.
Now what is interesting is that these scalability features fully support pureXML. So
after you hash partition your data, each row of a given table is placed in a specific
database partition based on the hash value of the table’s distribution key. On reading or
writing, DB2 is automatically able to direct work to the relevant partition. So there is
no work on your part, as a DBA or a developer, to make that happen. It's happening
automatically for you. Now why is it important? It's important because even if your
data -- your XML data -- resides in multiple partitions, you can actually benefit from
parallelizing some of those operations that require different parts of data. From the
example here, you have sales of four quarters in four different partitions in your
database and that’s in your create table clause -- that's the way it was specified.
So if you want to run an annual report for instance, you are able to parallelize
operations in each of these quarters and run the report. Obviously there are huge
performance gains to be had. The other thing that I want to mention about this
statement here, is the partition by range clause that some of you might have picked
up on. Now this is important because the great thing about the XML implementation
within DB2 is that you manage range-partitioned tables with XML columns in exactly
the same way that you would manage range-partitioned relational tables. So if you want
to create tables that house your XML data, you would use your partition by range
clause just as you would do with the relational data. Another aspect of this is
multidimensional clustering tables. Those of you who are used to working with star
schema especially with SAP environments, this is very important. If you want to use
XML columns in multidimensional clustering, you use the organize by dimensions
clause of the create table statement and everything else stays the same.
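The hash routing and parallel per-partition work described above can be sketched in a few lines. This is a toy model (invented data and partition count, threads standing in for database partitions), not DB2's DPF internals.

```python
# Sketch: route rows to partitions by hashing a distribution key, then run
# an "annual report" by aggregating each partition in parallel.
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 4
partitions = [[] for _ in range(NUM_PARTITIONS)]

def route(row):
    """Place a row in a partition based on the hash of its key (quarter)."""
    quarter, amount = row
    partitions[hash(quarter) % NUM_PARTITIONS].append(amount)

for row in [("Q1", 100), ("Q2", 200), ("Q3", 300), ("Q4", 400), ("Q1", 50)]:
    route(row)  # the engine does this automatically on insert

# Each partition is scanned independently; the partial sums are combined.
with ThreadPoolExecutor(max_workers=NUM_PARTITIONS) as pool:
    annual_total = sum(pool.map(sum, partitions))
print(annual_total)  # 1050
```

The point mirrors the slide: neither the insert path nor the report needs to know which partition holds which rows -- the hash function decides, and the aggregation parallelizes naturally across partitions.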
Slide 26
Okay, so there are two more aspects that I want to talk about here. The first is
packaging flexibility. We believe that the more options we are able to offer our
customers, the better fit that they will be able to find for their organization -- not
just in terms of the addition of the database but all the way up through the stack.
You might have heard of the workload optimized systems paradigm -- and the
point there is that we are looking to offer much flexibility to customers in the way
that they build and grow these workload optimized systems in a modular fashion.
And I want to talk to some aspects of that flexibility as it relates to DB2.
Slide 27
I want to start with sub capacity licensing. This is a very important aspect of the
way that DB2 is licensed and sold and bought. The point here is that virtualization is
everywhere. It makes a lot of sense … it increases utilization … and there is a lot of
workload that can be managed nicely using virtualization. There are multiple ways
you can achieve server virtualization. And one of the ways that you will install and
deploy your databases in a virtualized environment is that -- think of a multi-core
box that you install DB2 on -- if it is a six processor box and you install DB2 on only
three of those processors, you will be charged for only the three cores that you are using
DB2 on. Just because the box has six cores does not mean you are going to be
paying for six cores and that's very different from the way other vendors do it. They
typically charge for the box and don’t care how many cores you are running their
products on. This is significant. A lot of customers have come back and told us that
this has helped them increase their utilization and also lower license costs which they
could then use for other purchases that would be more revenue oriented.
The other point around flexibility is capacity on demand. Sid talked about pureScale
and the way pureScale can be configured and purchased is you would just need to
purchase additional license files and you can then add capacity to your cluster. So
think of cyclical businesses … tax season is coming up … think of businesses that
grow exponentially for six or eight weeks of the year and then they go back to a
steady state which is significantly lower … think of businesses in retail that have
huge spikes around Thanksgiving and Christmas and again, go back to somewhat of
a flatter demand curve for the remainder of the year. If you are looking to increase
capacity for a few weeks or even down to days, pureScale can be licensed on a per-day
basis and therefore, you pay for the capacity that you use instead of over-provisioning for your peak capacity and then having all that infrastructure lie idle.
We offer you the flexibility of paying as you go. The other point I want to make
about flexibility is pureXML. I mentioned that any of you who have tried adding a
column to a table [knows] it's a lot of work. With XML, you are able to, not only add
columns, but you are able to change data on the fly. You are able to do a lot of
manipulations to your data that are not possible in the way that other vendors store
XML which is essentially as large objects. And finally I want to talk about the DB2
Advanced Enterprise Server Edition
Slide 28
So there is flexibility in terms of the editions of DB2. We start all the way from DB2
Express-C, which is a free version available to developers. A lot of students
use it … a lot of professors use it to teach DB2 courses in the classroom … there are
small companies even that will run DB2 Express-C. There are certain memory and
processor limits, but there are a lot of small companies -- little offices that don’t really
need all that much and therefore they are able to use the free version of DB2. But in
addition to that, you also have DB2 Workgroup and DB2 Express, which are typically
oriented towards departmental applications. And on top of that, you have DB2
Enterprise Edition which essentially has all the high availability, the compression and
pureXML -- all the different offerings under the DB2 umbrella. Now recently we
announced another edition on top of the Enterprise Edition. [It is DB2 Advanced
Enterprise Server Edition.] And this is great value because really if you look at the
way it is priced, it is only about 10% more than the Enterprise Edition, but it packs a
wallop. It's got storage optimization, which is your deep compression … it has got
advanced access control, LBAC, I talked about earlier … it has got the workload
management features … it has got tools from Optim® … there are performance
development tools and administration tools. And these are tools not just to manage
DB2, but if you have Informix or Sybase or Oracle or Microsoft, it will help you
manage heterogeneous environments as well. There is the replication feature as
well as Federation Server. So there is a lot of value that has gone into this edition.
There are no memory usage or processor core limits – [it is] basically limitless for
the amount of memory that you want to add to those machines. Some of the other
editions have certain limits, but the Advanced Enterprise Server Edition has neither of those and we are
seeing customers come back and tell us that this is actually great value -- not just
because of the amount [of features] that has gone in there but also if you will just
look at pure pricing, this is less expensive than a bare bones Oracle license and that
has significant value. So I just wanted to mention DB2 Advanced Enterprise Server
Edition as well.
Slide 29
All right, so I wanted to wrap up today's technologies discussion with the last one …
number 10 … again, last but not least, very important -- is tools.
Slide 30
What are all the different tools that IBM offers around productivity, performance,
security, problem determination and that fuzzy thing called the Cloud? So I will
start with the Cloud. DB2 is available on the Cloud. We have been out there on the
Amazon Cloud for many years now, even before the Amazon Cloud was really
famous. And also DB2 has its own Cloud services platform through the IBM Cloud
platform. What's interesting about the DB2 offering on the Cloud is that it is one of
the most sought-after and used offerings on Cloud. Databases, by the way, by far
are the most used software on the Cloud. And out of those, DB2 has seen some
tremendous success on the Cloud. So I just wanted to mention that here as well, and
there are tools that will help you deploy and deliver out to the Cloud. And the great
thing about the way DB2 is licensed and developed is, whether it's DB2 Express-C or
Workgroup or Enterprise or DB2 on the Cloud, it is all the same core engine, which is
great. We think of it as a Russian doll model where you have a complete replica of
the doll inside of another doll and this is great because what it means is when you
move up editions or when you move from a standard deployment to a cloud
deployment, you don’t have to change your applications. It's all the same code that
your applications are running against and therefore it's just a matter of changing
around the licenses -- so that's an additional benefit as well.
So to talk about the multitude of tools, I want to take an example of something
called SQL injection. Now some of you might know the term and essentially what it
is, is a type of hacker attack and the way it works is, there are certain data-rich
applications that save user inputs in the database and they are not able to tell
whether the input is SQL or it's an actual valid user input. And there is dynamic SQL
that is generated on the basis of that input. So essentially what the hackers do is,
instead of putting in your typical user-defined inputs, what they put in are pieces of
SQL code that enter your system and generate dynamic queries. And that can be a
huge problem, and it is really a problem in the SQL standard itself, which doesn’t
make a distinction between the control and data planes. And this is a huge issue. And DB2
has tools that can help you address that. So SQL injection affects confidentiality
because databases generally hold sensitive data and loss of confidentiality is a
frequent problem.
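The mechanism just described -- user input becoming part of the SQL text itself -- is easy to demonstrate, along with the standard fix. The login table and data here are hypothetical, and SQLite stands in for any SQL database.

```python
# Sketch of SQL injection: dynamic SQL built by string concatenation vs. a
# parameterized query. Hypothetical users table; SQLite for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, pw TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

attack = "' OR '1'='1"  # classic injection payload supplied as a "password"

# Vulnerable: the input is spliced into the SQL text, so data crosses into
# the control plane and the WHERE clause becomes always-true.
q = "SELECT count(*) FROM users WHERE name = '%s' AND pw = '%s'" % ("x", attack)
vulnerable = conn.execute(q).fetchone()[0]
print(vulnerable)  # 1 -- "authenticated" without knowing any password

# Safe: a parameterized statement keeps the input strictly as data.
q = "SELECT count(*) FROM users WHERE name = ? AND pw = ?"
safe = conn.execute(q, ("x", attack)).fetchone()[0]
print(safe)  # 0 -- the payload is inert
```

This is also why granting execute privileges on prepared query packages, rather than direct table access, narrows the attack surface: the statements that can run are fixed in advance, and user input can only ever fill their parameters.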
Authentication is another issue, for SQL commands are used to check user names
and passwords. It may be possible to connect to a system as another user with no
previous knowledge of the password. This is a huge authorization issue. If
authorization information is held in a SQL database, it may be possible to change this
information through successful exploitation of a SQL injection vulnerability and, of
course, all of this leads to integrity issues.
So there are huge issues around SQL injection. There have been quite a few
cases of it every year. The tools around DB2 can help you limit user access and
reduce SQL injection because you are able to grant execute privileges on query
packages versus access privileges on tables. That's a key difference there. Another
differentiator is the acceleration to problem resolution. You are able to trace back
SQL execution to a specific package and be able to pinpoint the originating source.
You are also able to visualize application SQL and its correlated metadata, increase
system capacity, and drive down database cycles.
Slide 31
So there are tools . . . I have listed some of these tools on slide 31, around
application development, performance management, and availability and database
management. So again, many of these tools run in heterogeneous environments.
You are able to manage more than one database using a single tool, but you are
also able to not just look for vulnerabilities, but actually solve a lot of those
problems.
Slide 32
Okay, so that concludes the ten technologies and I hope you have enjoyed it. And I
want to conclude with upcoming topics in this series. This was the first of the series
Webcast for Oracle Professionals and this [series] will cover many other topics as we
get through the year. The next one scheduled is the Advanced Enterprise Server
Edition that I just talked about -- SQL compatibility that Sid talked about, data
storage, pureXML, and DB2 for SAP -- that's surely going to be an interesting one.
And there is a bunch of other topics that are scheduled. If there is a topic that you
would like to request, there is an e-mail address on the next slide that you can write
to, and we would love to hear from you. On the bottom right of chart 33, there is
also some more information about the workshops that I was telling you about. This
is a free workshop … it's two days … very hands-on, you would get your hands dirty
with the code … you will get to see demos … you will get to run your own code and
really see for yourself the compatibility between PL/SQL and DB2’s dialect of
SQL. And there is also the certification opportunity, so for those of you who might
be looking for career advancement or looking for new opportunities, definitely
certification is the way to get started.
Slide 34
And with that, I’d like to wrap it up. If you have questions or feedback about today’s
session, we would love to hear from you. And Cindy Russell is really the person
behind this, and I want to thank her for setting this up and she runs the DB2
practitioner program. She would be delighted to hear from you whether it is
feedback on today's session, or topics that you would like to hear, or if you have
questions for follow up on things we talked about today, please feel free to email her
at that address. With that, I would like to thank you all.
© Copyright IBM Corporation 2011.
IBM Software Group
Route 100
Somers NY 10589
U.S.A.
Produced in the United States of America
May, 2011
All Rights Reserved.
IBM, the IBM logo, ibm.com, and DB2 are trademarks or registered trademarks of International Business Machines
Corporation in the United States, other countries or both. If these or other IBM trademarked terms are marked with a
trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time
this information was published. Such trademarks may also be registered or common law trademarks in other countries. A
current list of IBM trademarks is available on the web at “Copyright and trademark information” at
ibm.com/legal/copytrade.shtml
Oracle is a registered trademark of Oracle Corporation in the United States, other countries or both.
Linux is a registered trademark of Linus Torvalds in the United States, other countries or both.
UNIX is a registered trademark of The Open Group in the United States, other countries or both.
Windows is a trademark of Microsoft Corporation in the United States, other countries or both.
Other company, product or service names may be trademarks or service marks of others.