www.pwc.com/technologyforecast
Technology Forecast: Remapping the database landscape
Issue 1, 2015
Filling in the gaps in
NoSQL document stores
and data lakes
Matthias Brantner describes the role database
virtualization and a business-user query interface
can play in heterogeneous environments.
Interview conducted by Alan Morrison and Bo Parker
Matthias Brantner
Matthias Brantner is the former
CTO of 28msec and is a
consulting member of technical
staff at Oracle. This interview was
conducted in 2014.
PwC: How are companies using
NoSQL or non-relational databases?
MB: Developers use NoSQL databases because
those databases are relatively easy to set up
and they’re mostly free to get started. NoSQL
databases are a no-brainer for developers
to get something up and running quickly.
MongoDB is an example: it’s very easy to
install, and developers jumped on it.
But it’s not really clear to me how these
databases will be used in the mainstream
enterprise and what they will be used for.
You can certainly analyze long streams
of data from your websites or all the
clickstreams. But, as things currently stand,
eventually you need a developer to help
you deal with the flexibility that those
databases give you. You can essentially put
everything in there, but to figure out what
is in the database, you need a developer.
The tools haven’t caught up. That’s the
next thing that needs to happen to drive
adoption of those systems and to make
the data in data lakes or enterprise data
hubs accessible by business users.
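As an illustration of why "figuring out what is in the database" currently takes a developer, the following is a minimal sketch of the kind of profiling script someone would have to write for a schemaless store. It is not taken from the interview: the `profile_fields` helper and the sample clickstream events are hypothetical, and plain Python dictionaries stand in for stored documents.

```python
from collections import defaultdict


def profile_fields(documents):
    """Summarize which top-level fields occur and which value types they hold.

    Hypothetical helper: a schemaless store does not record this anywhere,
    so it has to be derived by scanning the documents themselves.
    """
    profile = defaultdict(lambda: {"count": 0, "types": set()})
    for doc in documents:
        for field, value in doc.items():
            entry = profile[field]
            entry["count"] += 1
            entry["types"].add(type(value).__name__)
    return dict(profile)


# Made-up clickstream events with inconsistent fields and types.
events = [
    {"user": "u1", "page": "/home", "ms": 120},
    {"user": "u2", "page": "/pricing", "referrer": "search"},
    {"user": 3, "page": "/home", "ms": "95"},
]

summary = profile_fields(events)
for field in sorted(summary):
    info = summary[field]
    print(field, info["count"], sorted(info["types"]))
```

Even this small sample shows fields that are missing from some records and fields that hold more than one type, which is the flexibility a business user cannot see without tooling.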
PwC: NoSQL query languages
are still scarce.
MB: That’s right. Those NoSQL databases
started without really having query languages
on top of them. Some of them are just key-value
stores, and you must write a program to get the
data out. MongoDB, for example, has a slightly
more sophisticated query language, but it’s still
very developer focused. Now many vendors are
bringing SQL back onto NoSQL or Hadoop. But
generally the semantics are different because
NoSQL and Hadoop do not use the SQL data
model. So you really cannot use SQL semantics.
Everyone who’s trying to serve the market is
currently cooking in their own kitchen. Vendors
need to focus not only on developer ease of use
but also on the ability of business users to look
at the data. The technology makes sense, but
just collecting data is expensive and doesn’t
make sense. You need to know why you are
collecting the data and what you’ll do with it.
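The point above about SQL semantics not carrying over to a non-SQL data model can be made concrete with a small sketch. It assumes plain Python dictionaries as documents and a hypothetical `select_where_greater` function; it does not reproduce the behavior of any particular SQL-on-NoSQL or SQL-on-Hadoop product. In a relational table, every row has the column and the column has one type, so `WHERE total > 100` is well defined. With documents, the query layer has to decide what to do when the field is absent or holds a different type.

```python
def select_where_greater(documents, field, threshold):
    """Filter documents on field > threshold, reporting what could not be compared.

    Hypothetical semantics chosen for this sketch: documents whose field is
    missing or non-numeric are neither matched nor silently dropped; they are
    returned separately so the caller can see them.
    """
    matches, skipped = [], []
    for doc in documents:
        value = doc.get(field)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            skipped.append(doc)
        elif value > threshold:
            matches.append(doc)
    return matches, skipped


# Made-up order documents: one lacks the field, one stores it as text.
orders = [
    {"id": 1, "total": 250},
    {"id": 2},
    {"id": 3, "total": "300"},
    {"id": 4, "total": 80},
]

matches, skipped = select_where_greater(orders, "total", 100)
print([d["id"] for d in matches])
print([d["id"] for d in skipped])
```

Returning the skipped documents is one possible design choice; a different layer might coerce the string or treat the missing field as null, and each choice gives different answers to what looks like the same SQL query.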
PwC: One organization we’ve
heard from seems to have a solid BI
[business intelligence] group and
said they’re able to integrate their
data from NoSQL and relational
sources fairly quickly, more quickly
than they can via APIs [application
programming interfaces] and
application-style integration.
MB: Absolutely. Just because you have an
API doesn’t mean you don’t need to maintain
the database anymore. If you start having a
lot of data silos, each with its own API, you’re
gathering a lot of technical debt with lots of
code for APIs on different data stores. Each
data store does a very specific or very small
thing, but only one or two developers might
understand it and they need to continue
maintaining it. And if new demands for
the data come in, you’re gathering more
and more code that you must maintain.
Companies can have many different data
repositories but not really understand
how they contribute to the big picture.
PwC: Many developers nowadays keep
their logic in the app itself. Is 28msec
putting the logic into the database?
MB: Yes. In their virtual database approach,
the database and the application server are
one thing, and you can write your entire
application in the declarative JSONiq
language, which allows you to read your
own writes and make calls to the outside
world. The only thing you expose is an API.
You essentially have one thing, which is
the database/application server, and it’s
using one language that directs both the
application logic and the data management.
PwC: Presumably you could do that
with a relational database also.
MB: Yes, you could do that as well. I think the
problem with the relational database is that
you must migrate your schema completely if
you make modifications, which is very hard:
You mostly need IT, the person developing
the application cannot migrate the schema,
and the modifications might have great
performance impacts on other aspects of the
system. So the use of a relational database
should be considered very carefully.
PwC: How can executives reduce
reliance on IT when it comes
to databases generally?
MB: Let me give you an example of what we
experienced in the disclosure management
fields with XBRL. In this case, business users
are trying to get relational database systems
to explore the information coming from the
XBRL filings. The problem is that a lot of
dimensional metadata is in XBRL filings. And
in relational databases, those dimensions
are part of the database schema. So your
database schema encapsulates the dimension.
Dimensions often change, and only the
business user understands the semantics
of the dimensions and can add, modify,
or remove them. The business user must
talk to IT for every change. That certainly
doesn’t make sense, and I think this
barrier is the one we need to break.
A business user should be in control of the
metadata, of the schema, because a business
user is the only person who understands the
domain. Enterprises don’t want business
users to give a task to a developer and say,
“Look, that’s what I want.” A developer comes
back with a result, and a business user says,
“That’s not exactly what I expected and
now that I see the results, you might want
to do it differently.” The communication
between the business user and developers
or DevOps is inefficient, and the problem is
how the business user fits into the picture.
The business user should be able to describe
what is nowadays called the taxonomy.
That taxonomy describes the metadata
and should be reflected in the database.
Business users should be in control of it.
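One way to picture the contrast with dimensions baked into a relational schema is to treat the taxonomy as ordinary data that can be edited without a schema migration. The sketch below is only an illustration under that assumption; it is not 28msec's implementation, and the `taxonomy` structure, the `add_dimension`, `validate`, and `slice_facts` helpers, and the sample facts are all hypothetical.

```python
# The taxonomy is plain data: dimension name -> allowed members.
taxonomy = {
    "period": {"2013", "2014"},
    "segment": {"retail", "wholesale"},
}


def add_dimension(taxonomy, name, members):
    """Add or replace a dimension; no table or column has to change."""
    taxonomy[name] = set(members)


def validate(fact, taxonomy):
    """Return a list of problems found in a fact's dimensions."""
    problems = []
    for dim, member in fact["dimensions"].items():
        if dim not in taxonomy:
            problems.append(f"unknown dimension: {dim}")
        elif member not in taxonomy[dim]:
            problems.append(f"unknown member {member!r} for dimension {dim}")
    return problems


def slice_facts(facts, **wanted):
    """Select the facts whose dimensions match every requested member."""
    return [
        f for f in facts
        if all(f["dimensions"].get(d) == m for d, m in wanted.items())
    ]


# Made-up facts; each one carries its own dimensions as data.
facts = [
    {"concept": "Revenue", "value": 100,
     "dimensions": {"period": "2014", "segment": "retail"}},
    {"concept": "Revenue", "value": 40,
     "dimensions": {"period": "2014", "segment": "wholesale"}},
    {"concept": "Revenue", "value": 25,
     "dimensions": {"period": "2014", "region": "EMEA"}},
]

before = validate(facts[2], taxonomy)   # "region" is not in the taxonomy yet
add_dimension(taxonomy, "region", ["EMEA", "APAC"])
after = validate(facts[2], taxonomy)    # now accepted, with no migration
retail = slice_facts(facts, segment="retail")
print(before, after, [f["value"] for f in retail])
```

In this arrangement the change that would otherwise require an IT-run migration becomes a single edit to the taxonomy, which is the kind of control the interview argues business users should have.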
We realized that the problem is the discrepancy
between the business user and the developers
and IT. And so we are looking into that
discrepancy. The query language will not help
that. In the end, vendors must build tools that
the business user can use, and the technology
is only an enabling technology that helps you
to support that usage. This approach resonates
with business users, because they don’t have to
go through IT to make changes to the schema.
With Hadoop or MongoDB, you don’t know
what’s in the database. You might have an idea,
but you don’t really know. With MongoDB,
someone can do analysis on one collection,
but not on two. Then a developer needs
to treat the data as a join between the two
collections. And the business user is already
out of the picture. Same for Platfora. If your
schema changes and you have different data
formats in your Hadoop ecosystem, you again
need a developer who bridges the gap.
PwC: What’s another approach, then?
MB: The data lake as the common
denominator makes sense because it can
actually maintain consistency. The data
is in one place. Then you have different
microservices on top and the tools to access
the data.
With that approach, you maintain as much
consistency as you need, and business
users define what they need. And so
microservices in that context make sense.
To have a deeper conversation about remapping the database
landscape, please contact:
Gerard Verweij
Principal and US Technology
Consulting Leader
+1 (617) 530 7015
[email protected]
Chris Curran
Chief Technologist
+1 (214) 754 5055
[email protected]
Oliver Halter
Principal, Data and Analytics Practice
+1 (312) 298 6886
[email protected]
Bo Parker
Managing Director
Center for Technology and Innovation
+1 (408) 817 5733
[email protected]
About PwC’s Technology Forecast
Published by PwC’s Center for Technology
and Innovation (CTI), the Technology
Forecast explores emerging technologies
and trends to help business and technology
executives develop strategies to capitalize on
technology opportunities.
Recent issues of the Technology Forecast have
explored a number of emerging technologies
and topics that have ultimately become
many of today’s leading technology and
business issues. To learn more about the
Technology Forecast, visit
www.pwc.com/technologyforecast.
© 2015 PricewaterhouseCoopers LLP, a Delaware limited liability partnership. All rights reserved. PwC refers to the US member firm, and may sometimes refer to the PwC network. Each member firm
is a separate legal entity. Please see www.pwc.com/structure for further details. This content is for general information purposes only, and should not be used as a substitute for consultation with
professional advisors. MW-15-1351 LL