www.pwc.com/technologyforecast
Technology Forecast: Remapping the database landscape
Issue 1, 2015
How enterprise graph
databases are maturing
Martin Van Ryswyk and Marko Rodriguez of DataStax
explore the challenges and benefits of big data analytics
with graphs.
Interview conducted by Alan Morrison and Bo Parker
Martin Van Ryswyk
Martin Van Ryswyk is an executive
vice president at DataStax.
Marko Rodriguez
Marko Rodriguez is chief of
engineering and a co-founder
of Aurelius, acquired in February
2015 by DataStax.
PwC: Do customers still think of graph
databases as a niche technology,
or are attitudes changing?
MVR: The reason we acquired Aurelius was
very customer driven. We had more than 30
customers telling us they had graph use cases
they needed to scale. They wanted us to do
something with the Titan graph database
that Aurelius created: either support it
commercially or come up with our own version.
We took a long look and were really surprised at
how mainstream graph databases had become.
The Aurelius team was seeing use cases
in fraud detection and recommendation
engines, evidence that our big enterprise
customers had already identified graph as
the right modeling framework to solve their
problems. These enterprise customers were
really just looking to us to make sure we could
get them an enterprise-grade solution.
PwC: Are the use cases entirely
different from other NoSQL database
options, such as Cassandra?
MVR: They’re somewhat adjacent. Titan has
the ability to use Cassandra underneath it as one
way to persist the data. Our customers wanted
to have both their wide-column database
model and a graph model all in the same
store. That was a common theme we heard.
PwC: Graph theory is quite
old. What has been inhibiting
adoption of graph technologies?
MR: It took a long time for people to realize
that many of the data problems they were trying
to solve were graph problems. So although
the theory is relatively old, enterprises just
didn’t have the terminology to understand
what they were getting themselves into or
what their problem was. The graph is actually
a nice way to represent enterprise data and
metadata and to solve enduring data problems.
In addition to the conceptual challenge, graph
technologies lacked a certain level of enterprise
readiness. Take Titan, for example. Aurelius
didn’t have enough resources for enterprise
support and enterprise testing, and that really
hindered adoption for very large customers.
What’s nice about DataStax is that now we’re
able to deliver the outreach that helps overcome
the conceptual challenge while also providing
the support our enterprise customers require.
For very large customers with terabytes upon
terabytes of data, there is no graph database
that supports their needs right now.
PwC: What’s unique about the
graph approach from an enterprise
perspective?
MVR: One of the constraints and benefits of
a graph is that it already has the precomputed
join. In the SQL world you have tables and
columns, and you can arbitrarily join tables
based on various columns. In a graph database,
we would say that this person knows that
person, or this person is related to that person.
It scales nicely in that sense, because every
relationship already acts as a join. That’s why
we can get better scaling with a graph database.
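The "precomputed join" idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names (`knows_rows`, `friends_via_join`, `friends_via_adjacency`) and the sample people are hypothetical, and the code does not reflect how Titan or DSE Graph store data internally. It contrasts looking up relationships by filtering a table of rows with following adjacency that is already stored on each vertex.

```python
from collections import defaultdict

# Relational style: relationships live in a table of (person, friend) rows.
# Finding someone's friends means filtering (or index-probing) that table.
knows_rows = [("martin", "marko"), ("martin", "alan"), ("marko", "bo")]


def friends_via_join(person):
    """Scan the relationship table for rows whose source is `person`."""
    return [dst for src, dst in knows_rows if src == person]


# Graph style: each vertex keeps direct references to its neighbors,
# so the "join" is already materialized as adjacency.
adjacency = defaultdict(list)
for src, dst in knows_rows:
    adjacency[src].append(dst)


def friends_via_adjacency(person):
    """Follow the stored adjacency; no table scan is needed."""
    return adjacency.get(person, [])


print(friends_via_join("martin"))
print(friends_via_adjacency("martin"))
```

Both functions return the same neighbors; the difference is that the adjacency lookup cost depends on the number of relationships the vertex has rather than on the size of the whole relationship table.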
PwC: How can established enterprises
benefit from those advantages?
MVR: We’ve seen a number of good use cases
across different sectors. For example, with
improved relationship analytics, utilities can
predict better when they will have peaks in
usage or equipment failures. Large retailers
can do better targeting for club cards and
coupon recommendations. Banks can detect
more instances of fraud or insider trading.
PwC: When we think about these
kinds of use cases in relational
technology, we typically look
for querying and reporting
capability. Is that how to think
about graph databases?
MVR: There will be analysts who will
run queries or who want some really
nice graphics and visualizations out of
graphs. For the most part, that’s not our
target market. That’s the OLAP [online
analytical processing] side of things.
Our functionality is meant to be accessed
programmatically as part of an application
in an OLTP [online transaction processing]
context. If I am checking out at a grocery
store, the store would want data about me plus
what I just put in the cart. They would need to
run the data through the graph database, so
they could figure out all sorts of information
in near real time about Martin. Let’s say he’s
a football fan, he’s in California, and there’s
a game this week. Maybe we’ll offer him a
beer coupon. I’m making all of that up. But
they’re taking a lot of pieces of data and trying
to make very fast analytic decisions, and
that’s the big thing with DataStax Enterprise
(DSE) Graph. It’s the real-time component.
MR: In the OLTP space Martin is talking about,
when you perform a graph analysis, you’re
just doing a particular traversal for a
real-time query. You’re touching only a subset of
the full data set. You’re starting at the Martin
vertex and you’re walking around. You’re
trying to solve a problem. And the less data
you touch, the faster the traversal will be.
But in an OLAP query, you’re typically touching
the whole graph or large subgraphs. There are
multiple threads touching many things and, as
a result, touching the disk heavily. [Retrieving
more data from disk means more latency.] DSE
Graph has both OLTP and OLAP capabilities.
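Rodriguez’s distinction between an OLTP traversal that starts at one vertex and an OLAP pass over the whole graph can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the toy `graph`, the hop limit, and the function names are hypothetical, and the code is not the DSE Graph API.

```python
from collections import deque

# Hypothetical toy graph: vertex -> outgoing neighbors.
graph = {
    "martin": ["football", "california"],
    "football": ["game_this_week"],
    "california": [],
    "game_this_week": ["beer_coupon"],
    "beer_coupon": [],
    "other_shopper": ["gardening"],
    "gardening": [],
}


def local_traversal(start, max_hops):
    """OLTP-style: breadth-first walk limited to max_hops from start."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        vertex, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph[vertex]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return seen


def global_out_degree():
    """OLAP-style: computes a statistic by touching every vertex."""
    return {vertex: len(neighbors) for vertex, neighbors in graph.items()}


print(sorted(local_traversal("martin", 2)))
print(len(global_out_degree()))
```

The local walk visits only the vertices within two hops of the start vertex, while the global computation has to read the entire graph, which is why the first pattern suits near-real-time queries.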
PwC: Does the OLTP approach help
with the partitioning problem that
graph databases have suffered from?
MR: For sure. That’s the biggest problem
in graphs. It’s impossible to get a perfect cut
across machines, so what you’re trying to do is
limit cross-machine communication.
As much as you can put data that will be
co-retrieved on the same machine, the better
off you’ll be. That is typically not a general
function of some abstract algorithm, but
rather a function of understanding your data.
For example, on a social network, people
who communicate with each other tend to be
geographically close to one another. You can
think of your machines as being laid out like
a world map; people in the same country will
map to a particular machine.
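A minimal Python sketch of that idea follows, assuming a hypothetical set of users tagged with country codes and a hypothetical list of message edges. It compares placement by country with a data-blind round-robin placement and counts how many edges cross machines in each case. The function names are illustrative only and are not part of any product API.

```python
# Hypothetical data: user -> country, plus who messages whom.
users = {"ana": "BR", "bruno": "BR", "chen": "CN", "dai": "CN"}
messages = [("ana", "bruno"), ("chen", "dai"), ("ana", "chen")]


def place_by_country(users):
    """Data-aware placement: every country maps to one machine id."""
    machines = {}
    placement = {}
    for user, country in users.items():
        # The first user seen from a country allocates the next machine id.
        placement[user] = machines.setdefault(country, len(machines))
    return placement


def place_round_robin(users, n_machines):
    """Data-blind placement: spread users evenly, ignoring relationships."""
    return {user: i % n_machines for i, user in enumerate(sorted(users))}


def cross_machine_edges(placement, edges):
    """Count edges whose endpoints live on different machines."""
    return sum(1 for a, b in edges if placement[a] != placement[b])


print(cross_machine_edges(place_by_country(users), messages))
print(cross_machine_edges(place_round_robin(users, 2), messages))
```

On this toy data the country-based placement leaves one cross-machine edge and the round-robin placement leaves two, which is the kind of difference that knowledge of the data is meant to exploit.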
PwC: What are some other
considerations to take into account
when using graph databases?
MR: A key concern is relationship density.
Although it sounds counterintuitive, you might
actually want to avoid relationship density
as much as possible. Take a shopping site, for
example. You’ve purchased a lot of products
over the years, and so the overall graph is dense
around you.
If you’re doing a query, most of that graph
is irrelevant information. What you’re really
interested in at the current moment is very,
very specific. With graphs, you try to be very
particular and filter, filter, filter to only certain
types of relationships. You really want to
contextualize your traversal, so it meets the
semantics of your ultimate problem.
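The "filter, filter, filter" advice can be sketched as a traversal that follows only the relationship types relevant to the question. This Python example is an assumption-laden illustration: the edge labels (`purchased`, `viewed`, `reviewed`, `in_category`), the sample data, and the helper names are hypothetical and are not taken from the interview or from any graph query language.

```python
# Hypothetical labeled edges: (source, relationship type, destination).
edges = [
    ("shopper", "purchased", "tent"),
    ("shopper", "purchased", "novel"),
    ("shopper", "viewed", "stove"),
    ("shopper", "reviewed", "novel"),
    ("tent", "in_category", "camping"),
    ("stove", "in_category", "camping"),
    ("novel", "in_category", "books"),
]


def out(vertex, label):
    """Follow only the outgoing edges of one relationship type."""
    return [dst for src, lbl, dst in edges if src == vertex and lbl == label]


def purchased_categories(shopper):
    """Two-step traversal restricted to 'purchased' then 'in_category'."""
    categories = set()
    for product in out(shopper, "purchased"):
        categories.update(out(product, "in_category"))
    return categories


print(sorted(purchased_categories("shopper")))
```

Edges labeled `viewed` or `reviewed` are never followed, so the dense neighborhood around the shopper does not slow down or pollute a query that is only about purchases.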
To have a deeper conversation about remapping the database
landscape, please contact:
Gerard Verweij
Principal and US Technology
Consulting Leader
+1 (617) 530 7015
[email protected]
Chris Curran
Chief Technologist
+1 (214) 754 5055
[email protected]
Oliver Halter
Principal, Data and Analytics Practice
+1 (312) 298 6886
[email protected]
Bo Parker
Managing Director
Center for Technology and Innovation
+1 (408) 817 5733
[email protected]
PwC: Do users struggle with
overly dense graphs?
MR: Yes, they do. It’s the God node problem.
Network science papers and graph theory
papers have examined this problem. For
example, we had a project with a customer
who was parsing arbitrary text. They were
looking at people communicating, and they
were creating links between two words
that both occurred in the same text. We
realized that the word “time” became this
super node. Everything linked to “time,” and
when everything links to “time,” there is no
information in the concept of time.
With too many linkages, there’s no
information. But with no linkages, there
is also no information. You want to have
connectivity, but not too much connectivity.
You really want to have contextualized links
between your nodes and various levels (or
groupings) of nodes, because that will give you
a more accurate representation of the world,
where there are structures within structures.
If everything is connected to everything in
every possible way, there is no form and
that is not an accurate representation of the
reality that we share (though at some level of
awareness, it is correct).
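The word co-occurrence example above can be reproduced on a toy scale. The following Python sketch is an assumption for illustration only: the sample texts, the `max_fraction` threshold, and the function names are hypothetical, and the customer project described in the interview may have worked quite differently. It links words that appear in the same text and flags any word connected to more than a chosen fraction of the other words.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample texts.
texts = [
    "time to ship order",
    "meeting time moved",
    "invoice due this time",
    "ship invoice",
]


def cooccurrence_graph(texts):
    """Link every pair of distinct words that occur in the same text."""
    neighbors = defaultdict(set)
    for text in texts:
        for a, b in combinations(sorted(set(text.split())), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def supernodes(neighbors, max_fraction):
    """Words linked to more than max_fraction of all other words."""
    limit = max_fraction * (len(neighbors) - 1)
    return sorted(w for w, nbrs in neighbors.items() if len(nbrs) > limit)


graph = cooccurrence_graph(texts)
print(supernodes(graph, 0.75))
```

A word flagged this way carries little information for traversal, so one option is to exclude it or to split its links into more specific, contextualized relationship types.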
About PwC’s Technology Forecast
Published by PwC’s Center for Technology
and Innovation (CTI), the Technology
Forecast explores emerging technologies
and trends to help business and technology
executives develop strategies to capitalize on
technology opportunities.
Recent issues of the Technology Forecast have
explored a number of emerging technologies
and topics that have ultimately become
many of today’s leading technology and
business issues. To learn more about the
Technology Forecast, visit
www.pwc.com/technologyforecast.
© 2015 PricewaterhouseCoopers LLP, a Delaware limited liability partnership. All rights reserved. PwC refers to the US member firm, and may sometimes refer to the PwC network. Each member firm
is a separate legal entity. Please see www.pwc.com/structure for further details. This content is for general information purposes only, and should not be used as a substitute for consultation with
professional advisors. MW-15-1351 LL