www.pwc.com/technologyforecast
Technology Forecast: Remapping the database landscape
Issue 1, 2015
Security at the level of key-value pairs in a NoSQL database
Adam Fuchs of Sqrrl describes the benefits
of data-centric security analytics.
Interview conducted by Alan Morrison
PwC: What does Sqrrl do?
Adam Fuchs is CTO of Sqrrl.
AF: We are a big data analytics company
focused on cybersecurity investigations.
We came out of the intelligence community
where we were looking at a huge variety
of big data applications, all of which had
multilevel security concerns. We encountered
a lot of security requirements common to
other industries, such as healthcare—which
has HIPAA [Health Insurance Portability and
Accountability Act] and other restrictions
on data use—or banking—which has
various data privacy requirements, data
sharing agreements, and privacy policies.
All of these things restrict how data can
be used, and it’s a challenge to perform
analysis across many large data sets.
The more data sets, the more complex
the policy becomes in many cases.
So we’re trying to provide an element of data-centric security that still allows our analytics
solution to scale up to the petabyte range and
across thousands of server nodes—and still
allows users to ask a broad variety of questions
on top of it. Given that we have the security
restriction and that we’re trying to scale, we
still want to be able to search, aggregate,
graph, and perform other kinds of analytics.
PwC: And the Sqrrl platform works on
top of a NoSQL wide-column store?
AF: It works with Apache Accumulo, a clone
of BigTable [1]. We have a series of layers, some
of which are open source. At the bottom we
have HDFS [the Hadoop Distributed File
System] [2]. On top of HDFS sits Accumulo.
It’s open source up to that point.
PwC: Then the Sqrrl platform
sits on top of that?
AF: Then Sqrrl software sits on top of that
and provides linked data analysis capabilities,
which enable analysts to find patterns and
trends hidden in data sets [3]. Although we
do a lot of Accumulo development and
we provide some support of Accumulo
in operational instances, our company is
really tailored to sell the Sqrrl product.
PwC: How does the access control
work?
AF: Across a large data set, there could
be hundreds of trillions of key-value
pairs. Each one has a label that’s derived
from the provenance of that data. That
provenance allows us to determine who
can access it at query time. We try to make
that security filtering very efficient.
Text search also ties into Accumulo’s key-value
pairs.
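The label-based filtering described above can be illustrated with a minimal Python sketch. It is not Sqrrl's or Accumulo's implementation: the `Cell` class, the `is_visible` and `scan` helpers, the sample keys, and the label names are all hypothetical. The label grammar is a simplified assumption (an OR of AND-terms with no parentheses), loosely modeled on the `&` and `|` operators in Accumulo's visibility expressions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Cell:
    key: str     # hypothetical key layout, e.g. "user42/hr/salary_band"
    value: str
    label: str   # visibility label, e.g. "hr&pii" or "pii|investigator"


def is_visible(label: str, auths: FrozenSet[str]) -> bool:
    """Simplified label grammar (an assumption): OR of AND-terms, no parentheses."""
    if not label:
        return True  # assumption: an empty label means the cell is unrestricted
    return any(
        all(token in auths for token in term.split("&"))
        for term in label.split("|")
    )


def scan(cells: Iterable[Cell], auths: Iterable[str]) -> List[Cell]:
    """Return only the cells whose label is satisfied by the caller's authorizations."""
    granted = frozenset(auths)
    return [cell for cell in cells if is_visible(cell.label, granted)]


cells = [
    Cell("user42/netflow/bytes_out", "1048576", "netflow"),
    Cell("user42/hr/salary_band", "B3", "hr&pii"),
    Cell("user42/email/subject", "Q3 plan", "pii|investigator"),
]

front_line = scan(cells, {"netflow"})
investigator = scan(cells, {"netflow", "investigator"})
print([c.key for c in front_line])
print([c.key for c in investigator])
```

In this sketch the filter runs once per cell at read time, so the same stored data yields different result sets for different callers. The caller holding only `netflow` sees one cell, the caller who also holds `investigator` sees the e-mail cell as well, and neither sees the HR cell because its label requires both `hr` and `pii`.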
PwC: And then Sqrrl offers other
modes or data models in addition
to the wide-column mode, yes?
AF: That’s correct. The whole package
leverages a multimodal database. We have
document store capabilities. So we can
do JSON [JavaScript Object Notation]
input and output, and we can dynamically
update documents. We do a form of
online aggregation that is essentially
an aggregated, persistent view, but it’s
supported inside of the key-value store.
Then there’s the graph structure and the
ability to do graph analytics. The graph
structure and the document structure are
both built on top of that key-value store. And
there’s also our visualization layer that lets
an analyst access these search techniques
with point-and-click functionality.
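As a rough illustration of how a document model and a persistent aggregate can both live inside one key-value store, the following Python sketch flattens a JSON document into (row, column) cells and maintains a running count per field value in the same store. Everything here is an assumption made for illustration: the `KeyValueStore` class, the dotted column paths, and the reserved `_count` row are hypothetical and do not describe Sqrrl's actual storage layout.

```python
import json
from typing import Any, Dict, Iterator, Tuple

COUNT_ROW = "_count"  # hypothetical reserved row that holds the aggregated view


def flatten(doc: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted_path, json_encoded_leaf) for every leaf of a nested document."""
    for field, value in doc.items():
        path = prefix + field
        if isinstance(value, dict):
            yield from flatten(value, path + ".")
        else:
            yield path, json.dumps(value)


class KeyValueStore:
    def __init__(self) -> None:
        self.cells: Dict[Tuple[str, str], Any] = {}  # (row, column) -> value

    def _bump(self, column: str, value: str, delta: int) -> None:
        # The aggregate is itself stored as ordinary cells under COUNT_ROW.
        key = (COUNT_ROW, f"{column}={value}")
        self.cells[key] = self.cells.get(key, 0) + delta

    def put_document(self, doc_id: str, doc: Dict[str, Any]) -> None:
        """Write new or changed leaves; fields dropped from doc are not handled here."""
        for column, value in flatten(doc):
            old = self.cells.get((doc_id, column))
            if old == value:
                continue  # unchanged leaf, the view is already correct
            if old is not None:
                self._bump(column, old, -1)  # retract the previous value's count
            self.cells[(doc_id, column)] = value
            self._bump(column, value, +1)

    def count(self, column: str, value: Any) -> int:
        """Read the aggregate directly instead of scanning every document."""
        return self.cells.get((COUNT_ROW, f"{column}={json.dumps(value)}"), 0)


store = KeyValueStore()
store.put_document("evt1", {"src": {"ip": "10.0.0.5"}, "action": "login"})
store.put_document("evt2", {"src": {"ip": "10.0.0.5"}, "action": "logout"})
print(store.count("src.ip", "10.0.0.5"))
```

The design point of the sketch is that the count is updated at write time and read back with a single lookup, which is one plausible reading of an aggregated, persistent view kept inside the key-value store.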
PwC: What’s an example of how Sqrrl
might be used inside an enterprise?
AF: Sqrrl is used for cybersecurity
investigations. An investigation could be
preventive, such as when an analyst proactively
examines high-risk users or assets and looks for
suspicious activity associated with them. Or,
an investigation could occur after an incident
and focus on finding the root cause of the
incident. For these types of investigations, Sqrrl
is ingesting very large, disparate cybersecurity
data sets, such as NetFlow, log files, threat
intelligence, e-mails, and even HR information.
Sqrrl fuses this information together under
a common data model, and analysts use
our solution to look for patterns in the data.
However, when we start working with
these different data sets, we naturally start
running into privacy issues, because some
of these data sets contain sensitive data,
such as personally identifiable information
[PII], financial data, or trade secrets. This is
where data-centric security comes into play,
as we can control access to specific
pieces of data in a very fine-grained way.
PwC: How do you support
different user groups?
AF: The users of our tool include both front-line analysts in a security operations center
and more advanced security investigators and
incident handlers. Often, certain sensitive
types of data are not available to the front-line
analysts, but the more advanced investigators
would be able to see all the data.
PwC: What are some of the specific
analytics capabilities?
AF: Accumulo has a pretty abstract interface,
a low-level interface. We have extended
Accumulo to provide more advanced discovery
analytics. Search is in there, and we have
a subset of SQL to do transformations and
aggregations distributed throughout the
cluster, and then some graph analytics to
support subgraph extraction. We also have
added some machine learning capabilities to
help analysts auto-detect specific portions of a
subgraph that are statistically anomalous.
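To make subgraph extraction and anomaly flagging concrete, here is a small Python sketch under stated assumptions. The interview does not say which algorithms Sqrrl uses, so both techniques are stand-ins: extraction is done with a breadth-first search limited to a number of hops from a seed node, and the anomaly check is a plain z-score on node degree. The function names, the threshold, and the sample edges are hypothetical.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def build_adjacency(edges: List[Edge]) -> Dict[str, Set[str]]:
    """Treat every edge as undirected and index neighbors by node."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for src, dst in edges:
        adjacency[src].add(dst)
        adjacency[dst].add(src)
    return adjacency


def extract_subgraph(edges: List[Edge], seed: str, max_hops: int) -> List[Edge]:
    """Return the edges among nodes reachable from seed within max_hops (BFS)."""
    adjacency = build_adjacency(edges)
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency[node]:
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return [(s, d) for s, d in edges if s in depth and d in depth]


def high_degree_nodes(edges: List[Edge], z_threshold: float = 2.0) -> List[str]:
    """Flag nodes whose degree z-score exceeds z_threshold (a stand-in heuristic)."""
    degrees = {node: len(nbrs) for node, nbrs in build_adjacency(edges).items()}
    if len(degrees) < 2:
        return []
    mu, sigma = mean(degrees.values()), pstdev(degrees.values())
    if sigma == 0:
        return []  # every node has the same degree, nothing stands out
    return sorted(n for n, d in degrees.items() if (d - mu) / sigma > z_threshold)


edges = [
    ("alice", "host1"), ("alice", "host2"), ("host2", "10.9.9.9"),
    ("bob", "host3"), ("host3", "10.9.9.9"), ("carol", "host4"),
]
print(extract_subgraph(edges, "alice", max_hops=2))
```

With a two-hop limit, the extracted subgraph around `alice` stops at the shared address and leaves out `bob` and `carol`, so an analyst works with a small neighborhood rather than the whole graph.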
PwC: How does your
visualization work?
AF: Sqrrl’s primary visualization organizes
data into connected nodes and edges via a
linked data property graph. This visualization
technique goes beyond basic histograms and
bar charts and aims to present data with high
dimensionality in a compact way. Using linked
data diagrams, an analyst can quickly assess
which clusters of data are important to focus on.
[1] BigTable, a distributed wide-column or sparse matrix database used internally at Google since 2004, is now part of the Google Cloud
Datastore, which is part of the Google Cloud Platform. See “How NoSQL key-value and wide-column stores make in-image advertising
possible,” PwC Technology Forecast 2015, Issue 1, http://www.pwc.com/nosql, for more information.
[2] See “Data lakes and the promise of unsiloed data,” PwC Technology Forecast 2014, Issue 1, http://www.pwc.com/us/en/technology-forecast/2014/cloud-computing/features/data-lakes.jhtml, for more information on Hadoop and HDFS.
[3] See “Semantic Web in the enterprise,” PwC Technology Forecast, Spring 2009, http://www.pwc.com/us/en/technology-forecast/spring2009/index.jhtml,
for a detailed discussion of linked data, RDF, and other semantic web standards and use cases from an enterprise
perspective.
To have a deeper conversation about remapping the database
landscape, please contact:
Gerard Verweij
Principal and US Technology
Consulting Leader
+1 (617) 530 7015
Chris Curran
Chief Technologist
+1 (214) 754 5055
Oliver Halter
Principal, Data and Analytics Practice
+1 (312) 298 6886
Bo Parker
Managing Director
Center for Technology and Innovation
+1 (408) 817 5733
About PwC’s Technology Forecast
Published by PwC’s Center for Technology
and Innovation (CTI), the Technology
Forecast explores emerging technologies
and trends to help business and technology
executives develop strategies to capitalize on
technology opportunities.
Recent issues of the Technology Forecast have
explored a number of emerging technologies
and topics that have ultimately become
many of today’s leading technology and
business issues. To learn more about the
Technology Forecast, visit www.pwc.com/technologyforecast.
© 2015 PricewaterhouseCoopers LLP, a Delaware limited liability partnership. All rights reserved. PwC refers to the US member firm, and may sometimes refer to the PwC network. Each member firm
is a separate legal entity. Please see www.pwc.com/structure for further details. This content is for general information purposes only, and should not be used as a substitute for consultation with
professional advisors. MW-15-1351 LL