www.pwc.com/technologyforecast

Technology Forecast: Remapping the database landscape
Issue 1, 2015

Security at the level of key-value pairs in a NoSQL database

Adam Fuchs of Sqrrl describes the benefits of data-centric security analytics.

Interview conducted by Alan Morrison

Adam Fuchs is CTO of Sqrrl.

PwC: What does Sqrrl do?

AF: We are a big data analytics company focused on cybersecurity investigations. We came out of the intelligence community, where we were looking at a huge variety of big data applications, all of which had multilevel security concerns. We encountered a lot of security requirements common to other industries, such as healthcare, which has HIPAA [Health Insurance Portability and Accountability Act] and other restrictions on data use, or banking, which has various data privacy requirements, data sharing agreements, and privacy policies. All of these things restrict how data can be used, and it’s a challenge to perform analysis across many large data sets. In many cases, the more data sets there are, the more complex the policy becomes.

So we’re trying to provide an element of data-centric security that still allows our analytics solution to scale up to the petabyte range and across thousands of server nodes, and still allows users to ask a broad variety of questions on top of it. Given that we have the security restriction and that we’re trying to scale, we still want to be able to search, aggregate, graph, and perform other kinds of analytics.

PwC: And the Sqrrl platform works on top of a NoSQL wide-column store?

AF: It works with Apache Accumulo, a clone of BigTable.[1] We have a series of layers, some of which are open source. At the bottom we have HDFS [the Hadoop Distributed File System].[2] On top of HDFS sits Accumulo. It’s open source up to that point.

PwC: Then the Sqrrl platform sits on top of that?

AF: Then Sqrrl software sits on top of that and provides linked data analysis capabilities, which enable analysts to find patterns and trends hidden in data sets.[3] Although we do a lot of Accumulo development and we provide some support of Accumulo in operational instances, our company is really tailored to sell the Sqrrl product.

PwC: How does the access control work?

AF: Across a large data set, there could be hundreds of trillions of key-value pairs. Each one has a label that’s derived from the provenance of that data. That provenance allows us to determine who can access it at query time. We try to make that security filtering very efficient. Text search also ties into Accumulo’s key-value pairs.

PwC: And then Sqrrl offers other modes or data models in addition to the wide-column mode, yes?

AF: That’s correct. The whole package leverages a multimodal database. We have document store capabilities, so we can do JSON [JavaScript Object Notation] input and output, and we can dynamically update documents. We do a form of online aggregation that is essentially an aggregated, persistent view, but it’s supported inside of the key-value store.

Then there’s the graph structure and the ability to do graph analytics. The graph structure and the document structure are both built on top of that key-value store. And there’s also our visualization layer that lets an analyst access these search techniques with point-and-click functionality.

PwC: What’s an example of how Sqrrl might be used inside an enterprise?

AF: Sqrrl is used for cybersecurity investigations. An investigation could be preventive, such as when an analyst proactively examines high-risk users or assets and looks for suspicious activity associated with them. Or an investigation could occur after an incident and focus on finding the root cause of the incident.
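
As a rough illustration of the label-based access control described in this interview, the following Python sketch attaches a label to each key-value pair and filters entries at query time. It is not Sqrrl's or Accumulo's implementation or API. The `Entry` class, the `scan` function, the keys, the token names, and the rule that a label is a set of tokens the reader must hold in full are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Iterator


@dataclass(frozen=True)
class Entry:
    """One key-value pair plus its security label (hypothetical layout)."""
    key: str
    value: str
    label: FrozenSet[str]  # assumption: the reader must hold every token listed here


def scan(entries: Iterable[Entry], authorizations: FrozenSet[str]) -> Iterator[Entry]:
    """Yield only the entries whose label is satisfied by the caller's authorizations."""
    for entry in entries:
        # Subset test: every token required by the label is among the authorizations.
        if entry.label <= authorizations:
            yield entry


# Hypothetical data: each label reflects where the record came from.
store = [
    Entry("host42:netflow:bytes_out", "1048576", frozenset({"netops"})),
    Entry("host42:hr:owner_name", "J. Doe", frozenset({"netops", "pii"})),
    Entry("host42:threat:score", "0.93", frozenset()),  # empty label: visible to all
]

limited_user = frozenset({"netops"})
cleared_user = frozenset({"netops", "pii"})

limited_keys = [e.key for e in scan(store, limited_user)]
cleared_keys = [e.key for e in scan(store, cleared_user)]
```

In this sketch the reader without the `pii` token gets back two of the three entries, while the reader holding both tokens gets all three. The filter runs per entry during the scan, so the same stored data serves both readers without separate copies.
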
For these types of investigations, Sqrrl is ingesting very large, disparate cybersecurity data sets, such as NetFlow, log files, threat intelligence, e-mails, and even HR information. Sqrrl fuses this information together under a common data model, and analysts use our solution to look for patterns in the data. However, when we start working with these different data sets, we naturally start running into privacy issues, because some of these data sets contain sensitive data, such as personally identifiable information [PII], financial data, or trade secrets. This is where data-centric security comes into play, as we can control access to specific pieces of data in a very fine-grained way.

PwC: How do you support different user groups?

AF: The users of our tool include both frontline analysts in a security operations center and more advanced security investigators and incident handlers. Often, certain sensitive types of data are not available to the frontline analysts, but the more advanced investigators would be able to see all the data.

PwC: What are some of the specific analytics capabilities?

AF: Accumulo has a pretty abstract interface, a low-level interface. We have extended Accumulo to provide more advanced discovery analytics. Search is in there, and we have a subset of SQL to do transformations and aggregations distributed throughout the cluster, and then some graph analytics to support subgraph extraction. We also have added some machine learning capabilities to help analysts auto-detect specific portions of a subgraph that are statistically anomalous.

PwC: How does your visualization work?

AF: Sqrrl’s primary visualization organizes data into connected nodes and edges via a linked data property graph. This visualization technique goes beyond basic histograms and bar charts and aims to present data with high dimensionality in a compact way.
Using linked data diagrams, an analyst can quickly assess which clusters of data are important to focus on.

[1] BigTable, a distributed wide-column or sparse matrix database used internally at Google since 2004, is now part of the Google Cloud Datastore, which is part of the Google Cloud Platform. See “How NoSQL key-value and wide-column stores make in-image advertising possible,” PwC Technology Forecast 2015, Issue 1, http://www.pwc.com/nosql, for more information.

[2] See “Data lakes and the promise of unsiloed data,” PwC Technology Forecast 2014, Issue 1, http://www.pwc.com/us/en/technologyforecast/2014/cloud-computing/features/data-lakes.jhtml, for more information on Hadoop and HDFS.

[3] See “Semantic Web in the enterprise,” PwC Technology Forecast, Spring 2009, http://www.pwc.com/us/en/technology-forecast/spring2009/index.jhtml, for a detailed discussion of linked data, RDF, and other semantic web standards and use cases from an enterprise perspective.

To have a deeper conversation about remapping the database landscape, please contact:

Gerard Verweij, Principal and US Technology Consulting Leader, +1 (617) 530 7015
Chris Curran, Chief Technologist, +1 (214) 754 5055
Oliver Halter, Principal, Data and Analytics Practice, +1 (312) 298 6886
Bo Parker, Managing Director, Center for Technology and Innovation, +1 (408) 817 5733

About PwC’s Technology Forecast

Published by PwC’s Center for Technology and Innovation (CTI), the Technology Forecast explores emerging technologies and trends to help business and technology executives develop strategies to capitalize on technology opportunities. Recent issues of the Technology Forecast have explored a number of emerging technologies and topics that have ultimately become many of today’s leading technology and business issues. To learn more about the Technology Forecast, visit www.pwc.com/technologyforecast.

© 2015 PricewaterhouseCoopers LLP, a Delaware limited liability partnership. All rights reserved. PwC refers to the US member firm, and may sometimes refer to the PwC network. Each member firm is a separate legal entity. Please see www.pwc.com/structure for further details. This content is for general information purposes only, and should not be used as a substitute for consultation with professional advisors. MW-15-1351 LL