
www.pwc.com/technologyforecast

Technology Forecast: Remapping the database landscape
Issue 1, 2015

Filling in the gaps in NoSQL document stores and data lakes

Matthias Brantner describes the role database virtualization and a business-user query interface can play in heterogeneous environments.

Interview conducted by Alan Morrison and Bo Parker

Matthias Brantner is the former CTO of 28msec and is a consulting member of technical staff at Oracle. This interview was conducted in 2014.

PwC: How are companies using NoSQL or non-relational databases?

MB: Developers use NoSQL databases because those databases are relatively easy to set up and they’re mostly free to get started. NoSQL databases are a no-brainer for developers to get something up and running quickly. MongoDB is an example: it’s very easy to install, and developers jumped on it.

But it’s not really clear to me how these databases will be used in the mainstream enterprise and what they will be used for. You can certainly analyze long streams of data from your websites or all the clickstreams. But, as things currently stand, eventually you need a developer to help you deal with the flexibility that those databases give you. You can essentially put everything in there, but to figure out what is in the database, you need a developer.

The tools haven’t caught up. That’s the next thing that needs to happen to drive adoption of those systems and to make the data in data lakes or enterprise data hubs accessible by business users.

PwC: NoSQL query languages are still scarce.

MB: That’s right. Those NoSQL databases started without really having query languages on top of them. Some of them are just key-value stores, and you must write a program to get the data out. MongoDB, for example, has a slightly more sophisticated query language, but it’s still very developer focused. Now many vendors are bringing SQL back onto NoSQL or Hadoop.
But generally the semantics are different because NoSQL and Hadoop do not use the SQL data model. So you really cannot use SQL semantics.

Everyone who’s trying to serve the market is currently cooking in their own kitchen. Vendors need to focus not only on developer ease of use but also on the ability of business users to look at the data. The technology makes sense, but just collecting data is expensive and doesn’t make sense. You need to know why you are collecting the data and what you’ll do with it.

PwC: One organization we’ve heard from seems to have a solid BI [business intelligence] group and said they’re able to integrate their data from NoSQL and relational sources fairly quickly, more quickly than they can via APIs [application programming interfaces] and application-style integration.

PwC: How can executives reduce reliance on IT when it comes to databases generally?

MB: Let me give you an example of what we experienced in the disclosure management fields with XBRL. In this case, business users are trying to get relational database systems to explore the information coming from the XBRL filings. The problem is that a lot of dimensional metadata is in XBRL filings. And in relational databases, those dimensions are part of the database schema. So your database schema encapsulates the dimension.

PwC: Many developers nowadays keep their logic in the app itself. Is 28msec putting the logic into the database?

PwC: Presumably you could do that with a relational database also.

MB: Yes, you could do that as well. I think the problem with the relational database is that you must migrate your schema completely if you make modifications, which is very hard: You mostly need IT, the person developing the application cannot migrate the schema, and the modifications might have great performance impacts on other aspects of the system.
So the use of a relational database should be considered very carefully.

MB: Absolutely. Just because you have an API doesn’t mean you don’t need to maintain the database anymore. If you start having a lot of data silos, each with its own API, you’re gathering a lot of technical debt with lots of code for APIs on different data stores. Each data store does a very specific or very small thing, but only one or two developers might understand it and they need to continue maintaining it. And if new demands for the data come in, you’re gathering more and more code that you must maintain.

Companies can have many different data repositories but not really understand how they contribute to the big picture.

MB: Yes. In their virtual database approach, the database and the application server are one thing, and you can write your entire application in the declarative JSONiq language, which allows you to read your own writes and make calls to the outside world. The only thing you expose is an API. You essentially have one thing, which is the database/application server, and it’s using one language that directs both the application logic and the data management.

Dimensions often change, and only the business user understands the semantics of the dimensions and can add, modify, or remove them. The business user must talk to IT for every change. That certainly doesn’t make sense, and I think this barrier is the one we need to break.

A business user should be in control of the metadata, of the schema, because a business user is the only person who understands the domain.
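The contrast between dimensions that live in a relational schema and dimensions that travel with the data can be made concrete with a minimal sketch. It is not taken from 28msec's product or from any XBRL tooling: the fact layout, the helper names `add_fact` and `select`, and all values are hypothetical, and plain Python dictionaries stand in for documents in a document store.

```python
# Hypothetical facts loosely modeled on dimensional reporting data.
# Each fact carries its own dimensions as data, not as table columns.
facts = [
    {"concept": "Revenue", "value": 120,
     "dimensions": {"period": "2013", "segment": "Retail"}},
    {"concept": "Revenue", "value": 80,
     "dimensions": {"period": "2013", "segment": "Wholesale"}},
]


def add_fact(facts, concept, value, **dimensions):
    """Append a fact; any dimension names are accepted without a schema change."""
    facts.append({"concept": concept, "value": value,
                  "dimensions": dict(dimensions)})


def select(facts, concept, **wanted):
    """Return facts for a concept whose dimensions match every requested value."""
    return [
        f for f in facts
        if f["concept"] == concept
        and all(f["dimensions"].get(k) == v for k, v in wanted.items())
    ]


# A new "geography" dimension appears; earlier facts are left untouched.
add_fact(facts, "Revenue", 50, period="2014", segment="Retail", geography="EMEA")

retail_total = sum(f["value"] for f in select(facts, "Revenue", segment="Retail"))
emea_facts = select(facts, "Revenue", geography="EMEA")
```

In a relational design where each dimension is a column, the `geography` step would typically require a schema migration first; here older facts simply do not match a filter on a dimension they lack.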
Enterprises don’t want business users to give a task to a developer and say, “Look, that’s what I want.” A developer comes back with a result, and a business user says, “That’s not exactly what I expected and now that I see the results, you might want to do it differently.” The communication between the business user and developers or DevOps is inefficient, and the problem is how the business user fits into the picture.

The business user should be able to describe what is nowadays called the taxonomy. That taxonomy describes the metadata and should be reflected in the database. Business users should be in control of it.

We realized that the problem is the discrepancy between the business user and the developers and IT. And so we are looking into that discrepancy. The query language will not help that. In the end, vendors must build tools that the business user can use, and the technology is only an enabling technology that helps you to support that usage.

This approach resonates with business users, because they don’t have to go through IT to make changes to the schema.

To have a deeper conversation about remapping the database landscape, please contact:
Gerard Verweij, Principal and US Technology Consulting Leader, +1 (617) 530 7015
Chris Curran, Chief Technologist, +1 (214) 754 5055
Oliver Halter, Principal, Data and Analytics Practice, +1 (312) 298 6886
Bo Parker, Managing Director, Center for Technology and Innovation, +1 (408) 817 5733

With Hadoop or MongoDB, you don’t know what’s in the database. You might have an idea, but you don’t really know. With MongoDB, someone can do analysis on one collection, but not on two. Then a developer needs to treat the data as a join between the two collections. And the business user is already out of the picture. Same for Platfora. If your schema changes and you have different data formats in your Hadoop ecosystem, you again need a developer who bridges the gap.
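To illustrate the kind of developer work the two-collection case implies, here is a minimal sketch of a join done in application code. It is an assumption-laden illustration, not MongoDB's API: the `customers` and `orders` lists are hypothetical stand-ins for two collections, and the field names are invented.

```python
from collections import defaultdict

# Hypothetical documents standing in for two separate collections.
customers = [
    {"_id": 1, "name": "Acme", "region": "EU"},
    {"_id": 2, "name": "Globex", "region": "US"},
]
orders = [
    {"_id": 10, "customer_id": 1, "amount": 250.0},
    {"_id": 11, "customer_id": 1, "amount": 100.0},
    {"_id": 12, "customer_id": 2, "amount": 75.0},
    {"_id": 13, "customer_id": 3, "amount": 40.0},  # no matching customer
]


def revenue_by_region(customers, orders):
    """Join orders to customers on customer_id and total amounts per region."""
    # Build a lookup table from the first collection (the "hash" side of the join).
    region_of = {c["_id"]: c["region"] for c in customers}
    totals = defaultdict(float)
    for order in orders:
        region = region_of.get(order["customer_id"])
        if region is None:
            continue  # skip orders whose customer is unknown
        totals[region] += order["amount"]
    return dict(totals)


result = revenue_by_region(customers, orders)
```

Even in this small form, the question "revenue per region" requires someone to know which field links the two collections and how to handle unmatched records, which is the gap between business user and developer described above.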
PwC: What’s another approach, then?

MB: The data lake as the common denominator makes sense because it can actually maintain consistency. The data is in one place. Then you have different microservices on top and the tools to access the data. With that approach, you maintain as much consistency as you need, and business users define what they need. And so microservices in that context make sense.

About PwC’s Technology Forecast

Published by PwC’s Center for Technology and Innovation (CTI), the Technology Forecast explores emerging technologies and trends to help business and technology executives develop strategies to capitalize on technology opportunities. Recent issues of the Technology Forecast have explored a number of emerging technologies and topics that have ultimately become many of today’s leading technology and business issues. To learn more about the Technology Forecast, visit www.pwc.com/technologyforecast.

© 2015 PricewaterhouseCoopers LLP, a Delaware limited liability partnership. All rights reserved. PwC refers to the US member firm, and may sometimes refer to the PwC network. Each member firm is a separate legal entity. Please see www.pwc.com/structure for further details. This content is for general information purposes only, and should not be used as a substitute for consultation with professional advisors. MW-15-1351 LL
