www.pwc.com/technologyforecast

Technology Forecast: Remapping the database landscape
Issue 1, 2015

Scaling online ad innovations with the help of a NoSQL wide-column database

Vaibhav Puranik and Ken Weiner of GumGum discuss the challenges and benefits of open source databases for in-image advertising.

Interview conducted by Alan Morrison, Bo Parker, and Tom Foth

Vaibhav Puranik is director of engineering, big data at GumGum.
Ken Weiner is CTO of GumGum.

PwC: What does GumGum do?

KW: GumGum sells advertising via its in-image ad platform to brand advertisers in the Fortune 500. In-image advertising is a hybrid between display and native advertising; it’s a way to overlay ads on a photo or an image. These ads are usually contextually targeted to complement the images.

GumGum works with a few thousand publishers, and that’s how we secure our digital inventory. We’re basically able to sell ad impressions on images to different advertisers and agencies.

PwC: How are NoSQL databases important to your business?

KW: For GumGum to target ads properly, we need an understanding of all of the photos and images that we see on websites and of all of the pages that those photos fit on. We also need some anonymous targeting data that we might associate with all the users who look at all those photos and images. So we need a large database to look up information that we’ve already computed about each photo, each page, and each user. Our ad server uses that information in real time to make decisions about which ads to serve.

VP: To give an example, a photograph of actress Jennifer Lawrence might appear on a particular web page. Our software recognizes automatically that this is Jennifer Lawrence’s photograph. Once it does, we can display a trailer ad for The Hunger Games on that photograph.

PwC: And you store that information in the NoSQL database?

VP: In an Apache Cassandra database [1], we save the information that this particular photo is of Jennifer Lawrence. And then we can use that information in real time to serve the ads.

PwC: Is latency another factor?

KW: Yes, it’s definitely a factor. Low latency is very important to any advertising, because a user’s attention is fleeting. If the ad isn’t served really close in time to when the image appears, it may never be seen. So we must select and show ads to users in as little time as possible. GumGum also participates in real-time bidding integrations with other companies, where we have only milliseconds to make decisions and to figure out what ad we’ll serve.

PwC: What was the challenge GumGum faced that caused you to move to a NoSQL database such as Cassandra?

VP: In 2013, we were using another NoSQL database called HBase. HBase uses the Hadoop Distributed File System [HDFS] [2] and ZooKeeper. HBase runs multiple processes on a node [region server], so whenever there was a problem, we didn’t know whether the HBase processes, the Hadoop processes, or something else caused the problem. To maintain HBase, you must maintain three or four pieces of software together, whereas with Cassandra, we have just one simple process running on every single node.

PwC: How do you query the data in Cassandra?

VP: We have apps that would query the data programmatically. For ad hoc purposes, we use a tool called Presto, which allows us to write SQL [structured query language] queries.

PwC: Are you also looking at in-memory databases?

VP: One other thing we are looking into is how we could use Apache Spark in conjunction with Cassandra. Spark would allow ad hoc querying on top of Cassandra.
Spark can load Cassandra data into memory and then execute really, really fast queries on top of it. Because Spark can work in memory, it can perform 100 times faster. Spark can also provide a query processing engine for Cassandra.

PwC: Does Cassandra come with an in-memory capability to begin with?

VP: Cassandra does come with in-memory capability in its enterprise version. Unfortunately, we are not using that enterprise version right now, but rather the Apache license version of Cassandra. I know people who are using the enterprise version, and they’re pretty happy with it.

PwC: If you went back in time 10 years and you didn’t have access to these NoSQL options, what would you have done? How dependent are you on the new big data technologies just to execute your business models?

KW: I think it might have been possible to do 10 years ago, but it wasn’t as cost-effective. There were solutions back then for big, vertically powered databases, and you could get a really powerful, expensive, single machine. But beyond a certain point, I’m not sure exactly how that would have worked out.

VP: The reason data is growing so fast is that you can store it and process it in cheaper ways. Ten years ago, most companies would not process that much data because the cost of processing that data was too high. And now they are processing much more data, because they can do it less expensively.

[1] Apache Cassandra is a wide-column NoSQL database. For more on key-value and wide-column stores, see “How NoSQL key-value and wide-column stores help manage data in high-volume environments,” PwC Technology Forecast 2015, Issue 1, http://www.pwc.com/nosql.

[2] See “Data lakes and the promise of unsiloed data,” PwC Technology Forecast 2014, Issue 1, http://www.pwc.com/us/en/technologyforecast/2014/cloud-computing/features/data-lakes.jhtml, for more information on Hadoop and HDFS.
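
The lookup pattern described in the interview (facts about each image are computed ahead of time, stored under a key, and read back when an ad must be chosen) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not GumGum's implementation: a plain dictionary stands in for the Cassandra table, and the names `save_image_facts`, `choose_ad`, `CAMPAIGNS`, the `subject` column, and the example URL are all hypothetical.

```python
from typing import Dict, Optional

# In-memory stand-in for a wide-column table: row key -> {column name -> value}.
# A real deployment would read from Cassandra; this dict is only an illustration.
ImageTable = Dict[str, Dict[str, str]]

# Hypothetical campaign list: the subject an ad targets -> the ad creative.
CAMPAIGNS: Dict[str, str] = {
    "Jennifer Lawrence": "The Hunger Games trailer",
}


def save_image_facts(table: ImageTable, image_url: str, **columns: str) -> None:
    """Offline step: store what image recognition computed about a photo."""
    table.setdefault(image_url, {}).update(columns)


def choose_ad(table: ImageTable, image_url: str) -> Optional[str]:
    """Serving step: one key lookup, then pick an ad that matches the subject."""
    facts = table.get(image_url)
    if facts is None:
        return None  # nothing precomputed for this image, so no targeted ad
    return CAMPAIGNS.get(facts.get("subject", ""))


table: ImageTable = {}
save_image_facts(table, "https://example.com/photo.jpg", subject="Jennifer Lawrence")
print(choose_ad(table, "https://example.com/photo.jpg"))
```

The point of the split is that the expensive work (recognizing who is in the photo) happens before any ad request arrives, so the serving path is a single read by key, which is what keeps latency low.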

To have a deeper conversation about remapping the database landscape, please contact:

Gerard Verweij
Principal and US Technology Consulting Leader
+1 (617) 530 7015

Chris Curran
Chief Technologist
+1 (214) 754 5055

Oliver Halter
Principal, Data and Analytics Practice
+1 (312) 298 6886

Bo Parker
Managing Director, Center for Technology and Innovation
+1 (408) 817 5733

About PwC’s Technology Forecast

Published by PwC’s Center for Technology and Innovation (CTI), the Technology Forecast explores emerging technologies and trends to help business and technology executives develop strategies to capitalize on technology opportunities. Recent issues of the Technology Forecast have explored a number of emerging technologies and topics that have ultimately become many of today’s leading technology and business issues. To learn more about the Technology Forecast, visit www.pwc.com/technologyforecast.

© 2015 PricewaterhouseCoopers LLP, a Delaware limited liability partnership. All rights reserved. PwC refers to the US member firm, and may sometimes refer to the PwC network. Each member firm is a separate legal entity. Please see www.pwc.com/structure for further details. This content is for general information purposes only, and should not be used as a substitute for consultation with professional advisors. MW-15-1351 LL