IBM Cloudant
Glynn Bird & Mike Broberg
May 2015
Beyond RDBMS: a rough guide to NoSQL databases
© 2015 IBM Corporation

Housekeeping Notes
▪ Today’s webinar is being recorded. We will send you a link to the recording and a copy of the slide deck after the presentation.
▪ The webinar recording will be available on our website: http://cloudant.com
▪ If you would like to ask a question during today’s presentation, please type in your question using the GoTo Meeting toolbar.

Introductions
Glynn Bird
▪ Developer Advocate Manager @ IBM Cloudant UK
▪ Previously worked
  − @ Central Index creating business directory websites and CRM systems
  − For the steel industry in R&D developing control and instrumentation technology

Issues with Relational DBs
▪ Great solution for a lot of use cases
▪ Transactions are neat, but have scaling issues with many clients
▪ Bottleneck/single point of failure issues
  − Availability needed more than relational algebra
▪ Vertical scaling may not be possible
  − Pricing of closed source solutions

Origins of NoSQL
▪ “No SQL”, “Not only SQL”
▪ Early 2000s
▪ Response to a new set of use cases not well suited to the relational model
▪ Precursor to “Big Data”

Foundational Concepts
▪ Google - MapReduce (2004)
▪ Amazon - Dynamo (2007)
▪ Both have use cases not well served by traditional relational stores
  − Volume of data - petabytes
  − Number of concurrent clients - millions
  − Must scale horizontally

Implementations
▪ CAP Theorem
  − Eric Brewer’s conjecture (2000)
  − Consistency, availability, partition tolerance - pick 2
  − In a distributed system, partition tolerance is usually required
  − Partition tolerance is not just network failure

Types of Non-Relational Store
▪ Key:Value store
  − Store arbitrary values assigned to a key
  − Query store by key
  − Schemaless
▪ Document store
  − Store semi-structured data, including a key
  − Document is atomic
  − Define indexes over document content

Types of Non-Relational Store
▪ Graph
  − Store nodes and relationships between them
  − Think social network
  − Query distance between nodes in the graph
▪ Search
  − Often one of the first things taken out of a relational DB
    • SELECT * FROM table WHERE name LIKE 'Georg%' just doesn’t cut it
▪ Efficiently store and analyse textual data for human query
▪ Ad-hoc queries must be performant

Consistency
▪ Do all your servers in a data centre need to agree?
▪ What about across data centres in multiple locations?
  − Answer: probably not.
▪ Embrace eventual consistency

Availability
▪ Users expect their web & mobile apps to always be available
  − Database surgery measured in hours is unacceptable for the Twitter generation
▪ Customer base grown beyond one timezone - always on
▪ Operational considerations
  − Fail fast, no snowflakes, data replication
  − Someone’s got to be on the pager

Durability
▪ Can you lose data?
  − Not a silly question - e.g. log data vs audit trails
▪ How long is data safe in RAM?
▪ How much?
  − 10%? 1%? 0.001%?
▪ Cache vs Store
▪ Understand failure modes

Concurrency
▪ Concurrent Connections: Your DB's Heart Attack
▪ Modern networked applications interact heavily with the database
▪ Database becomes the bottleneck for application servers
  − Traditionally introduce caching
  − Implicitly lose consistency; so why use a consistent store?
▪ Horizontal scale for concurrency

Understanding use cases
▪ What types of questions do you need to ask your database and how long can you wait for answers?
▪ What choice did you make around CAP and what are your durability needs?
▪ Does all your data fit in RAM?
  − You can have a TB of RAM in a box if your wallet is big enough
▪ Do you want to scale horizontally or vertically?
  − Cost per node vs operational cost

DIY vs Utility vs Managed
▪ DIY gives you more control, but you also need to build the team to run the system
▪ Hosted services provide a database utility for you to build on and integrate with other utilities
▪ Managed service takes care of running (parts of) the system for you

Where does Cloudant fit in?
▪ AP - from CAP theorem - eventually consistent
▪ document database - JSON documents
▪ distributed - using Dynamo ring
▪ resilient - data written to disk multiple times
▪ querying mechanisms:
  − MongoDB-style query language
  − incremental MapReduce
  − Lucene free-text search
  − GeoSpatial querying for GeoJSON stores
▪ “as a service”
  − Multi-tenant - with free and PAYG tiers
  − Dedicated - fully-managed dedicated hardware in the cloud of your choice
  − Local - on-premises solution
▪ replication for offline-first mobile applications on iOS, Android or HTML5 platforms
▪ Cloudant is a proud contributor to the Apache CouchDB project

Questions
[email protected]
@glynn_bird
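
Appendix sketch: the "Where does Cloudant fit in?" slide mentions a Dynamo ring and data being written multiple times. The Python below is a toy illustration of that general idea (consistent hashing with several replicas per document), not Cloudant's or Dynamo's actual implementation. The class name `HashRing`, the node names, the use of MD5, and the `replicas` and `vnodes` parameters are all assumptions made for this example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring (a 128-bit integer via MD5)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring: each document id maps to `replicas` distinct nodes."""

    def __init__(self, nodes, replicas=3, vnodes=64):
        # Never ask for more copies than there are physical nodes.
        self.replicas = min(replicas, len(nodes))
        # Each physical node gets several virtual positions to smooth the distribution.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]

    def nodes_for(self, doc_id: str):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        start = bisect.bisect(self._positions, _hash(doc_id))
        owners = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == self.replicas:
                break
        return owners


# Hypothetical four-node cluster keeping three copies of every document.
ring = HashRing(["node-a", "node-b", "node-c", "node-d"], replicas=3)
owners = ring.nodes_for("user:1234")
print(owners)  # three distinct node names; which ones depends on the hash
```

Because placement depends only on the hash of the document id, any client can work out which nodes hold a document without a central coordinator, which is one way a store can scale horizontally and keep serving requests when a single node is down.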