IBM Cloudant
Glynn Bird & Mike Broberg
May 2015
Beyond RDBMS:
a rough guide to NoSQL databases
© 2015 IBM Corporation
Housekeeping Notes
▪ Today’s webinar is being recorded. We will send you a link to the recording and copy of the
slide deck after the presentation.
▪ The webinar recording will be available on our website: http://cloudant.com
▪ If you would like to ask a question during today’s presentation, please type your question
into the GoToMeeting toolbar.
Introductions
Glynn Bird
▪ Developer Advocate Manager @ IBM Cloudant UK
▪ Previously worked
− @ Central Index creating business directory websites and CRM systems
− For the steel industry in R&D developing control and instrumentation technology
Issues with Relational DBs
▪ Great solution for a lot of use cases
▪ Transactions are neat, but have scaling issues with many clients
▪ Bottleneck/single point of failure issues
− Availability needed more than relational algebra
▪ Vertical scaling may not be possible
− Pricing of closed source solutions
Origins of NoSQL
▪ “No SQL”, “Not only SQL”
▪ Early 2000s
▪ Response to a new set of use cases not well suited to relational model
▪ Precursor to “Big Data”
Foundational Concepts
▪ Google - MapReduce (2004)
▪ Amazon - Dynamo (2007)
▪ Both have use cases not well served by traditional relational stores
− Volume of data - petabytes
− Number of concurrent clients - millions
− Must scale horizontally
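As a rough illustration of the MapReduce idea, here is a minimal single-process sketch in Python. The function names (map_phase, shuffle, reduce_phase, map_reduce) and the sample documents are made up for this example; a real framework would run the map and reduce steps on many machines in parallel.

```python
from collections import defaultdict

def map_phase(doc):
    # Emit a (word, 1) pair for every word in one document.
    for word in doc.lower().split():
        yield word, 1

def shuffle(pairs):
    # Group emitted values by key, as a framework would between the phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Combine all values emitted for one key into a single result.
    return key, sum(values)

def map_reduce(docs):
    pairs = (pair for doc in docs for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical sample data.
docs = ["NoSQL scales out", "SQL scales up", "NoSQL is not only SQL"]
counts = map_reduce(docs)
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both phases can be spread across nodes, which is what makes the model a fit for horizontal scaling.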
Implementations
▪ CAP Theorem
− Eric Brewer’s conjecture (2000)
− Consistency, availability, partition tolerance - pick 2
− In a distributed system, partition tolerance is usually required
− Partition tolerance is not just network failure
Types of Non-Relational Store
▪ Key:Value store
− Store arbitrary values assigned to a key
− Query store by key
− Schemaless
▪ Document store
− Store semi-structured data, including a key
− Document is atomic
− Define indexes over document content
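The difference between the two store types can be sketched in a few lines of Python. This is a toy in-memory model, not any product's API: the DocumentStore class, its method names and the sample records are all invented for illustration.

```python
# Key:value store: the value is opaque, and the only query is "get by key".
kv = {}
kv["session:42"] = b"\x00\x01 any bytes at all"
session = kv["session:42"]

class DocumentStore:
    """Toy document store: whole-document writes plus secondary indexes."""

    def __init__(self):
        self.docs = {}      # key -> document (a dict)
        self.indexes = {}   # field name -> {field value -> set of keys}

    def create_index(self, field):
        # Build an index over one field of the documents stored so far.
        index = {}
        for key, doc in self.docs.items():
            if field in doc:
                index.setdefault(doc[field], set()).add(key)
        self.indexes[field] = index

    def put(self, key, doc):
        # The document is replaced as a unit; indexes are kept in step.
        old = self.docs.get(key)
        for field, index in self.indexes.items():
            if old is not None and field in old:
                index[old[field]].discard(key)
            if field in doc:
                index.setdefault(doc[field], set()).add(key)
        self.docs[key] = doc

    def get(self, key):
        return self.docs[key]

    def find(self, field, value):
        # Query by document content instead of by key.
        keys = self.indexes[field].get(value, set())
        return [self.docs[k] for k in sorted(keys)]

store = DocumentStore()
store.create_index("city")
store.put("u1", {"name": "Ada", "city": "London"})
store.put("u2", {"name": "Bob", "city": "Leeds"})
store.put("u3", {"name": "Cy", "city": "London"})
store.put("u2", {"name": "Bob", "city": "London"})  # replaces the whole document
london = store.find("city", "London")
```

The key:value store can only answer "what is stored under this key?", while the document store can also answer "which documents have city = London?" because it understands the structure of what it holds.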
Types of Non-Relational Store
▪ Graph
− Store nodes and relationships between them
− Think social network
− Query distance between nodes in the graph
▪ Search
− Often one of the first things taken out of a relational DB
• SELECT * FROM table WHERE name LIKE 'Georg%' just doesn’t cut it
− Efficiently store and analyse textual data for human query
− Ad-hoc queries must be performant
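Both ideas can be illustrated with a short Python sketch: a breadth-first search for the distance between two nodes, and a tiny inverted index for word-level text search. The function names and sample data are hypothetical, and real graph and search engines use far more sophisticated storage and ranking.

```python
from collections import defaultdict, deque

def distance(graph, start, goal):
    # Breadth-first search: hops on a shortest path, or None if unreachable.
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None

def build_index(texts):
    # Inverted index: token -> set of ids of the documents containing it.
    index = defaultdict(set)
    for doc_id, text in texts.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def prefix_search(index, prefix):
    # Match the prefix against every indexed word, not just the start of
    # the field. (Scanning all tokens keeps the sketch short; real engines
    # use a sorted term dictionary instead.)
    prefix = prefix.lower()
    hits = set()
    for token, ids in index.items():
        if token.startswith(prefix):
            hits |= ids
    return sorted(hits)

# Hypothetical social graph and name list.
friends = {"ann": ["bob"], "bob": ["ann", "cat"],
           "cat": ["bob", "dan"], "dan": ["cat"], "eve": []}
names = {1: "George Orwell", 2: "Boy George", 3: "Georgia O'Keeffe", 4: "Greg Egan"}

hops = distance(friends, "ann", "dan")
matches = prefix_search(build_index(names), "Georg")
```

Here the search finds "Boy George" as well as the names that begin with "Georg", which a LIKE pattern anchored at the start of the column would miss.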
Consistency
▪ Do all your servers in a data centre need to agree?
▪ What about across data centres in multiple locations?
− Answer: probably not.
▪ Embrace eventual consistency
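To make eventual consistency concrete, here is a minimal Python sketch of two replicas that accept writes independently and converge once they exchange state. It assumes a simple last-write-wins rule with caller-supplied timestamps; that rule, the Replica class and the data centre names are illustrative choices only, and real databases resolve conflicts in various other ways.

```python
class Replica:
    """Toy replica using last-write-wins (an assumption of this sketch)."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (timestamp, writer name, value)

    def write(self, key, value, timestamp):
        # Accept the write locally without consulting any other replica.
        self.data[key] = (timestamp, self.name, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[2] if entry else None

    def sync_from(self, other):
        # Pull the other replica's entries; the newer (timestamp, writer)
        # pair wins, so ties are broken the same way on every replica.
        for key, entry in other.data.items():
            mine = self.data.get(key)
            if mine is None or entry[:2] > mine[:2]:
                self.data[key] = entry

london = Replica("london")
dallas = Replica("dallas")

# Each data centre takes a write while the two are out of contact.
london.write("cart", "book", timestamp=1)
dallas.write("cart", "book+pen", timestamp=2)
before = (london.read("cart"), dallas.read("cart"))  # replicas disagree

# Once they can talk again, they exchange state and converge.
london.sync_from(dallas)
dallas.sync_from(london)
after = (london.read("cart"), dallas.read("cart"))
```

Between the writes and the sync, a client may read different answers from different data centres; that window is the price paid for both sites staying available.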
Availability
▪ Users expect their web & mobile apps to always be available
− Database surgery measured in hours is unacceptable for the Twitter generation
▪ Customer base grown beyond one timezone - always on
▪ Operational considerations
− Fail fast, no snowflakes, data replication
− Someone’s got to be on the pager
Durability
▪ Can you lose data?
− Not a silly question - e.g. log data vs audit trails
▪ How long is data safe in RAM?
▪ How much?
− 10%? 1%? 0.001%?
▪ Cache vs Store
▪ Understand failure modes
Concurrency
▪ Concurrent Connections: Your DB's Heart Attack
▪ Modern networked applications significantly interact with the database
▪ Database becomes the bottleneck for application servers
− Traditionally introduce caching
− Implicitly lose consistency; so why use a consistent store?
▪ Horizontal scale for concurrency
Understanding use cases
▪ What types of questions do you need to ask your database and how long can you wait for
answers?
▪ What choice did you make around CAP and what are your durability needs?
▪ Does all your data fit in RAM?
− You can have a TB of RAM in a box if your wallet is big enough
▪ Do you want to scale horizontally or vertically?
− Cost per node vs operational cost
DIY vs Utility vs Managed
▪ DIY gives you more control, but you also need to build the team to run the system
▪ Hosted services provide a database utility for you to build on and integrate with other utilities
▪ Managed service takes care of running (parts of) the system for you
Where does Cloudant fit in?
▪ AP - from CAP theorem - eventually consistent
▪ document database - JSON documents
▪ distributed - using Dynamo ring
▪ resilient - data written to disk multiple times
▪ querying mechanisms:
− MongoDB-style query language
− incremental MapReduce
− Lucene free-text search
− GeoSpatial querying for GeoJSON stores
▪ “as a service”
− Multi-tenant - with free and PAYG tiers
− Dedicated - fully-managed dedicated hardware in the cloud of your choice
− Local - on-premises solution
▪ replication for offline-first mobile applications on iOS, Android or HTML5 platforms
▪ Cloudant is a proud contributor to the Apache CouchDB project
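The general idea behind a Dynamo-style ring, with each document stored on several nodes, can be sketched as generic consistent hashing in Python. This is not Cloudant's actual sharding code: the HashRing class, the node names, the choice of SHA-256 and the vnodes and copies defaults are all assumptions made for the example.

```python
import bisect
import hashlib

def ring_position(value):
    # Map any string to a point on the ring (a large integer).
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

class HashRing:
    """Toy consistent-hash ring; each key is owned by several distinct nodes."""

    def __init__(self, nodes, vnodes=64, copies=3):
        self.copies = copies
        self.node_count = len(set(nodes))
        # Give every node many points on the ring to even out the load.
        self.ring = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self.positions = [pos for pos, _ in self.ring]

    def nodes_for(self, key):
        # Walk clockwise from the key's position, collecting distinct nodes.
        start = bisect.bisect(self.positions, ring_position(key))
        wanted = min(self.copies, self.node_count)
        owners = []
        for step in range(len(self.ring)):
            node = self.ring[(start + step) % len(self.ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == wanted:
                break
        return owners

# Hypothetical four-node cluster keeping three copies of each document.
ring = HashRing(["db1", "db2", "db3", "db4"])
owners = ring.nodes_for("user:42")
print(owners)
```

Any node can compute where a document lives from its key alone, so there is no central lookup service to become a bottleneck, and losing one node still leaves other copies of each document.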
Questions
[email protected]
@glynn_bird