Introduction to NoSQL Databases
Chyngyz Omurov, Osman Tursun
Ceng, Middle East Technical University

OUTLINE
• NoSQL Definition
• Motivation
• Data Store Introduction
-- Key-value Stores
-- Document Stores
-- Extensible Record Stores
-- New Relational Database
• Conclusion

NoSQL: The Name
• “SQL” = traditional relational DBMS.
• Experience teaches us: not every data management/analysis problem is best solved using a traditional relational DBMS.
• “NoSQL” = “No SQL” = not using a traditional relational DBMS
• “No SQL” ≠ “Don’t use the SQL language”
• “NoSQL” = “Not only SQL”

RDBMS
A database management system (DBMS) provides data storage and access that is:
• Convenient
• Multi-user
• Safe
• Persistent
• Reliable
• Massive
• Efficient

Web apps have different needs (than the apps that RDBMSs were designed for):
-- Low and predictable response time (latency)
-- Scalability & elasticity (at low cost)
-- High availability
-- Flexible schemas / semi-structured data
-- Geographic distribution (multiple datacenters)
Web apps can (usually) do without:
-- Transactions / strong consistency / integrity
-- Complex queries

NoSQL System
• No declarative query language: more programming
• Relaxed consistency: fewer guarantees
• The idea behind NoSQL: by giving up ACID constraints, one can achieve much higher performance and scalability.
• ACID = Atomicity, Consistency, Isolation, Durability
• BASE = Basically Available, Soft state, Eventually consistent

CAP Theorem
• A system can have only two of the following three properties: consistency, availability, and partition tolerance.

New relational DBMS
• These SQL systems provide horizontal scalability without abandoning SQL and ACID transactions.
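
Example: BASE in miniature
The BASE properties listed under "NoSQL System" can be pictured with a toy model. This is only a sketch under stated assumptions, not code from any system named in these slides: the `Replica` class, its `write`, `read`, and `sync_from` methods, the caller-supplied timestamps, and the last-write-wins merge rule are all hypothetical choices made for illustration. Each replica accepts writes locally (basically available), the two replicas may disagree for a while (soft state), and they agree again once they have exchanged updates (eventually consistent).

```python
class Replica:
    """Toy replica (hypothetical): accepts writes locally, reconciles later."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        # Accept the write without contacting any other replica.
        self._merge(key, (timestamp, value))

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def _merge(self, key, entry):
        # Assumed conflict rule: keep the entry with the larger timestamp.
        current = self.data.get(key)
        if current is None or entry[0] > current[0]:
            self.data[key] = entry

    def sync_from(self, other):
        # Reconciliation step: pull every entry held by another replica.
        for key, entry in other.data.items():
            self._merge(key, entry)


a, b = Replica("a"), Replica("b")
a.write("cart", ["book"], timestamp=1)
b.write("cart", ["book", "pen"], timestamp=2)

stale = a.read("cart")  # replica a has not seen b's newer write yet
a.sync_from(b)
b.sync_from(a)
converged = a.read("cart") == b.read("cart")
print(stale, converged)
```

Before the two `sync_from` calls, a read from replica `a` returns the older value; afterwards both replicas return the same one. An ACID system would instead block or reject one of the writes rather than let the replicas diverge.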

Types of NoSQL Databases
Objective:
• Understand/compare each type of NoSQL database
• Discuss 1-2 NoSQL databases in each family

Systems Beyond our Scope
Some authors have used a broad definition of NoSQL, including any DB system that is not relational:
• Graph database systems
• Object-oriented database systems
• Distributed object-oriented stores
• Data-warehousing database systems
-- complex queries
-- read-only or read-mostly
-- ACID

Types of NoSQL Databases
• Key-value stores
• Document stores
• Extensible record stores

NoSQL systems generally have six key features:
1. the ability to horizontally scale "simple operation" throughput over many servers
2. the ability to replicate and distribute (partition) data over many servers
3. a simple call-level interface or protocol (in contrast to a SQL binding)
4. a weaker concurrency model (BASE) than the ACID transactions of most relational (SQL) database systems
5. efficient use of distributed indexes and RAM for data storage
6. the ability to dynamically add new attributes to data records

• NoSQL systems differ mainly in their data model
• Specific implementations differ in the persistence mechanism and in additional functionality:
-- Replication
-- Versioning
-- Locking
-- Transactions
-- etc.

Key-Value Stores
• Global collection of key/value pairs
• Inspired by Amazon’s Dynamo and distributed hash tables
• Operations:
-- void Put(string key, byte[] data);
-- byte[] Get(string key);
-- void Remove(string key);

Key-Value Stores: Examples

Project Voldemort
• Advanced key-value store
• Created by LinkedIn, now open source
• Written in Java
• Provides MVCC
• Asynchronous replication
• Sharding + consistent hashing
• Automatic failure detection and recovery
• Operations:
-- value = store.get(key)
-- store.put(key, value)
-- store.delete(key)
• Pros? & Cons?

Document Stores: Document?
• What is a document?
-- Semi-structured data
-- Encapsulates and encodes data (or information) in some standard format or encoding
-- Encodings: XML, YAML, JSON, BSON, and binary forms (PDF, Microsoft Office documents, etc.)
• Documents are like rows or records in relational databases, BUT:
-- Row: schema
-- Document: no schema
• Example documents:
-- FirstName:"Bob", Address:"5 Oak St.", Hobby:"sailing"
-- FirstName:"Jonathan", Address:"15 Wanamassa Point Road", Children:[{Name:"Michael",Age:10}, {Name:"Jennifer", Age:8}, {Name:"Samantha", Age:5}, {Name:"Elena", Age:2}]

Document Stores
• Similar to key-value stores, with one major difference: the value is a document
-- Generally support secondary indexes
• Flexible schema
-- Any number of fields can be added
-- Multiple types of documents (objects), and nested documents or lists
• Documents stored in JSON or Binary JSON (BSON)
• No ACID properties

Document Stores: Examples
• Terrastore
• CouchDB

CouchDB
• Apache project since 2008
• Schema-free, document-oriented database
-- Documents are stored in JSON format
-- Supports secondary indexes
-- B-tree storage engine
-- MVCC model, no locking
-- No joins, no PK/FK
• Incremental replication
• REST API

CRUD      HTTP      Params
Create    PUT       /db/docid
Read      GET       /db/docid
Update    POST      /db/docid
Delete    DELETE    /db/docid

• Libraries for various languages (Java, C, PHP, etc.) convert native API calls into the RESTful calls

CouchDB: Views
• Views
-- Filter, sort, “join”, aggregate, report
-- Map/Reduce based
-- K/V pairs from Map/Reduce are also stored in the B-tree engine
-- Built on demand
-- Can be materialized & incrementally updated

CouchDB: Local Consistency
• CouchDB uses Multi-Version Concurrency Control (MVCC)

CouchDB: “Global” Consistency
• Incremental replication

Extensible record stores
• Extensible record stores are also called column stores.
• Each key is associated with multiple attributes (i.e.
columns)
• Hybrid row/column stores
• Inspired by Google's BigTable
• Examples: HBase, Cassandra

Column: HBase
• Based on Google’s BigTable
• Apache Project TLP
• Cloudera (certification, EC2 AMIs, etc.)
• Layered over HDFS (Hadoop Distributed File System)
• Input/output for MapReduce jobs
• APIs
-- Thrift, REST
• Automatic partitioning
• Automatic re-balancing/re-partitioning
• Fault tolerant
-- HDFS
-- Multiple replicas
• Highly distributed

Column: Cassandra
• Created at Facebook for Inbox search
• Facebook -> Google Code -> ASF
• Commercial support available from Riptano
• Features taken from both Dynamo and BigTable
-- Dynamo: consistent hashing, partitioning, replication
-- BigTable: column families, MemTables, SSTables
• Symmetric nodes
-- No single point of failure
-- Linearly scalable
-- Ease of administration
• Flexible/automated provisioning
• Flexible replica placement
• High availability
-- Eventual consistency
-- However, consistency is tunable
• Partitioning
-- Random
---- Good distribution of data between nodes
---- Range scans not possible
-- Order preserving
---- Can lead to unbalanced nodes
---- Range scans, natural order
• Extremely fast reads/writes (low latency)
• Thrift API

Column: Cassandra (data model)
• Column
-- Basic unit of storage
• Column Family
-- Collection of like records
-- Record-level atomicity
-- Indexed
• Keyspace
-- Top-level namespace
-- Usually one per application
• Column details
-- Name
---- byte[]
---- Queried against
---- Determines sort order
-- Value
---- byte[]
---- Opaque to Cassandra
-- Timestamp
---- long
---- Conflict resolution (last write wins)

Column-oriented NoSQL
Name | Producer | Data Model | Querying
BigTable | Google | Set of couples (key, values) | Selection (by combination of row, column, and timestamp ranges)
HBase | Apache | Groups of columns (a BigTable clone) | JRuby IRB-based shell (similar to SQL)
Hypertable | Hypertable | Like BigTable | HQL (Hypertable Query Language)
Cassandra | Apache | Columns, groups of columns corresponding to a key (supercolumns) | Simple selection on key, range queries, column or column ranges
PNUTS | Yahoo | (Hashed or ordered) tables, typed arrays, flexible schema | Selection and projection from a single table (retrieve an arbitrary single record by primary key, range queries, complex predicates, ordering, top-k)

Scalable Relational Systems
• Also called NewSQL
• SQL
• ACID
• Performance and scalability through modern, innovative software architecture
• An RDBMS can provide scalability when applications:
-- Use small-scope operations
-- Use small-scope transactions

MySQL Cluster
• Shared-nothing cluster
• NDB storage engine (replaces InnoDB)
• Replication (2PC)
• Horizontal data partitioning

VoltDB

CONCLUSION: NoSQL pros/cons
• Advantages
-- Massive scalability
-- High availability
-- Lower cost (than competitive solutions at that scale)
-- (Usually) predictable elasticity
-- Schema flexibility, sparse & semi-structured data
• Disadvantages
-- Limited query capabilities (so far)
-- Eventual consistency is not intuitive to program for
---- Makes client applications more complicated
-- No standardization
---- Portability might be an issue

CONCLUSION
• For now, NoSQL databases are still far from advanced database technologies
• NoSQL will not replace traditional relational DBMSs
• NoSQL databases are good for specialized applications involving large, unstructured, distributed data with high requirements on scaling

References
• Cattell, R.: Scalable SQL and NoSQL data stores. http://dl.acm.org/citation.cfm?id=1978919
• Pokorný, J.: NoSQL Databases: a step to database scalability in Web environment. http://dl.acm.org/citation.cfm?id=2095583&dl=ACM&coll=DL&CFID=90098443&CFTOKEN=64346810
• http://couchdb.apache.org/
• http://project-voldemort.com/
• http://cassandra.apache.org/
• http://hbase.apache.org/