William Horner, Chris Little, Brandon Bowen

● What is NoSQL
● History of NoSQL
● Differences from SQL
● Handling the Lack of Joins
● ACID
● CAP Theorem/Brewer’s Theorem
● Types of NoSQL
● Benchmarks
● Why NoSQL
● Sharding
● When to Use
● Application Uses
● Amazon Company Example

What is NoSQL
● Not only SQL
● Does not use a formal structure
● Is joinless
● A database without the limitations of SQL rules
● Uses the CAP Theorem/Brewer’s Theorem to reason about consistency trade-offs

History of NoSQL
● Started with relational databases
● Growing need to handle larger amounts of data with real-time performance
● The name "NoSQL" was first used by Carlo Strozzi in 1998 for a relational database without an SQL interface; he later suggested "NoREL" ("No Relational") as the more accurate name for non-relational systems
● Reintroduced in 2009 as NoSQL, "Not Only SQL", by Eric Evans to discuss open-source non-relational database systems

Companies that use NoSQL
• Cisco
• Dell
• Best Buy
• Call of Duty
• FedEx
• NASA
• Netflix

● Scalability and high availability without compromising performance
● Uses column indexes
● Denormalization
● Materialized views
● Built-in caching

● Used in over 1500 companies with large, active data sets
● Largest cluster has 300 TB of data on over 400 machines
● Replication across multiple data centers allows failed nodes to be replaced with no downtime
● Every node is identical, so there is no single point of failure
● Users can choose between synchronous and asynchronous replication

Benefits
• Schemas are dynamic
• Scaling is easier and more cost-efficient
• Data manipulation conforms to the language a program uses

Handling the Lack of Joins
● Multiple queries
● Caching
● Nesting data

ACID
• Atomicity - transactions must be "all or nothing": if a failure occurs, then nothing happens
• Consistency - data must follow all rules, including constraints, triggers, and cascades
• Isolation - determines how and when changes made by one transaction become visible to other users and systems (how concurrent access to the same data is controlled)
• Durability - a transaction that has been committed will remain so

CAP Theorem/Brewer’s Theorem
● Consistency
● Availability
● Partition Tolerance

● Maturity - In comparison, RDBMS systems have been around for a long time.
  Most NoSQL alternatives are in pre-production versions with many key features yet to be implemented.
● Support - Most NoSQL systems are open-source projects, and the companies that offer support are small start-ups without the global reach, support services, or credibility of Oracle, Microsoft, or IBM.
● Analytics and Business Intelligence - NoSQL databases have evolved to meet the scaling demands of Web 2.0 applications, so most offer few built-in facilities for ad hoc querying and analysis.
● Administration - The design goal for NoSQL is to provide a zero-admin solution, but as of today it requires a lot of skill to install and a lot of effort to maintain.
● Expertise - Almost all NoSQL developers are still learning how to use and develop for NoSQL.

Types of NoSQL
● Document databases: MongoDB, CouchDB, RethinkDB, SequoiaDB, RavenDB, NeDB, AmisaDB, JasDB, RaptorDB, djonDB
● Graph stores: Neo4j, InfiniteGraph, Sparksee, Titan, InfoGrid, HyperGraphDB, GraphBase, Trinity, AllegroGraph
● Key-value stores: DynamoDB, Azure Table Storage, Riak, Redis, Aerospike, LevelDB, BerkeleyDB, Oracle NoSQL Database, GenieDB
● Wide-column stores: HBase, MapR/Hortonworks/Cloudera, Cassandra, Hypertable, Accumulo, Amazon SimpleDB, Cloudata, MonetDB, HPCC, Apache Flink

● Performance
● Scalability
● High availability
● Auto-scaled

Data Model               Performance  Scalability      Flexibility  Complexity  Functionality
Key–Value Store          high         high             high         none        variable (none)
Column-Oriented Store    high         high             moderate     low         minimal
Document-Oriented Store  high         variable (high)  high         low         variable (low)
Graph Database           variable     variable         high         high        graph theory
Relational Database      variable     variable         low          moderate    relational algebra

● Denormalization - optimizing read performance by adding redundant data or grouping data in order to improve scalability and performance
  ● does NOT mean that the data has not been normalized
  ● denormalization should ideally take place after 3NF has been achieved
  ● constraints are used to ensure that redundant copies of data stay synchronized
● Materialized View - a database object that contains the results of a query.
  ● the query result is cached but can be refreshed from the original query as necessary
● Keyspace - object that holds together all column families of a design
  ● outermost grouping of data in the data store
  ● resembles a schema in an RDBMS
● Column Family - tuple (pair) consisting of a key-value pair, where the key is mapped to a value that is a set of columns
  ● object that contains columns of related data
  ● resembles a table in an RDBMS
● Super Column Family - tuple (pair) consisting of a key-value pair, where the key is mapped to a value that is a set of column families
  ● similar to a view in an RDBMS
● Column (data store) - tuple (triplet) consisting of a unique name, a value, and a timestamp
  ● the timestamp distinguishes old data from new data
  ● not to be confused with a standard relational database column
  ● lowest-level object in a keyspace

Sharding
● Database Shard - a horizontal partition in a database or a search engine. Each partition is a separate shard.
  ● shards can be distributed to separate hardware, reducing the number of rows in each table
  ● differs from plain horizontal partitioning, which splits one or more tables by rows within a single schema or database server
● Sharding - the process of forming shards within the distributed database system.
  ● traditionally done by hand coding
  ● auto-sharding code is highly sought after

Application Uses
● Session Store
● User Profile Store
● Content and Metadata Store
● Mobile Applications
● Third-Party Data Aggregation
● High Availability Cache
● Globally Distributed Data Repository
● E-Commerce
● Social Gaming
● Ad Targeting

Amazon Company Example
● Cloud-based NoSQL service
● Supports document-based and key-value data models
● Stores 3 geographically distributed replicas of each table to enable high availability and data durability
● May not be fully real time: supports Eventually Consistent Reads by default, but can also support Strongly Consistent Reads

● Tables: differ from relational tables by storing data objects
● Item: has a main key value and can also have many attributes
● Attributes: have a name and one or more values
● Data: objects have a size limit of 400 KB

● Allows a secondary index that can be searched, for instance a zip code
  ● does not have to be unique
  ● allows for rapid searches for groups based on the index

Due to the growing need to handle large amounts of data with real-time performance, we see an increasing need for NoSQL. With its many variants, each style of NoSQL has its own costs and benefits. We also see how NoSQL complements cloud computing.

Questions? Tough!
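
Supplement to the sharding slide: a minimal sketch of what "hand-coded" sharding can look like, assuming a fixed number of shards and a stable hash of the key to pick one. The class name `ShardedStore`, its methods, and the sample keys are hypothetical and only for illustration; each in-memory dict stands in for a separate database server.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that spreads keys across a fixed number of shards."""

    def __init__(self, num_shards):
        # Each dict stands in for one shard living on its own hardware.
        self.shards = [{} for _ in range(num_shards)]

    def shard_for(self, key):
        # A stable hash (unlike Python's per-process hash()) keeps routing
        # repeatable, so the same key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key):
        # Only the owning shard is consulted; the others are never touched.
        return self.shards[self.shard_for(key)].get(key)


store = ShardedStore(num_shards=4)
for user_id in ["alice", "bob", "carol", "dave"]:
    store.put(user_id, {"name": user_id})

print(store.get("carol"))
print([len(shard) for shard in store.shards])  # rows held by each shard
```

With modulo routing like this, changing the number of shards remaps most keys, which is one reason auto-sharding support is attractive.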