NoSQL

William Horner, Chris Little, Brandon Bowen

What is NoSQL
History of NoSQL
Differences from SQL
Handling the Lack of Joins
ACID
CAP Theorem/Brewer’s Theorem
Types of NoSQL
Benchmarks
Why NoSQL
Sharding
When to Use
Application Uses
Amazon Company Example
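
What is NoSQL
• Stands for "not only SQL"
• Does not require a formal, fixed schema
• Is joinless
• A database without the limitations imposed by SQL's rules
• Guided by the CAP theorem (Brewer's theorem), which describes the trade-off between consistency and availability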





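History of NoSQL
• Databases started out relational
• A growing need arose to handle larger amounts of data with real-time performance
• The name "NoSQL" was first used by Carlo Strozzi in 1998 for his lightweight relational database that did not use SQL; he later suggested "NoREL" ("no relational") as a better name for the non-relational movement
• Reintroduced in 2009 as NoSQL, now read as "not only SQL", when Eric Evans suggested it as the name of a meetup to discuss open-source non-relational database systems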

Companies that use NoSQL
• Cisco
• Dell
• Best Buy
• Call of Duty
• FedEx
• NASA
• Netflix

● Scalability and high availability without compromising performance
● Uses column indexes
● Denormalization
● Materialized views
● Built-in caching
● Used in over 1500 companies with large, active data sets
● Largest cluster has 300 TB of data on over 400 machines
● Replication across multiple data centers allows failed nodes to be replaced with no downtime
● Every node is identical, so there is no single point of failure
● Users can choose between synchronous and asynchronous replication

Benefits
• Schemas are dynamic
• Scaling is easier and more cost-efficient
• Data manipulation conforms to the language a program uses



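Handling the Lack of Joins
• Multiple queries: fetch related records with separate queries and combine them in the application
• Caching: keep frequently combined data in a cache so it is not fetched again
• Nesting data: store related data inside the same record so a single read returns it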
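
The three workarounds listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `users`, `orders`, and `nested_users` collections and all function names are hypothetical stand-ins for data held in a NoSQL store, not the API of any particular product.

```python
from functools import lru_cache

# Hypothetical in-memory "collections" standing in for a NoSQL store.
users = {"u1": {"name": "Ada"}, "u2": {"name": "Grace"}}
orders = [
    {"order_id": 1, "user_id": "u1", "total": 30.0},
    {"order_id": 2, "user_id": "u1", "total": 12.5},
    {"order_id": 3, "user_id": "u2", "total": 99.0},
]


def orders_with_names_multi_query(user_id):
    """Multiple queries: read the user, read the orders, combine in code."""
    user = users[user_id]                                          # query 1
    user_orders = [o for o in orders if o["user_id"] == user_id]   # query 2
    return [{"name": user["name"], **o} for o in user_orders]


@lru_cache(maxsize=128)
def cached_user_name(user_id):
    """Caching: repeated lookups for the same user skip the store."""
    return users[user_id]["name"]


# Nesting: each user's orders live inside the user record itself.
nested_users = {
    "u1": {"name": "Ada", "orders": [{"order_id": 1, "total": 30.0},
                                     {"order_id": 2, "total": 12.5}]},
    "u2": {"name": "Grace", "orders": [{"order_id": 3, "total": 99.0}]},
}


def order_totals_nested(user_id):
    """Nesting: one read returns the user together with the orders."""
    doc = nested_users[user_id]
    return [o["total"] for o in doc["orders"]]
```

Nesting trades storage and update cost for read speed: the data is duplicated or grouped up front so that no join is needed at read time.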

ACID
• Atomicity: transactions must be "all or nothing"; if any part fails, none of the transaction's changes take effect
• Consistency: data must follow all defined rules, including constraints, triggers, and cascades
• Isolation: determines how and when changes made by one transaction become visible to other users and systems, so that concurrent transactions do not interfere with each other
• Durability: a transaction that has been committed remains committed, even after a crash or power loss
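
Atomicity and consistency are easiest to see in a relational engine. The following is a minimal sketch using Python's built-in `sqlite3` module; the `accounts` table, the `transfer` helper, and the sample balances are assumptions made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is the "rule" that the consistency property protects.
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit broke the CHECK constraint, so the credit was undone too.
        return False


def balances(conn):
    """Return the current balances as a dict of name to balance."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A transfer of 500 from `alice` fails on the second update, and the first update is rolled back with it, so the balances are left as they were.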



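CAP Theorem/Brewer's Theorem
• Consistency: every read sees the most recent write
• Availability: every request receives a response
• Partition tolerance: the system keeps working when the network splits the nodes into groups that cannot communicate
A distributed system cannot guarantee all three at once; when a partition occurs, it must give up either consistency or availability.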
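
The trade-off among these three properties can be illustrated with a toy two-replica store. This is a simplified sketch, not how any real database is implemented: the `TwoNodeStore` class, its `mode` flag, and the `partitioned` switch are hypothetical names invented for the example.

```python
class TwoNodeStore:
    """Toy store with two replicas.

    mode="CP": refuse writes during a partition (consistent, not available).
    mode="AP": accept writes on the reachable replica only
               (available, but the replicas may disagree).
    """

    def __init__(self, mode):
        self.mode = mode
        self.replicas = [{}, {}]
        self.partitioned = False

    def write(self, node, key, value):
        if self.partitioned:
            if self.mode == "CP":
                return False                    # give up availability
            self.replicas[node][key] = value    # give up consistency
            return True
        for replica in self.replicas:           # healthy network: copy to all
            replica[key] = value
        return True

    def read(self, node, key):
        return self.replicas[node].get(key)
```

With `mode="AP"`, a write made during a partition succeeds but the two replicas return different values until they are reconciled; with `mode="CP"`, the write is rejected and both replicas keep agreeing.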

● Maturity - RDBMS systems have been around for a long time. In comparison, most NoSQL alternatives are in pre-production versions with many key features yet to be implemented.
● Support - Most NoSQL systems are open-source projects, and the companies that offer support are small start-ups without the global reach, support services, or credibility of Oracle, Microsoft, or IBM.
● Analytics and Business Intelligence - NoSQL databases have evolved to meet the scaling demands of Web 2.0 applications, so they offer few facilities for ad hoc querying and analysis.
● Administration - The design goal for NoSQL is to provide a zero-admin solution, but as of today it requires a lot of skill to install and a lot of effort to maintain.
● Expertise - Almost all NoSQL developers are still learning how to use and develop for NoSQL.

Types of NoSQL
• Document databases
• Graph stores
• Key-value stores
• Wide-column stores

Document databases
• MongoDB
• CouchDB
• RethinkDB
• SequoiaDB
• RavenDB
• NeDB
• AmisaDB
• JasDB
• RaptorDB
• djonDB

Graph stores
• Neo4j
• InfiniteGraph
• Sparksee
• Titan
• InfoGrid
• HyperGraphDB
• GraphBase
• Trinity
• AllegroGraph

Key-value stores
• DynamoDB
• Azure Table Storage
• Riak
• Redis
• Aerospike
• LevelDB
• BerkeleyDB
• Oracle NoSQL Database
• GenieDB

Wide-column stores
• HBase
• MapR/Hortonworks/Cloudera
• Cassandra
• Hypertable
• Accumulo
• Amazon SimpleDB
• Cloudata
• MonetDB
• HPCC
• Apache Flink




• Performance
• Scalability
• High availability
• Auto-scaled

Data Model               Performance  Scalability      Flexibility  Complexity  Functionality
Key–Value Store          high         high             high         none        variable (none)
Column-Oriented Store    high         high             moderate     low         minimal
Document-Oriented Store  high         variable (high)  high         low         variable (low)
Graph Database           variable     variable         high         high        graph theory
Relational Database      variable     variable         low          moderate    relational algebra

● Denormalization - optimizing read performance by adding redundant data or grouping data, in order to improve scalability and performance
  ● is not the same as never having normalized the data
  ● should ideally take place after 3NF has been achieved
  ● constraints are used to keep redundant copies of data synchronized
● Materialized View - a database object that contains the results of a query
  ● the query result is cached but can be refreshed from the original query as necessary
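
The idea of a cached query result that is refreshed on demand can be sketched with Python's built-in `sqlite3` module. SQLite has no native materialized views, so this example imitates one with an ordinary table that is rebuilt from its defining query; the `sales` and `sales_by_region` tables and both helper functions are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 5), ("west", 7)],
)


def refresh_sales_by_region(conn):
    """Re-run the defining query and store its result as a plain table."""
    conn.execute("DROP TABLE IF EXISTS sales_by_region")
    conn.execute(
        "CREATE TABLE sales_by_region AS "
        "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
    )
    conn.commit()


def read_view(conn):
    """Read the stored result without touching the base table."""
    return dict(conn.execute("SELECT region, total FROM sales_by_region"))


refresh_sales_by_region(conn)
```

Reads from `sales_by_region` are cheap because the aggregation is already done, but they stay stale after new rows are added to `sales` until the refresh is run again.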

● Keyspace - object that holds together all column families of a design
  ● outermost grouping of data in the data store
  ● resembles a schema in an RDBMS
● Column Family - a key-value pair in which the key maps to a value that is a set of columns
  ● object that contains columns of related data
  ● resembles a table in an RDBMS
● Super Column Family - a key-value pair in which the key maps to a value that is a set of column families
  ● similar to a view in an RDBMS
● Column (data store) - a tuple (triplet) consisting of a unique name, a value, and a timestamp
  ● the timestamp distinguishes old data from new data
  ● not to be confused with a standard relational database column
  ● lowest-level object in a keyspace
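
The nesting of keyspace, column family, row, and column described above can be modeled with plain Python dictionaries. This is a rough sketch of the data model only, not of any database's storage engine; the `keyspace` variable, the `users` family, and the `put_column` and `get_column` helpers are hypothetical.

```python
import time

# keyspace -> column family -> row key -> column name -> (value, timestamp)
keyspace = {"users": {}}


def put_column(family, row_key, name, value, timestamp=None):
    """Write a column unless the stored one carries a newer timestamp."""
    ts = time.time() if timestamp is None else timestamp
    row = keyspace.setdefault(family, {}).setdefault(row_key, {})
    current = row.get(name)
    if current is None or ts >= current[1]:
        row[name] = (value, ts)


def get_column(family, row_key, name):
    """Return only the value of a column, without its timestamp."""
    return keyspace[family][row_key][name][0]
```

Because each column carries its own timestamp, a late-arriving write with an older timestamp is ignored, which is how the timestamp tells old data from new data.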

● Database Shard - a horizontal partition of the data in a database or search engine. Each partition is a separate shard.
  ● shards can be distributed to separate hardware, reducing the number of rows in each table
  ● goes beyond ordinary horizontal partitioning, which refers to splitting one or more tables by rows within a single schema or database server
● Sharding - the process of forming shards within the distributed database system
  ● traditionally done by hand coding
  ● auto-sharding code is highly sought after
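
One simple way to decide which shard holds a record is to hash its key. The sketch below assumes hash-based routing, which is only one of several sharding strategies; the shard names and both functions are hypothetical.

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical server names


def shard_for(key, shards=SHARDS):
    """Pick a shard from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def distribute(keys, shards=SHARDS):
    """Group keys by the shard that would store them."""
    placement = {shard: [] for shard in shards}
    for key in keys:
        placement[shard_for(key, shards)].append(key)
    return placement
```

A cryptographic hash is used instead of Python's built-in `hash` so that the same key maps to the same shard across processes. With plain modulo routing, changing the number of shards moves most keys, which is one reason automatic sharding is hard to do well.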










Application Uses
• Session Store
• User Profile Store
• Content and Metadata Store
• Mobile Applications
• Third-Party Data Aggregation
• High Availability Cache
• Globally Distributed Data Repository
• E-Commerce
• Social Gaming
• Ad Targeting




Amazon Company Example
• Cloud-based NoSQL service
• Supports document-based and key-value data models
• Stores 3 geographically distributed replicas of each table to enable high availability and data durability
• May not be fully real-time: reads are eventually consistent by default, but strongly consistent reads are also supported
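
The difference between the two read modes can be shown with a toy table that has one leader copy and one lagging follower copy. This is a simplified model, not the service's actual API: the `ReplicatedTable` class and its `consistent` parameter are hypothetical names chosen for the sketch.

```python
class ReplicatedTable:
    """Toy table with a leader copy and a follower copy that lags behind."""

    def __init__(self):
        self.leader = {}
        self.follower = {}
        self.pending = []  # writes not yet copied to the follower

    def put(self, key, value):
        self.leader[key] = value
        self.pending.append((key, value))

    def replicate(self):
        """Copy all outstanding writes to the follower."""
        for key, value in self.pending:
            self.follower[key] = value
        self.pending.clear()

    def get(self, key, consistent=False):
        """Read from the leader if consistent=True, else from the follower."""
        source = self.leader if consistent else self.follower
        return source.get(key)
```

An eventually consistent read may return the previous value until replication catches up, while a strongly consistent read always returns the latest write at the cost of going to the leader copy.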




• Tables: differ from relational tables by holding data objects (items) instead of rows with fixed columns
• Item: has a primary key value and can also have many attributes
• Attributes: have a name and one or more values
• Data: each object (item) has a size limit of 400 KB



• Allows a secondary index on an attribute other than the main key, for instance a zip code, that can be searched
• Indexed values do not have to be unique
• Allows rapid searches for groups of items based on the index
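
A secondary index of this kind can be pictured as a second lookup table from the indexed attribute to the matching item keys. The sketch below uses plain Python dictionaries; the sample `items`, the `zip` attribute, and the helper names are made up for the illustration and do not reflect the service's API.

```python
from collections import defaultdict

# Items keyed by their main key; "zip" is the attribute to be indexed.
items = {
    "c1": {"name": "Ada", "zip": "38655"},
    "c2": {"name": "Grace", "zip": "38655"},
    "c3": {"name": "Alan", "zip": "10001"},
}


def build_index(items, attribute):
    """Map each attribute value to the keys of all items that carry it."""
    index = defaultdict(list)
    for key, item in items.items():
        if attribute in item:
            index[item[attribute]].append(key)
    return index


zip_index = build_index(items, "zip")


def find_by_zip(zip_code):
    """Return the names of all items with the given zip code."""
    return [items[key]["name"] for key in zip_index.get(zip_code, [])]
```

Several items can share one zip code, so each index entry holds a list of keys, and a search for a group of items is a single dictionary lookup instead of a scan of every item.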



Due to the growing need to handle large amounts of data with real-time performance, we see an increasing need for NoSQL.
NoSQL comes in many variants, and each style has its own costs and benefits.
NoSQL and cloud computing complement each other.
Questions?
Tough!