NoSQL
CA485 Ray Walshe 2015
BASE vs ACID
Summary
• Traditional relational database management
systems (RDBMS) do not scale because they
adhere to ACID. A strong movement within
cloud computing is to utilize non-traditional
data stores (sometimes poorly dubbed
NoSQL or NewSQL) for managing large
amounts of data. This article contrasts the
traditional ACID with the new-style BASE
approach.
Scaling
"If your application relies upon persistence, then data storage will probably
become your bottleneck."
• Many websites are I/O-bound. That is, they are limited by how
quickly they can access data from their data storage system (normally
a SQL database).
• To scale or improve performance, you have two options:
– Vertical Scaling: Get a stronger, faster, better machine.
  • Easiest, but also expensive
  • Limited to the largest single system available
– Horizontal Scaling: Spread data across multiple machines.
  • More flexible, but also more complex
  • Functional Scaling: grouping data by function and spreading functional
  groups across databases.
  • Sharding: splitting data within functional areas across multiple databases.
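As a rough illustration of the two horizontal techniques, the sketch below first picks a functional group and then hashes the key to choose a shard within that group. The group names, database names, and the `shard_for` helper are all hypothetical, and hashing is only one of several possible ways to assign keys to shards.

```python
import hashlib

# Hypothetical layout: each functional group owns its own list of databases.
SHARDS = {
    "users": ["users_db_0", "users_db_1", "users_db_2"],
    "orders": ["orders_db_0", "orders_db_1"],
}


def shard_for(group: str, key: str) -> str:
    """Pick a database for `key` within the functional group `group`."""
    shards = SHARDS[group]
    # Use a stable hash (not Python's per-process randomized hash()) so the
    # same key always maps to the same shard.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]
```

Calling `shard_for("users", "ray")` always returns the same entry of `SHARDS["users"]`, so reads and writes for one key meet at one database.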
CAP
A theorem (originally a conjecture) stating that web
services cannot ensure all three of the following
properties at once:
• Consistency: All operations appear to occur
at once.
• Availability: Every operation must terminate
in an intended response.
• Partition Tolerance: Operations will
complete, even if individual components are
unavailable.
CA485 Ray Walshe 2015
ACID
Traditional databases utilize transactions that adhere to the
following guarantees:
• Atomicity: All operations in the transaction will
complete or none will.
• Consistency: Database will be in a consistent state
before and after a transaction.
• Isolation: Transaction will behave as if it is the only
operation being performed.
• Durability: Upon completion of the transaction, the
operation will not be reversed.
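Atomicity can be seen in a small sketch using Python's built-in sqlite3 module. The accounts table, the names, and the `transfer` helper are made up for illustration; the point is only that a failure in the second statement also undoes the first.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This statement violates the CHECK constraint on overdraft.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

ok = transfer(conn, "alice", "bob", 150)  # alice cannot cover 150
print(ok, balances(conn))
```

The credit to bob succeeds first, but the debit from alice fails, so the whole transaction is rolled back and both balances stay as they were.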
To ensure these properties when using partitioned
databases, traditional RDBMSs use a two-phase
commit:
1. First, the transaction coordinator asks each database
involved to precommit and indicate whether the commit is
possible.
2. If all agree, the coordinator instructs each database to
commit.
This method favors consistency over availability (if any
database is down, we cannot commit). Likewise,
the locking and coordination become a bottleneck
that prevents scaling to large numbers of nodes.
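The two phases can be sketched in a few lines. This is a toy, in-memory model: the `Participant` class and `two_phase_commit` function are hypothetical names, and real implementations also have to deal with timeouts, logging, and coordinator failure, which are omitted here.

```python
class Participant:
    """Hypothetical stand-in for one database taking part in a transaction."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.committed = []  # values that have been committed
        self.pending = None  # value staged during phase 1

    def prepare(self, value):
        # Phase 1: stage the change and vote. An unavailable node votes "no".
        if not self.available:
            return False
        self.pending = value
        return True

    def commit(self):
        # Phase 2 (success): make the staged value permanent.
        self.committed.append(self.pending)
        self.pending = None

    def abort(self):
        # Phase 2 (failure): discard whatever was staged.
        self.pending = None


def two_phase_commit(participants, value):
    """Commit `value` everywhere, or nowhere if any participant votes no."""
    votes = [p.prepare(value) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```

With one participant marked unavailable, `two_phase_commit` returns False and nothing is committed anywhere, which is the consistency-over-availability trade-off described above.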
BASE
The current trend in cloud computing data storage is to
loosen or relax the requirements of consistency in favor
of more availability. This is embodied in the BASE
approach:
• Basically Available: the system guarantees the availability
of your data, but the response can be "failure" if the
data is in the middle of changing.
• Soft State: the state of the system is constantly
changing.
• Eventually Consistent: the system will eventually
become consistent once it stops receiving input.
BASE is optimistic and accepts that the
database consistency will be in a state of flux.
It achieves availability by supporting partial
failures without total system failure (i.e.
partition tolerance).
To implement BASE, many systems rely on
some sort of message queue to persistently
store and route data to the various storage
services that perform the actual database
operations.
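A minimal sketch of the queue-based idea follows. The `Replica` and `QueuedStore` classes are hypothetical, and the queue here is an in-memory deque, whereas a real system would use a durable message queue. Writes are acknowledged as soon as they are queued, so replicas disagree for a while and converge once the queue drains.

```python
from collections import deque


class Replica:
    """Hypothetical storage service holding one copy of the data."""

    def __init__(self):
        self.data = {}


class QueuedStore:
    """Writes are acknowledged once queued and applied to replicas later."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.queue = deque()  # assumption: stands in for a durable queue

    def write(self, key, value):
        # Queue one message per replica and return immediately.
        for replica in self.replicas:
            self.queue.append((replica, key, value))

    def deliver(self, max_messages=None):
        """Apply up to max_messages queued writes (all of them if None)."""
        pending = len(self.queue)
        count = pending if max_messages is None else min(max_messages, pending)
        for _ in range(count):
            replica, key, value = self.queue.popleft()
            replica.data[key] = value
        return count
```

After a `write`, delivering only one message leaves the replicas in different states (soft state); delivering the rest makes them agree (eventual consistency).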
BigTable
• BigTable is a distributed storage system
created by Google for managing structured
data. It is structured as a large table that may
be petabytes in size and distributed across
tens of thousands of machines.
• HBase is an open source version of BigTable that
works on top of Hadoop.
BigTable is a large, persistent, distributed,
sparse, sorted, and multidimensional
map.
Map
• A map is an associative array or data structure that
allows one to quickly look up the value corresponding to a key
(e.g. hash table, binary search tree, etc.); in other
words, it is a collection of (key, value) pairs.
In BigTable, the key consists of the following:
row key: string, column key: string, timestamp: int64
while the value is simply an array of bytes that is
interpreted by the application (up to 64KB).
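The key structure above can be modeled with an ordinary Python dictionary. This is only a sketch of the data model, not of BigTable's storage; the `CellKey` name and the sample rows are illustrative.

```python
from typing import Dict, NamedTuple


class CellKey(NamedTuple):
    """The three-part key: row key, column key, and timestamp."""

    row: str
    column: str
    timestamp: int  # int64 in BigTable; a plain int here


# The value is an uninterpreted byte string; the application decodes it.
table: Dict[CellKey, bytes] = {}
table[CellKey("ie.dcu.computing", "users:ray", 1)] = b"Ray Walshe"
table[CellKey("ie.dcu.computing", "system:", 1)] = b"Linux 3.2"

value = table[CellKey("ie.dcu.computing", "users:ray", 1)]
print(value.decode("utf-8"))
```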
Sorted
Normally, associative arrays are not sorted (keys are hashed to
a position in the map). In BigTable, however, data is sorted
by row to keep related data close together. This means that
we must be careful in choosing row names such that related
data is sorted near each other.
For example, to store data about websites, Google's WebTable
reverses the domain names of web pages:
ie.dcu.computing
ie.dcu.eeng
ie.dcu.meng
This keeps DCU website rows close together.
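The effect of reversing domain names is easy to check. The `row_key` helper and the example.com / example.org hosts are assumptions added for the illustration.

```python
def row_key(hostname: str) -> str:
    """Reverse the dot-separated labels: computing.dcu.ie -> ie.dcu.computing."""
    return ".".join(reversed(hostname.split(".")))


hosts = [
    "computing.dcu.ie",
    "example.com",
    "eeng.dcu.ie",
    "example.org",
    "meng.dcu.ie",
]
sorted_keys = sorted(row_key(h) for h in hosts)
for key in sorted_keys:
    print(key)
```

After sorting, the three ie.dcu.* keys sit next to each other, whereas sorting the original hostnames would interleave them with the other sites.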
• Data Locality
Sorting the rows is a mechanism for improving
data locality. With pure hashing it is possible
for related data to be spread across multiple
machines. Sorting and then partitioning the
data allows all the data for one key subset to
reside on one machine. A similar technique is
used to shuffle data to reducers in
MapReduce.
Multidimensional
• Each table is indexed by rows. Each row contains one or more named
column families which are defined when the table is first created.
Within a column family, there can be one or more named columns
which can be created on the fly. With rows, column families, and
columns, we have a three-level naming hierarchy to identify data.
For example:
ie.dcu.computing:
- users:
- ray: Ray Walshe
- cdaly: Charlie
- system:
- : Linux 3.2
In the example above, ie.dcu.computing is the row; users and system are
column families; ray and cdaly are columns of users; and system holds a
single column with a null (empty) name.
• To get data, we first access the row via the row name and then specify
the column key, which is in the form column-family:column. In the
example above, we first get the row ie.dcu.computing and then get
a particular user with users:ray. To get multiple users, we can use a
regular expression (or glob) to fetch multiple values: users:*.
• In addition to row and column, the data is also versioned by
timestamps (either real time or application defined time) and sorted
such that the most recent cell is first. To help manage these multiple
versions, BigTable provides a mechanism to remove entries either by
date (keep versions since some time t) or by amount (keep only the
latest n versions). These garbage collection settings can be specified
per column-family.
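The lookup and garbage collection rules described above can be sketched with nested dictionaries. The function names (`read`, `gc_keep_latest`, `gc_keep_since`) and the sample versions are hypothetical; this models the behavior only, not BigTable's client API.

```python
from fnmatch import fnmatchcase

# row -> column key ("family:column") -> list of (timestamp, value), newest first
table = {
    "ie.dcu.computing": {
        "users:ray": [(3, b"Ray Walshe"), (2, b"R. Walshe"), (1, b"Ray")],
        "users:cdaly": [(1, b"Charlie")],
        "system:": [(1, b"Linux 3.2")],
    }
}


def read(table, row, pattern):
    """Return the newest value of every column in `row` matching the glob."""
    return {
        col: versions[0][1]
        for col, versions in table[row].items()
        if versions and fnmatchcase(col, pattern)
    }


def gc_keep_latest(table, family, n):
    """Keep only the latest n versions of each column in `family`."""
    for columns in table.values():
        for col in columns:
            if col.split(":", 1)[0] == family:
                columns[col] = columns[col][:n]


def gc_keep_since(table, family, t):
    """Keep only versions of `family` columns with timestamp >= t."""
    for columns in table.values():
        for col in columns:
            if col.split(":", 1)[0] == family:
                columns[col] = [(ts, v) for ts, v in columns[col] if ts >= t]
```

Here `read(table, "ie.dcu.computing", "users:*")` returns the most recent cell for both users, and each garbage collection function is applied to one column family at a time, mirroring the per-column-family settings.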
Sparse
While the number of column-families is fixed at creation, the number of
columns can grow arbitrarily. This means that within a particular row, it is
possible for many columns to be empty.
ie.dcu.computing:
- language:
- : EN
- contents:
- : <html>...
- anchor:
- dcu.ie: Dublin City University
- microsoft.com: Microsoft
ie.dcu.computing.ftp:
- language:
- : EN
- contents:
- : <html>...
- anchor:
- dcu.ie: Dublin City University
- kernel.org: Linux
- computing.dcu.ie: Vinson
- reddit.com: Reddit
- freenode.net: Freenode
• Distributed
• BigTable's data is spread across many independent machines. Tables
are broken up into collections of rows called tablets such that each
tablet holds a set of consecutive rows. This allows a table to be
distributed onto multiple machines and load to be balanced (by splitting
large tablets into smaller ones).
• Persistent
• BigTable uses GFS to store data and log files persistently.
• Large
• Can handle upwards of a petabyte of data. Hooks into MapReduce
(can be used as either input or output) and is utilized by a variety of
applications.
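Splitting a sorted table into tablets of consecutive rows, and finding the tablet that covers a given row, might look roughly like this. The helper names and the fixed rows-per-tablet rule are assumptions made for the sketch; the slides do not say how BigTable chooses split points.

```python
from bisect import bisect_left


def split_into_tablets(sorted_rows, max_rows):
    """Cut a sorted list of row keys into tablets of consecutive rows."""
    return [
        sorted_rows[i:i + max_rows]
        for i in range(0, len(sorted_rows), max_rows)
    ]


def find_tablet(tablets, row_key):
    """Return the index of the tablet whose key range covers row_key."""
    end_keys = [tablet[-1] for tablet in tablets]
    # First tablet whose last key is >= row_key; clamp keys past the end.
    index = bisect_left(end_keys, row_key)
    return min(index, len(tablets) - 1)


rows = [
    "com.example",
    "ie.dcu.computing",
    "ie.dcu.eeng",
    "ie.dcu.meng",
    "org.example",
]
tablets = split_into_tablets(rows, 2)
```

Because each tablet holds a contiguous key range, locating a row only needs a binary search over the tablets' end keys.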
Implementation
Architecturally, BigTable resembles GFS: a master that coordinates
activity and a large number of tablet servers that store and manage the
data. These tablet servers can be added or removed dynamically.
Master
The master assigns tablets to tablet servers and balances tablet server load. It also
manages garbage collection of files in GFS and handles schema
changes.
Tablet Server
A tablet server manages a set of tablets (10-1,000 per server) and handles
read/write requests to the tablets. Internally, this data is stored in
Google's SSTable format, which is a persistent, ordered, immutable
key-value map file.
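The "ordered, immutable key-value map" idea can be illustrated with a toy in-memory class. This is not Google's file format: the `SSTable` class below is a hypothetical stand-in that keeps keys sorted, never changes after construction, and serves point lookups and range scans by binary search.

```python
from bisect import bisect_left


class SSTable:
    """Toy in-memory stand-in for an immutable, sorted key-value file."""

    def __init__(self, items):
        pairs = sorted(dict(items).items())
        # Tuples make the contents read-only after construction.
        self._keys = tuple(k for k, _ in pairs)
        self._values = tuple(v for _, v in pairs)

    def get(self, key, default=None):
        """Point lookup by binary search over the sorted keys."""
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return default

    def scan(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect_left(self._keys, start)
        hi = bisect_left(self._keys, end)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))
```

Since the structure never changes, an update has to be written to a new table rather than applied in place.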
Chubby
To coordinate the various servers, BigTable uses Chubby, a highly available and
persistent distributed lock service. Chubby manages leases for
resources and stores configuration by providing a namespace of files
and directories that clients can lock atomically. It is used to:
• Ensure there is only one active master.
• Discover tablet servers.
• Store BigTable schema information.
• Store access control lists.
Example of how it is used:
When a tablet server starts, it creates and acquires an exclusive lock on a
uniquely-named file in the servers directory. The master can monitor
this directory for new servers.
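As a loose local analogy for the example above (not Chubby's actual API, which is not shown in these slides), exclusive file creation in an ordinary directory gives a similar "one owner per name" guarantee. The function names and server names are hypothetical, and unlike Chubby's lease-based locks, these files do not expire when a server dies.

```python
import os
import tempfile


def register_server(servers_dir, name):
    """Create a uniquely named file exclusively; fail if it already exists."""
    path = os.path.join(servers_dir, name)
    try:
        # O_EXCL makes creation atomic: only one caller can succeed per name.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def discover_servers(servers_dir):
    """What a master might do: list the directory to find tablet servers."""
    return sorted(os.listdir(servers_dir))


with tempfile.TemporaryDirectory() as servers_dir:
    first = register_server(servers_dir, "tablet-server-1")
    duplicate = register_server(servers_dir, "tablet-server-1")
    register_server(servers_dir, "tablet-server-2")
    found = discover_servers(servers_dir)
```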
• Replication
• A BigTable can be configured for replication
to multiple BigTable clusters in different data
centers to ensure availability. Data is
propagated asynchronously, which results in
an eventually consistent model.
Applications
BigTable, like GFS and MapReduce, is utilized internally
by Google for many of their operations.
Google Analytics
This is a service that helps webmasters analyze traffic
patterns at their website. BigTable is used to maintain
raw click information (200 TB).
Google Earth
BigTable is used to store the raw image data.
Personalized Search
User data for personalized search is stored in BigTable.
How is it NoSQL?
• A BigTable cluster may contain several large tables, but it does not
support operations across multiple tables (non-relational, no joining).
• No SQL! Perform key lookups to access data.
• Columns have no type (just a bunch of bytes) and may be quite large.
• Columns can be added dynamically.
• Columns within a row may be quite sparse; that is we may have a
large number of columns, but each row may only have a tiny fraction
of them populated.
• Availability is increased by asynchronously propagating data to
multiple clusters in different data centers.