A Case Study of the Use of NoSQL Databases By Some Companies
April Song and Sarah Graupman
Apollo

Facebook is trying to address problems with latencies by switching to a NoSQL database
called Apollo.

Facebook created Apollo internally, and it is written in C++.

Raft is a consensus protocol that ensures all of the nodes agree on the same sequence of
state transitions.
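
As a rough illustration of the agreement step, the sketch below computes how far a Raft leader could advance its commit point: an entry counts as committed once a majority of servers store it. The function names and the list-of-indices input are assumptions made for illustration, Raft's extra rule that the entry must come from the leader's current term is left out, and nothing here reflects Apollo's actual code.

```python
def commit_index(match_index):
    """Return the highest log index stored on a majority of servers.

    match_index holds, for every server including the leader, the highest
    log index known to be replicated on that server (hypothetical input).
    """
    majority = len(match_index) // 2 + 1
    # Sort descending: the value at position majority - 1 is held by at
    # least `majority` servers.
    ranked = sorted(match_index, reverse=True)
    return ranked[majority - 1]


def tolerated_failures(cluster_size):
    """Number of servers that can fail while a majority remains."""
    return (cluster_size - 1) // 2
```

With five servers whose logs reach indices 5, 5, 3, 2, and 1, commit_index returns 3, because only entries up to index 3 are present on three servers.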

Facebook mostly uses RocksDB for Apollo's storage.

The read() and write() methods are atomic, which means that each operation either
completes entirely or, if any part of it fails, has no effect at all.
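
The all-or-nothing behaviour can be sketched with a toy in-memory store. The class AtomicStore and its string-key rule are hypothetical and exist only to force a failure partway through a write; this is not Apollo's API.

```python
class AtomicStore:
    """Toy in-memory store whose multi-key write is all-or-nothing."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        return self._data.get(key)

    def write(self, updates):
        # Stage the changes on a copy so a failure leaves _data untouched.
        staged = dict(self._data)
        for key, value in updates.items():
            if not isinstance(key, str):
                raise TypeError("keys must be strings")
            staged[key] = value
        # Swap in the new state only after every update succeeded.
        self._data = staged


store = AtomicStore()
store.write({"a": 1, "b": 2})
try:
    # The second key is invalid, so the whole write must be discarded.
    store.write({"a": 10, 42: "bad key"})
except TypeError:
    pass
```

After the failed call, store.read("a") still returns 1: the partial update to "a" never became visible.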

The fault-tolerant state machines ensure that the program keeps executing even if one of the
nodes dies.
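
A minimal sketch of the replicated state machine idea, under the assumption that every replica applies the same commands in the same order: the survivors end up in identical states, so the service can continue when one node is lost. The Replica class and the command format are hypothetical.

```python
class Replica:
    """A deterministic state machine driven by a command log."""

    def __init__(self):
        self.state = {}
        self.alive = True

    def apply(self, command):
        op, key, amount = command
        if op == "add":
            self.state[key] = self.state.get(key, 0) + amount
        elif op == "set":
            self.state[key] = amount


def replicate(replicas, log):
    """Apply every command, in log order, to each replica that is alive."""
    for command in log:
        for replica in replicas:
            if replica.alive:
                replica.apply(command)


replicas = [Replica(), Replica(), Replica()]
replicate(replicas, [("set", "x", 1), ("add", "x", 4)])
replicas[0].alive = False  # one node dies
replicate(replicas, [("add", "x", 5)])
survivors = [r.state for r in replicas if r.alive]
```

Both surviving replicas hold the same value for "x", while the dead replica simply stops at the state it had when it failed.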
Apache Cassandra

Apache Cassandra is a NoSQL database created by Facebook
for searching in inboxes.

Their goals when designing Cassandra were to give it high
availability, eventual consistency, and incremental scalability.

A write can be sent to any node in the cluster; that node coordinates
the request and forwards it to the replicas that own the key.
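
Cassandra partitions data with consistent hashing, so whichever node receives a write can work out where the data belongs. The sketch below shows one way a coordinator might locate replicas on a hash ring. The Ring class, the MD5-based token function, and the node names are illustrative assumptions; a real cluster uses its own partitioner, virtual nodes, and configurable replication strategies.

```python
import bisect
import hashlib


def token(value):
    """Map a string to a position on the ring (MD5 chosen for illustration)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        # Each node owns the ring position given by the hash of its name.
        self.tokens = sorted((token(n), n) for n in nodes)

    def replicas(self, key):
        """Walk clockwise from the key's token and pick the next rf nodes."""
        positions = [t for t, _ in self.tokens]
        start = bisect.bisect_right(positions, token(key)) % len(self.tokens)
        count = min(self.rf, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1]
                for i in range(count)]


nodes = ["node-a", "node-b", "node-c", "node-d"]
ring = Ring(nodes)
owners = ring.replicas("user:42")
```

Because the placement depends only on the key and the ring, every coordinator computes the same owners for "user:42".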

It is currently used by companies including but not limited to:
Comcast, eBay, GitHub, Hulu, Instagram, Netflix, Reddit, The
Weather Channel, and Apple.
Cassandra (Continued)

Read and write throughput increases linearly as the
number of machines increases.

Based on experiments at the University of Toronto, Cassandra has the
best scalability of the NoSQL databases that were compared.

The read latency for Cassandra is about constant, regardless of
how many nodes there are.
Others

Facebook uses a distributed system called Scribe to transport all of its data.

It then uses processing systems called Puma, Swift, and Stylus which allow for
computation and analysis of the data in Java, Python, and C++, respectively.

Facebook also uses data stores such as Laser, Scuba, and Hive which work on top
of Facebook’s RocksDB database.

The many different tools that Facebook uses allow it to adapt to all of the
different needs of such a large company.

This strategy has a drawback, though: maintaining all of these systems and keeping
them compatible with each other carries significant overhead.
DynamoDB

Amazon focuses on the reliability of its data because even a brief
outage can have large financial and customer-relationship
consequences.

To do this, they manage their data through multiple instances of
Dynamo in multiple data centers around the world.

Dynamo is designed so that the data store is always writeable,
ensuring that customers can always add and
remove items from their shopping carts, even during network
and/or server failures.
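
The "always writeable" goal can be sketched as follows: every replica accepts writes on its own, and divergent versions are reconciled when the cart is read. The class CartReplica, the function read_cart, and the data-center names are hypothetical. The Dynamo paper tracks versions with vector clocks, which this simplified sketch omits; it also handles additions only, since a plain union merge can let removed items reappear.

```python
class CartReplica:
    """One replica of a shopping cart that never rejects a write."""

    def __init__(self):
        self.items = set()

    def add(self, item):
        self.items.add(item)


def read_cart(reachable_replicas):
    """Reconcile divergent replicas at read time by taking the union."""
    merged = set()
    for replica in reachable_replicas:
        merged |= replica.items
    return merged


us_east, eu_west = CartReplica(), CartReplica()
us_east.add("book")
# Network partition: each data center keeps accepting writes on its own.
us_east.add("laptop")
eu_west.add("headphones")
# The partition heals; a read merges both versions so no added item is lost.
cart = read_cart([us_east, eu_west])
```

The design choice is to push conflict resolution to reads so that writes never have to be refused.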
Dynamo (Continued)

Document and key-value models are supported by Amazon
DynamoDB.

It is a cloud database, which suits it to web, gaming, and IoT applications.

It reduces latency with Amazon DynamoDB Accelerator
(DAX), which is an in-memory cache.

Caches reduce the time it takes to retrieve data if the requested
data is in the cache.
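
The effect of a cache can be sketched with a small read-through wrapper. ReadThroughCache and the dictionary standing in for the database are assumptions for illustration; this is not the DAX client API.

```python
class ReadThroughCache:
    def __init__(self, backing_store):
        self._store = backing_store  # stands in for the slower database
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            # Fast path: the requested data is already in the cache.
            self.hits += 1
            return self._cache[key]
        # Slow path: fetch from the backing store and remember the result.
        self.misses += 1
        value = self._store[key]
        self._cache[key] = value
        return value


table = {"user:1": {"name": "Ada"}}
cache = ReadThroughCache(table)
first = cache.get("user:1")   # miss: goes to the backing store
second = cache.get("user:1")  # hit: served from the cache
```

Only the first lookup touches the backing store; the repeat is answered from memory.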
http://ieeexplore.ieee.org/document/6228206/?reload=true

LinkedIn uses a NoSQL data store: Voldemort

Developed in 2008

Key-Value Stores

Document store: Espresso

Developed in 2011

Document Stores
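
The difference between the two models can be sketched with plain Python dictionaries. A key-value store treats the value as an opaque blob fetched by key, while a document store can see the fields inside each record. The names below are hypothetical and do not reflect the Voldemort or Espresso APIs.

```python
import json

# Key-value store: the value is an opaque blob, looked up only by key.
kv_store = {}
kv_store["member:7"] = json.dumps({"name": "Ada", "city": "London"}).encode()

# Document store: the structure of each document is visible to the store,
# so records can also be selected by field.
documents = {
    "member:7": {"name": "Ada", "city": "London"},
    "member:8": {"name": "Linus", "city": "Helsinki"},
}


def find_by_field(docs, field, value):
    """Return the ids of all documents whose field equals value."""
    return sorted(doc_id for doc_id, doc in docs.items()
                  if doc.get(field) == value)


blob = kv_store["member:7"]
in_london = find_by_field(documents, "city", "London")
```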


Databus

Strong timeline consistency

User-space processing

Support for long look-back queries

Low latency
Components: Relay, Bootstrap Server, and Client Library
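
A heavily simplified sketch of how these three pieces might cooperate: the relay keeps only recent changes in memory for low latency, the bootstrap server keeps the full history for long look-back queries, and the client-side logic picks between them based on its last checkpoint. All class and function names are hypothetical, sequence numbers are assumed to be consecutive integers, and both stores are fed directly here for brevity.

```python
from collections import deque


class Relay:
    """Keeps only the most recent change events in memory."""

    def __init__(self, capacity):
        self._buffer = deque(maxlen=capacity)

    def publish(self, scn, change):
        self._buffer.append((scn, change))

    def events_after(self, scn):
        """Return events newer than scn, or None if some were already evicted."""
        if self._buffer and self._buffer[0][0] > scn + 1:
            return None
        return [event for event in self._buffer if event[0] > scn]


class BootstrapServer:
    """Keeps the full history so clients can look far back."""

    def __init__(self):
        self._history = []

    def publish(self, scn, change):
        self._history.append((scn, change))

    def events_after(self, scn):
        return [event for event in self._history if event[0] > scn]


def pull(relay, bootstrap, checkpoint):
    """Client-side logic: prefer the relay, fall back to the bootstrap server."""
    events = relay.events_after(checkpoint)
    if events is None:
        return "bootstrap", bootstrap.events_after(checkpoint)
    return "relay", events


relay, bootstrap = Relay(capacity=3), BootstrapServer()
for scn in range(1, 6):
    relay.publish(scn, f"update-{scn}")
    bootstrap.publish(scn, f"update-{scn}")

recent = pull(relay, bootstrap, checkpoint=3)
stale = pull(relay, bootstrap, checkpoint=0)
```

A client that is nearly caught up is served by the relay, while one that has fallen far behind is sent to the bootstrap server; in both cases events arrive in commit order.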
http://ieeexplore.ieee.org/document/6228206/?reload=true
https://blog.twitter.com/2017/the-infrastructure-behind-twitter-scale
OVERALL TRENDS

Focus on Low Latency

Maintain both MySQL and NoSQL Databases

Large enterprises are developing their own database systems
and releasing them to the public
REFERENCES

Auradkar, A., Botev, C., & Das, S. (2012). Data Infrastructure at LinkedIn. IEEE 28th International Conference on
Data Engineering (ICDE).

Hashemi, M. (n.d.). The Infrastructure Behind Twitter: Scale | Twitter Blogs. Retrieved May 02, 2017, from
https://blog.twitter.com/2017/the-infrastructure-behind-twitter-scale

Introducing FlockDB | Twitter Blogs. (2010, May 03). Retrieved May 02, 2017, from
https://blog.twitter.com/2010/introducing-flockdb

https://www.infoq.com/news/2014/06/facebook-apollo

Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex
Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels. "Dynamo: Amazon's Highly
Available Key-Value Store". ACM 2007: Print.

https://aws.amazon.com/dynamodb

http://perspectives.mvdirona.com/2008/07/facebook-releases-cassandra-as-open-source

http://cassandra.apache.org

Tilman Rabl, Mohammad Sadoghi, Hans-Arno Jacobsen, Sergio Gomez-Villamor, Victor Muntes-Mulero, and
Serge Mankovskii. "Solving Big Data Challenges For Enterprise Application Performance Management". VLDB
Endowment 2012: Print.