A Case Study of the Use of NoSQL Databases By Some Companies
April Song and Sarah Graupman

Apollo
Facebook is trying to address latency problems by switching to a NoSQL database called Apollo. Facebook created Apollo internally, and it is written in C++. It relies on Raft, a consensus protocol that ensures all of the nodes agree on each state transition. Facebook mostly uses RocksDB for storage. The read() and write() methods are atomic: each operation either completes in full or, if any part of it fails, has no effect. The fault-tolerant state machines ensure that the program keeps executing even if one of the nodes dies.

Apache Cassandra
Apache Cassandra is a NoSQL database created by Facebook for inbox search. The design goals for Cassandra were high availability, eventual consistency, and incremental scalability. A write can be sent to any node in the cluster. It is currently used by companies including, but not limited to, Comcast, eBay, GitHub, Hulu, Instagram, Netflix, Reddit, The Weather Channel, and Apple.

Cassandra (Continued)
Read and write throughput increases linearly with the number of machines. Based on experiments at the University of Toronto, Cassandra scales better than the other NoSQL databases tested. Cassandra's read latency stays roughly constant regardless of the number of nodes.

Others
Facebook uses a distributed system called Scribe to transport all of its data. It then uses the processing systems Puma, Swift, and Stylus, which allow computation and analysis of the data in Java, Python, and C++, respectively. Facebook also uses data stores such as Laser, Scuba, and Hive, which work on top of Facebook's RocksDB database. The many different tools that Facebook uses allow it to adapt to the varied needs of such a large company.
This strategy has a drawback, though: maintaining all of these systems and keeping them compatible with each other adds significant overhead.

DynamoDB
Amazon focuses on the reliability of its data because even a slight outage can have large financial and customer-relationship consequences. To achieve this, it manages its data through multiple instances of Dynamo in multiple data centers around the world. Dynamo is designed so that the data store is always writeable, ensuring that customers can always add and remove items from their shopping carts, even during network or server failures.

Dynamo (Continued)
Amazon DynamoDB supports both document and key-value models. It is a cloud database, which makes it a good fit for web, gaming, and IoT applications. It reduces latency with Amazon DynamoDB Accelerator (DAX), an in-memory cache. A cache shortens retrieval time whenever the requested data is already in the cache.

LinkedIn
http://ieeexplore.ieee.org/document/6228206/?reload=true
* Uses a NoSQL key-value store: Voldemort (developed in 2008)
* Document store: Espresso (developed in 2011)

Databus
* Strong timeline consistency
* User-space processing
* Support for long look-back queries
* Low latency
* Components: Relay, Bootstrap Server, and Client Library

https://blog.twitter.com/2017/the-infrastructure-behind-twitter-scale

OVERALL TRENDS
* Focus on low latency
* Companies maintain both large MySQL and large NoSQL databases
* Enterprises are developing their own database systems and releasing them to the public

REFERENCES
Auradkar, A., Botev, C., & Das, S. (2012). Data Infrastructure at LinkedIn. IEEE 28th International Conference on Data Engineering (ICDE).
Hashemi, M. (n.d.). The Infrastructure Behind Twitter: Scale | Twitter Blogs. Retrieved May 02, 2017, from https://blog.twitter.com/2017/the-infrastructure-behind-twitter-scale
Introducing FlockDB | Twitter Blogs. (2010, May 03). Retrieved May 02, 2017, from https://blog.twitter.com/2010/introducing-flockdb
https://www.infoq.com/news/2014/06/facebook-apollo
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. "Dynamo: Amazon's Highly Available Key-Value Store". ACM 2007: Print.
https://aws.amazon.com/dynamodb
http://perspectives.mvdirona.com/2008/07/facebook-releases-cassandra-as-open-source
http://cassandra.apache.org
Tilman Rabl, Mohammad Sadoghi, Hans-Arno Jacobsen, Sergio Gomez-Villamor, Victor Muntes-Mulero, and Serge Mankovskii. "Solving Big Data Challenges For Enterprise Application Performance Management". VLDB Endowment 2012: Print.
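
SUPPLEMENTARY SKETCH: CACHING
The Dynamo (Continued) slide notes that a cache cuts retrieval time when the requested data is already cached. The Python sketch below illustrates that general read-through idea only; it is not DAX's actual implementation or API. The class names SlowStore and ReadThroughCache, the key "cart:42", and the simulated delay are all hypothetical, chosen for illustration.

```python
import time


class SlowStore:
    """Hypothetical stand-in for a remote table: each read sleeps to mimic latency."""

    def __init__(self, items, delay_s=0.01):
        self._items = dict(items)
        self._delay_s = delay_s
        self.reads = 0  # number of reads that actually reached the store

    def get(self, key):
        self.reads += 1
        time.sleep(self._delay_s)
        return self._items.get(key)


class ReadThroughCache:
    """Serve repeated reads from memory; fall through to the store on a miss."""

    def __init__(self, store):
        self._store = store
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            # Cache hit: no round trip to the slow store.
            self.hits += 1
            return self._cache[key]
        # Cache miss: fetch from the store and remember the result.
        self.misses += 1
        value = self._store.get(key)
        self._cache[key] = value
        return value


store = SlowStore({"cart:42": ["book", "lamp"]})
cache = ReadThroughCache(store)

first = cache.get("cart:42")   # miss: goes to the slow store
second = cache.get("cart:42")  # hit: answered from memory
```

Only the first lookup pays the store's latency; the second is served from the in-memory dictionary. A production cache also needs eviction and invalidation when the underlying data changes, which this sketch leaves out.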