Download Amazon’s Dynamo - Northwestern University

Amazon’s Dynamo Simple Cloud Storage Foundations • 1970 – E.F. Codd “A Relational Model of Data for Large Shared Data Banks” – Idea of tabular data – SQL Foundations • Codd’s 12 rules – How database structured and what is available to user – Application not dependent on physical or logical levels of database – Insert, Update, Delete operators Foundations, continued • First Relational database management systems (RDBMS) – Oracle in 1979, first SQL based system – Microsoft SQL server, etc – Open source software would follow later (mySQL) • Follows Codd’s ideas – Complexity on the server side, let the query do all the work – Very stringent requirements Drawbacks • Licensing fees on a per processor rate – High end Oracle system is mind-numbingly expensive • Load Distribution requires specific nodes to handle – Some servers have specific roles, failure point in network • Complexity on servers creates difficultly with maintenance, upgrades – Upgrades often all at once as result, not incremental A New Direction • Simplify services that the database provides – Easier scaling and error handling • Take a more pragmatic approach – Tailor system to sacrifice some aspects of the traditional RDBMS to gain performance in others – Systems less general, specific end requirements in mind when creating Examples • Amazon Dynamo – Simple primary key – Highly available, end user based model – Low cost virtualized nodes • Facebook Cassandra – Similar goals to Amazon’s Dynamo – Highly avaiable, incremental scalablilty • Google File System – Master node – Data distributed across low cost nodes Dynamo Goals • Scale – adding systems to network causes minimal impact • Symmetry – No special roles, all features in all nodes • Decentralization – No Master node(s) • Highly Available – Focus on end user experience • SPEED – A system can only be as fast as the lowest level • Service Level Agreements – System can be adapted to an application’s specific needs, allows flexibility Dynamo Assumptions • Query Model – Simple interface exposed to application level – Get(), Put() – No Delete() • Atomicity, Consistency, Isolation, Durability – Operations either succeed or fail, no middle ground – System will be eventually consistent, no sacrifice of availability to assure consistency – Conflicts can occur while updates propagate through system – System can still function while entire sections of network are down • Efficiency – Measure system by the 99.9th percentile – Important with millions of users, 0.1% can be in the 10,000s • Non Hostile Environment - No need to authenticate query, speed boost Wanted Results • Deliver requests in a bounded time • Always writable – Highly available to users • No dedicated roles • Work split between nodes fairly Techniques Partitioning • Consistent Hashing – Changing the number of slots in hash table results in only a small number of keys to remap – More info • A ring of virtual nodes – Node responsible for region between it and its predecessor Virtual Node • Physical Machine has # of virtual nodes based on performance • Can adapt load more easily if a machine goes down • Likewise, assign nodes to a new machine in network Replication • Application provided parameter N • Replication on different physical nodes – Data still available if nodes go down – Makes part of preference list for query Versioning and Vector Clocks • Updates propagate asynchronously, need a way of distinguishing conflicts – Possible reason for absence of Delete() • Vector Clock – List of (node, counter) – Limited size, limit overhead for data – If all fields are less than or equal, first can be updated by second Sloppy Quorum and Hinted Handoff • W and R parameter set min # of nodes in a read or write • Read and write on the first N healthy nodes, no strict membership, can vary over time • Hint in metadata for intended node, will update once that node is again available • Allows for temporary failure in nodes or entire networks Synchronization and Gossip • Merkle Trees - Info • Use common key values between two nodes – Traverse tree and check vector clocks to see if updates are needed – Exchange information on most current version of the data if inconsistencies are found • Gossip – Nodes select neighbors at random and reconcile membership change histories • Use seed nodes to initialize • Detect failures Routing get() and put() • Two Techniques – Route request through a load balancer • Slower • Simpler application level code – Partition aware client, route directly to appropriate nodes • Faster • More complicated application level • First node routed to is “coordinator” node – Generates vector clock for put and gives data to N highest healthy nodes – Queries N highest nodes for all versions, returned all versions found Implementation • Java based – Hardware independent, JVM • Allows different back-end systems to be used, based on size of data needed to be stored – Berkeley Database Transactional Data Store – BDB Java Edition – MySQL, can handle large objects • Coordinator node is a state machine for read/writes for client – Coordinator for a write determined by fastest read Flexibility • Changing W, R, N – Business logic specific reconciliation • Data replicated over nodes • Application level reconciliation fro conflicting objects – Timestamp Reconciliation • Similar to above, last write wins – High performance read engine • By setting R = 1, W = N • Reads fast and numerous, few updates Observed Results - Speed Observed Results – Load Balancing • Higher traffic causes load to be balanced more evenly – Requests of popular keys let system to balance more easily • In lower traffic, less important to balance load Observed Results - Coordination • Client coordination can provide a speed boost • Read and write latency nearly identical • Results as expected Observed Results - Versions • Measured over 24 hour period for shopping cart – 99.94% of users saw 1 version – 0.00057% saw 2 versions – 0.00047% saw 3 versions – 0.00009% saw 4 versions • Increase caused by increase in number of concurrent writers, most likely Conclusions • Dynamo allows Amazon’s customers to have a consistent experience even in face of server and network errors • Gives a scalable solution with millions of data points to be queried quickly and efficiently • Offloads complexity to the application to provide a simple, flexible, and fast server-side implementation Thanks for listening!

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Amazon’s Dynamo - Northwestern University