Search with a Key-Value Store

Intro to NoSQL
• Key-value store
• Schemaless
• Distributed
• Eventually consistent

Key-Value
• A single unique key for each value in the database
• Extremely fast look-up
• Easy distribution (there is no such thing as a join)

Schemaless
• Critical for extremely large data sets
• No ALTER TABLE commands; values have no pre-defined fields

Distributed
• The data set is designed to be spread across multiple machines
• Typically uses commodity servers with enough RAM to keep the entire data set in memory

Eventually Consistent
• A success response can be returned to the client before replica nodes have been notified of the change
• This makes NoSQL problematic for highly sensitive transactions (finance, etc.)

Database Design in NoSQL
• Denormalization is your friend
• Think of collections as views on a data set that

A News Site Using SQL
Users: id, user_name, birthday
Comments: id, story_id, user_id, content
Stories: id, date, headline, content

Loading a Story with SQL
    SELECT * FROM stories WHERE id = x;
    SELECT * FROM comments
    LEFT JOIN users ON users.id = comments.user_id
    LEFT JOIN comments children ON children.parent_id = comments.id
    WHERE comments.story_id = x;

Redesigned in a NoSQL Data Store
Story #dgi3ck: date, headline, content, comments
    Comment #la529: content, username, user_image_url, user_id, children
        Comment #5bg26: content, username, user_image_url, user_id, children
            Comment #mn34i: content, username, user_image_url, user_id

Loading a Story with NoSQL
    Stories::get(dgi3ck)

Some Design Considerations
• In what context will we access this data?
• What data do we need to access outside of this context?
• How often does the data change?

Embedded Data
• NoSQL can still support foreign keys (references to other records)
• Some data is more appropriately stored “embedded” in a parent context
• E.g. Comments are rarely (if ever) accessed outside of their parent Story

Cached Data
• Data from an object that needs to be accessed outside of the current context can be cached
• Keep in mind that the cached copy may need to be updated
• E.g. if a user changes his username, the username cached in his Comments must be updated

Several Common NoSQL Stores
• Memcached
• BigTable
• SimpleDB
• MongoDB

Why We Chose MongoDB
• Auto-sharding and easy setup for distribution
• JavaScript API
• Powerful indexing capabilities

MongoDB Libraries
• ORM: mongo_mapper
  • https://github.com/jnunemaker/mongomapper
• Underlying connection: mongo
  • https://github.com/mongodb/mongo-ruby-driver
• BSON support: bson_ext
  • http://rubygems.org/gems/bson_ext

Lifebooker’s Availability Search
Searches across Services

Filters
• Time/Date
• Geographical Zone
• Service Category
• Practitioner Gender
• Concurrent Availability
• (and several more)

Services, Discounts and Practitioners
• Services are offered by Providers
• Providers have Practitioners (employees)
• Discounts are applied to a Provider for a Service during a given time period

Modeling this Data in MongoDB

Embedding with MongoMapper

Indexing and Searching
• Mongo offers powerful indexing capabilities
• Arrays are “first-class citizens” (they can be indexed and queried directly)
• Complex (compound) indices allow for great performance

Creating Meta-Data
• With complex data structures, creating meta-data in a before_save callback makes that data easily searchable
• E.g. the maximum discount on a given day for a service

Creating Indices

Querying
• Uses DataMapper/Arel syntax
• Chains conditions, ordering and offset

Filtering Complex Data Structures
• MongoDB offers a JavaScript API for MapReduce
• Map: transform and filter the data
• Reduce: combine multiple rows into a single record

A Simple Use-Case

Using MapReduce to Filter

Filter

The Results
• Scheduled to go live within 2 weeks
• With sharding/distribution, tests show almost no increase in response time with more than 10x the current data set
• 20x faster than the MySQL implementation: 100 ms vs. 2000 ms (or more)
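The Key-Value and Distributed slides describe single-key lookups over a data set spread across several machines. The sketch below illustrates that idea in plain Python, assuming a fixed number of in-memory "nodes" and simple hash partitioning; `ShardedKVStore` and its methods are hypothetical names, not part of MongoDB or any library.

```python
import hashlib
from typing import Any, Dict, List, Optional


class ShardedKVStore:
    """Toy key-value store in which each key lives on exactly one in-memory node.

    Hypothetical illustration only: no replication, no rebalancing.
    """

    def __init__(self, node_count: int) -> None:
        # Each "node" is a dict standing in for one commodity server's RAM.
        self.nodes: List[Dict[str, Any]] = [{} for _ in range(node_count)]

    def _node_for(self, key: str) -> Dict[str, Any]:
        # A stable hash (Python's built-in hash() of str is salted per process).
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key: str, value: Any) -> None:
        # Schemaless: any value is accepted under any key.
        self._node_for(key)[key] = value

    def get(self, key: str) -> Optional[Any]:
        # A lookup touches exactly one node; there are no joins to coordinate.
        return self._node_for(key).get(key)


store = ShardedKVStore(node_count=3)
store.put("dgi3ck", {"headline": "Hello", "comments": []})
store.put("la529", {"content": "First!"})
print(store.get("dgi3ck"))
```

Plain modulo hashing remaps most keys when a node is added; production systems typically use range-based partitioning or consistent hashing to limit that movement.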
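The Eventually Consistent slide notes that the client can receive a success response before replicas know about the change. The following is a hypothetical, single-process model of that asynchronous replication (`Primary`, `Replica` and the `outbox` queue are made-up names), not a description of how any particular database implements it.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class Replica:
    def __init__(self) -> None:
        self.data: Dict[str, str] = {}


class Primary:
    """Acknowledges a write at once and copies it to replicas later."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.data: Dict[str, str] = {}
        self.replicas = replicas
        # Changes that were acknowledged but not yet applied to the replicas.
        self.outbox: Deque[Tuple[str, str]] = deque()

    def write(self, key: str, value: str) -> str:
        self.data[key] = value
        self.outbox.append((key, value))
        # Success is reported before any replica has been notified.
        return "OK"

    def replicate(self) -> None:
        # In a real system this runs in the background; here it is called explicitly.
        while self.outbox:
            key, value = self.outbox.popleft()
            for replica in self.replicas:
                replica.data[key] = value


replica = Replica()
primary = Primary([replica])
ack = primary.write("balance:42", "100")
stale = replica.data.get("balance:42")  # None: the replica has not caught up yet
primary.replicate()
fresh = replica.data.get("balance:42")  # "100": the replica has converged
print(ack, stale, fresh)  # OK None 100
```

The window between `write` and `replicate` is exactly where a read from a replica returns stale data, which is why the slide flags finance-style transactions.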
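The "Redesigned in a NoSQL Data Store" slide replaces the SQL joins with one denormalized Story document, so loading a story becomes a single `Stories::get(dgi3ck)`. A minimal sketch of building that document from normalized rows follows; the sample rows are hypothetical, `parent_id` is assumed from the SQL self-join, and `user_image_url` is assumed from the NoSQL design (neither appears in the SQL table list).

```python
from typing import Any, Dict, List

# Hypothetical sample rows mirroring the Users / Comments / Stories tables.
users: Dict[int, Dict[str, Any]] = {
    1: {"id": 1, "user_name": "ann", "user_image_url": "/img/ann.png"},
    2: {"id": 2, "user_name": "bob", "user_image_url": "/img/bob.png"},
}
stories: Dict[str, Dict[str, Any]] = {
    "dgi3ck": {"id": "dgi3ck", "date": "2011-01-01", "headline": "Launch", "content": "..."},
}
comments: List[Dict[str, Any]] = [
    {"id": "la529", "story_id": "dgi3ck", "user_id": 1, "parent_id": None, "content": "Great"},
    {"id": "5bg26", "story_id": "dgi3ck", "user_id": 2, "parent_id": "la529", "content": "Agreed"},
    {"id": "mn34i", "story_id": "dgi3ck", "user_id": 1, "parent_id": "5bg26", "content": "Thanks"},
]


def embed_comment(row: Dict[str, Any], rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    user = users[row["user_id"]]
    return {
        "id": row["id"],
        "content": row["content"],
        # User fields are copied (cached) into the comment, so no join is needed later.
        "username": user["user_name"],
        "user_image_url": user["user_image_url"],
        "user_id": row["user_id"],
        # Replies are embedded recursively under "children".
        "children": [embed_comment(c, rows) for c in rows if c["parent_id"] == row["id"]],
    }


def build_story_document(story_id: str) -> Dict[str, Any]:
    story = stories[story_id]
    rows = [c for c in comments if c["story_id"] == story_id]
    return {
        "date": story["date"],
        "headline": story["headline"],
        "content": story["content"],
        "comments": [embed_comment(c, rows) for c in rows if c["parent_id"] is None],
    }


# Build the document once (e.g. whenever a comment is added); reads are then a single get.
document_store = {"dgi3ck": build_story_document("dgi3ck")}
doc = document_store["dgi3ck"]
print(doc["comments"][0]["children"][0]["children"][0]["username"])  # ann
```

The design moves the join cost from read time to write time, which suits a news site where stories are read far more often than comments are posted.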
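The Cached Data slide warns that cached copies may need updating, for instance when a user changes his username. Below is a minimal sketch of that fan-out update over an embedded comment tree, assuming the nested Story shape from the NoSQL redesign (each comment has `user_id`, `username` and `children`); the function name and data are hypothetical, and locating every Story that contains the user's comments is left out.

```python
from typing import Any, Dict, List


def update_cached_username(comments: List[Dict[str, Any]], user_id: int, new_name: str) -> int:
    """Rewrite the cached username on every nested comment written by user_id.

    Returns how many comments were changed.
    """
    updated = 0
    for comment in comments:
        if comment["user_id"] == user_id:
            comment["username"] = new_name
            updated += 1
        # Replies are embedded under "children", so recurse into them.
        updated += update_cached_username(comment.get("children", []), user_id, new_name)
    return updated


story = {
    "headline": "Launch",
    "comments": [
        {"user_id": 1, "username": "ann", "children": [
            {"user_id": 2, "username": "bob", "children": [
                {"user_id": 1, "username": "ann", "children": []},
            ]},
        ]},
    ],
}

count = update_cached_username(story["comments"], user_id=1, new_name="ann_k")
print(count)  # 2
```

This is the write-side price of caching: a rare change (a username) touches many documents so that the frequent operation (loading a story) stays a single read.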
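The Creating Meta-Data slide suggests computing derived fields before saving, with the maximum discount on a given day for a service as the example. The sketch below mimics a `before_save` callback in plain Python; the `Provider` class, its field names and the sample discounts are hypothetical and are not MongoMapper's API.

```python
from typing import Any, Dict, List, Tuple


class Provider:
    """Hypothetical provider document with embedded discounts.

    save() calls before_save() first, so the derived fields never go stale.
    """

    def __init__(self, name: str, discounts: List[Dict[str, Any]]) -> None:
        self.name = name
        # Each discount: {"service": str, "day": "YYYY-MM-DD", "percent": int}
        self.discounts = discounts
        self.max_discounts: List[Dict[str, Any]] = []

    def before_save(self) -> None:
        # Keep only the best discount per (service, day) pair.
        best: Dict[Tuple[str, str], int] = {}
        for d in self.discounts:
            key = (d["service"], d["day"])
            best[key] = max(best.get(key, 0), d["percent"])
        # An array of flat sub-documents is easy to index and to query field by field.
        self.max_discounts = [
            {"service": service, "day": day, "max_percent": percent}
            for (service, day), percent in sorted(best.items())
        ]

    def save(self, collection: Dict[str, Dict[str, Any]]) -> None:
        self.before_save()
        collection[self.name] = {"discounts": self.discounts, "max_discounts": self.max_discounts}


providers: Dict[str, Dict[str, Any]] = {}
Provider("city_spa", [
    {"service": "massage", "day": "2011-06-01", "percent": 10},
    {"service": "massage", "day": "2011-06-01", "percent": 25},
    {"service": "facial", "day": "2011-06-02", "percent": 15},
]).save(providers)

# The search now reads the precomputed field instead of scanning every discount.
hits = [
    name for name, doc in providers.items()
    if any(m["service"] == "massage" and m["day"] == "2011-06-01" and m["max_percent"] >= 20
           for m in doc["max_discounts"])
]
print(hits)  # ['city_spa']
```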
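The Indexing and Searching slide calls arrays "first-class citizens". In MongoDB, indexing a field that holds an array creates a multikey index with one entry per array element, so a query such as `{zones: "soho"}` matches any document whose `zones` array contains `"soho"`. The plain-Python sketch below imitates that behaviour with a dictionary; the services and zone names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Set


def build_multikey_index(docs: Dict[str, Dict[str, Any]], field: str) -> Dict[Any, Set[str]]:
    """Index every element of the array stored in `field`, pointing back to document ids."""
    index: Dict[Any, Set[str]] = defaultdict(set)
    for doc_id, doc in docs.items():
        # One index entry per array element, not one per document.
        for value in doc.get(field, []):
            index[value].add(doc_id)
    return dict(index)


services = {
    "s1": {"name": "Swedish massage", "zones": ["midtown", "soho"]},
    "s2": {"name": "Manicure", "zones": ["soho"]},
    "s3": {"name": "Facial", "zones": ["brooklyn"]},
}

zone_index = build_multikey_index(services, "zones")
# Equality lookup on an array field: which services are offered in "soho"?
print(sorted(zone_index["soho"]))  # ['s1', 's2']
```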
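The Querying slide describes chaining conditions, ordering and offset in the DataMapper/Arel style. The class below is a hypothetical Python illustration of that chaining pattern over an in-memory list, not the MongoMapper API: each call returns a new `Query`, and nothing is evaluated until `all()` runs.

```python
from typing import Any, Dict, List, Optional, Tuple

Doc = Dict[str, Any]


class Query:
    """Immutable, chainable query over in-memory documents (hypothetical API)."""

    def __init__(self, docs: List[Doc], conditions: Tuple[Tuple[str, Any], ...] = (),
                 order: Optional[Tuple[str, bool]] = None, skip: int = 0,
                 limit: Optional[int] = None) -> None:
        self._docs = docs
        self._conditions = conditions
        self._order = order
        self._skip = skip
        self._limit = limit

    def _copy(self, **changes: Any) -> "Query":
        # Each chained call returns a new Query; the original is left untouched.
        params: Dict[str, Any] = dict(conditions=self._conditions, order=self._order,
                                      skip=self._skip, limit=self._limit)
        params.update(changes)
        return Query(self._docs, **params)

    def where(self, **equals: Any) -> "Query":
        return self._copy(conditions=self._conditions + tuple(equals.items()))

    def sort(self, field: str, descending: bool = False) -> "Query":
        return self._copy(order=(field, descending))

    def offset(self, n: int) -> "Query":
        return self._copy(skip=n)

    def limit(self, n: int) -> "Query":
        return self._copy(limit=n)

    def all(self) -> List[Doc]:
        # Evaluation happens only here: filter, then order, then offset/limit.
        rows = [d for d in self._docs if all(d.get(k) == v for k, v in self._conditions)]
        if self._order:
            field, descending = self._order
            rows.sort(key=lambda d: d[field], reverse=descending)
        rows = rows[self._skip:]
        return rows if self._limit is None else rows[: self._limit]


slots = [
    {"service": "massage", "zone": "soho", "price": 90},
    {"service": "massage", "zone": "soho", "price": 60},
    {"service": "massage", "zone": "midtown", "price": 70},
    {"service": "facial", "zone": "soho", "price": 50},
]
page = Query(slots).where(service="massage", zone="soho").sort("price").offset(1).limit(10).all()
print(page)  # [{'service': 'massage', 'zone': 'soho', 'price': 90}]
```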
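The Filtering Complex Data Structures slide splits the work into a map step (transform and filter) and a reduce step (combine many rows into one record). MongoDB's own MapReduce takes JavaScript functions in which map calls `emit(key, value)`; the Python sketch below only mimics those semantics in-process, and the availability-slot fields and filter values are hypothetical. Because MongoDB may call reduce more than once for the same key, feeding it its own output, the reduced value keeps the same shape as the mapped values.

```python
from collections import defaultdict
from functools import reduce
from typing import Any, Callable, Dict, Iterator, List, Tuple

Value = Dict[str, Any]
Mapper = Callable[[Dict[str, Any]], Iterator[Tuple[str, Value]]]
Reducer = Callable[[str, List[Value]], Value]


def map_slot(slot: Dict[str, Any], day: str, gender: str) -> Iterator[Tuple[str, Value]]:
    # Map: drop non-matching slots and turn the rest into (key, value) pairs.
    if slot["day"] == day and slot["practitioner_gender"] == gender:
        yield slot["provider"], {"times": [slot["time"]], "count": 1}


def reduce_slots(key: str, values: List[Value]) -> Value:
    # Reduce: merge all values for one key into a single record of the same shape.
    return reduce(
        lambda a, b: {"times": sorted(a["times"] + b["times"]), "count": a["count"] + b["count"]},
        values,
    )


def map_reduce(docs: List[Dict[str, Any]], mapper: Mapper, reducer: Reducer) -> Dict[str, Value]:
    grouped: Dict[str, List[Value]] = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}


slots = [
    {"provider": "city_spa", "day": "2011-06-01", "time": "14:00", "practitioner_gender": "F"},
    {"provider": "city_spa", "day": "2011-06-01", "time": "10:00", "practitioner_gender": "F"},
    {"provider": "city_spa", "day": "2011-06-01", "time": "11:00", "practitioner_gender": "M"},
    {"provider": "nail_bar", "day": "2011-06-02", "time": "09:00", "practitioner_gender": "F"},
]
result = map_reduce(slots, lambda s: map_slot(s, "2011-06-01", "F"), reduce_slots)
print(result)  # {'city_spa': {'times': ['10:00', '14:00'], 'count': 2}}
```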