Black Friday and Cyber Monday: Best Practices for Your E-Commerce Database
Tim Vaillancourt
Sr. Technical Operations Architect @ Percona

Agenda
● Synchronous versus Asynchronous Applications
● Scaling a Synchronous/Latency-sensitive Application
● Scaling an Asynchronous Application
● Efficient Usage of Data at Scale
  ○ Secondary/Slave Hosts
  ○ Caching
  ○ Queuing
● Efficient Usage of Data at Scale
  ○ Moving Expensive Work
  ○ Caching Techniques
  ○ Counters and In-memory Stores
  ○ Connection Pooling

Agenda
● Scaling Out (Horizontal) Tricks
  ○ Pre-Sharding
  ○ Kill Switches
  ○ Limits and Graphs
● Scaling with Hardware (Vertical Scaling)
● Testing Performance and Capacity
● Knowing Your Application and Questions to Ask at Development Time
● Questions

About Me
● Started at Percona in January 2016
● Experience
  ○ Web Publishing
    ■ Big-scale LAMP-based Websites
  ○ Ecommerce
    ■ Large Inventory SaaS
  ○ Gaming
    ■ DevOps
      ● 50-100 Microservices
      ● 5-7+ x Massive Launches / Year
      ● Design, launch and maintain apps

About Me
    ■ DBA at EA DICE
      ● 2 x New Titles
      ● 5+ x Legacy Titles
  ○ Technologies
    ■ MySQL
    ■ MongoDB
    ■ Cassandra
    ■ Redis and Memcached
    ■ RabbitMQ, Kafka and ActiveMQ
    ■ Solr and Elasticsearch
    ■ (Sort of) AWS, HDFS, HBase, Postgres, etc…

Services
Monolith
● One application that does everything
● Example: Chrome, MySQL, huge Python app
Microservice
● Apps with different purposes, pain points and SLAs are discrete services
● Often easier to scale/troubleshoot
● Reduces risk of outage
● Example: frontend PHP app, messaging app, encoding app, etc
In Practice
● Both can be scaled up and down with the right features
● Microservices offer more flexibility
● Monolith services bring problems at scale

Application Operations
Synchronous
● Blocking operation until success or failure
● Slower requests
● Example: a file uploading app
Asynchronous
● Request and response are separated
● Fast response time back to user/application
● Example: a social media site
Slow Operations
● Can cause pileups in a tiered system
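
The difference between a synchronous and an asynchronous operation can be sketched in a few lines of Python. This is only an illustration, not code from the talk: `slow_operation`, `handle_sync`, `handle_async`, the in-process `jobs` queue and the `results` dict are all hypothetical stand-ins (a real deployment would more likely use a message broker and a separate worker process).

```python
import queue
import threading
import time

jobs = queue.Queue()   # hypothetical in-process stand-in for a real message queue
results = {}           # hypothetical store the worker writes finished work into


def slow_operation(payload):
    """Stand-in for a slow backend call (assumed, for illustration only)."""
    time.sleep(0.05)
    return payload.upper()


def handle_sync(payload):
    """Synchronous: the caller blocks until the slow work succeeds or fails."""
    return slow_operation(payload)


def handle_async(job_id, payload):
    """Asynchronous: queue the work and answer the caller immediately."""
    jobs.put((job_id, payload))
    return {"job_id": job_id, "status": "accepted"}


def worker():
    """Background worker: performs the slow work away from the request path."""
    while True:
        job_id, payload = jobs.get()
        results[job_id] = slow_operation(payload)
        jobs.task_done()


# Daemon thread so the sketch does not keep the interpreter alive on exit.
threading.Thread(target=worker, daemon=True).start()
```

With `handle_sync`, every slow call holds the request open, which is how pileups form in a tiered system. With `handle_async`, the request returns as soon as the job is queued, and the result shows up in `results` later (eventual consistency).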

Applications
Synchronous
● Pros: less code, always the right answer
● Cons: blocking operations and poorer efficiency
● Example: a file uploading app
Latency/Integrity Sensitive
● Pros: always the right answer
● Cons: fewer scalability tricks available
● Example: a stock trading app that cannot accept “slave lag”
Asynchronous
● Pros: light operations and more scalability
● Cons: eventual consistency (and sometimes more code)
● Example: a social media site

Types of Data Designs
Decentralised
● Data is duplicated in several places
● Pros: lighter to read, decreased locking, easy to shard
● Cons: increased storage space, extra duplication effort
Centralised
● Data is kept in one (or few) places and referenced
● Pros: less storage, one source-of-truth
● Cons: locking, inefficiencies, sharding issues

Balancing Request Impact
Read-focused Apps
● Benefit from
  ○ Values pre-computed at write/change-time
  ○ Indices and/or few “scans” for data
  ○ No/few JOINs/operations to get result
Write-focused Apps
● Benefit from
  ○ No pre-computing of values (compute at read-time)
  ○ No/few indices to update
  ○ Insert/Append > Update
  ○ Reads: compute read summaries with replicas, add indices to secondaries only, etc

Queuing Updates
Event Metadata
● Example: “UserX has the new top score!”
● Without Queue example
  ○ Update Top Score in Database(s)
  ○ Send Email to Friends
  ○ Post to Facebook Page
  ○ Update cache
  ○ ...
● With Queue example
  ○ Add event to queue ‘topscore’
  ○ Apps read queue

Queuing Updates
Update Buffering
● Scenario: there is a high rate of updates to buffer
● Queue-based example
  ○ App adds to update buffer (queue)
  ○ Worker app works from the bottom of buffer
● Queue Operational Benefits
  ○ Spikes in traffic
  ○ Backend downtime
  ○ Communication bus

Scaling Sync./Latency-Sensitive Apps
● Rethink the Flow Using Async
● Use lots of database RAM
● Shard the database
● Reduce impact of request flow
● Apache Cassandra
  ○ Synchronous
  ○ Very write optimized
● Percona XtraDB Cluster, NDB
● Use memory-based storage
  ○ Queue persistence to database

Efficient Usage of Data at Scale
Expensive DB Work
● Focus on lightweight user-facing operations
● Move aggregations/summaries/reporting to background
● Use replicas for expensive jobs
● Avoid or reduce (maybe cache) “JOINs”
● Enable and monitor metrics
  ○ MySQL
    ■ log_queries_not_using_indexes
  ○ MongoDB
    ■ Enable operationProfiling
  ○ Review metrics and improve!
  ○ Percona Monitoring and Management

Efficient Usage of Data at Scale
Caching / In-Memory Stores
● Alleviates load from database
● Very fast lookups
● Low connection overhead
  ○ MySQL connection buffers: ~1MB+
  ○ MongoDB connection buffers: ~1MB
  ○ Redis or Memcached connection buffers: 0-limit/infinity**
● Server-Side
  ○ Hit/Miss Caching
    ■ If something is not in the cache: find + add it.
    ■ TTL expiry
  ○ Inline/Preemptive Caching
    ■ Update/Delete cache data at change time/preemptively

Efficient Usage of Data at Scale
Caching / In-Memory Stores (continued)
  ○ Client-Side
    ■ Cache client data in the client app/browser/etc
  ○ In-memory Stores
    ■ Memcached
    ■ Redis
    ■ Percona Server for MongoDB with Memory Engine :)
  ○ Use TTLs to trim data

Efficient Usage of Data at Scale
Storing Numerical Counters and Stats
● Offload to in-memory stores
  ○ Incremented/decremented counters
  ○ Aggregations, summaries, counts
● Count-style Queries to Counters
  ○ Increment counter at request/change time
  ○ Read counter value at read-request time
  ○ Or, try to use an index

Efficient Usage of Data at Scale
Connection Pooling
● Removes 3-way TCP “handshake” from request (more w/SSL)
● Reduces threading overhead on databases
● Proxies on App server localhost/loopback
  ○ Reduces 1 x TCP ‘hop’, ie: faster connect time
  ○ Can create a LOT of DB connections with many app servers

Efficient Usage of Data at Scale
Connection Pooling (continued)
● MySQL Proxies
  ○ ProxySQL
  ○ HAProxy
  ○ MaxScale
  ○ Others…
● MongoDB Proxies
  ○ mongos (sharding) process
● Proxy-on-Localhost or direct is fastest

Virtualization, Containers, etc
Virtualization
● Pretends to be a real computer from BIOS up
● OS + Software run under a hypervisor layer
● Pros
  ○ Full hardware-level emulation, eg: CentOS, Red Hat, Win 10
  ○ Automation of platform (sometimes)
● Cons
  ○ Emulation overhead
  ○ Slow boot-up time
  ○ Lots of OSs to update

Virtualization, Containers, etc
Containers (cgroups, jails)
● Several can run inside a single operating system and kernel
● Offers controls to limit resources like RAM, CPU time, etc
● Pros
  ○ Low overhead
  ○ Container creation is very fast

Virtualization, Containers, etc
Mesos, Kubernetes, etc
● Make a lot of servers distribute work, containers, etc
● Apache Mesos: “Distributed systems kernel”
  ○ Agent on every host and manager servers give out work
● Kubernetes

Virtualization, Containers, etc
Many Processes per Host
● Run unrelated processes on hosts
● Add/remove from load balancers
● Not advised for disk-bound or high-bandwidth apps

Scaling Out Tricks
Sharding
● Techniques
  ○ Modulus
    ■ Even distribution of keys
    ■ Hard to reshape data
  ○ Map-based
    ■ 1-to-1 shard mapping using another table, config, etc
    ■ Easy to reshape data
● Launch with many shards in advance
  ○ 1-4 MySQL/MongoDB Instance/host
  ○ 1 MySQL/MongoDB Instance/host, 4 x databases as shards
  ○ 1 MySQL/MongoDB Instance/host, small hardware

Scaling Out Tricks
Sharding
[Diagrams: Modulus, Mapping]

Scaling Out Tricks
Hardware
● Have a strategy to add/remove capacity quickly
  ○ Cloud Instances
  ○ Mesos/Kubernetes
  ○ Automation
● Use cheap application servers for in-memory stores and apps
● Launch with lots of RAM, scale down post-launch

Scaling Out Tricks
Elasticity
● Ensure there is a way to add/remove hosts, examples:
  ○ Load Balancers
    ■ Good health-checks are important
  ○ Application Configs
    ■ File
    ■ Database
    ■ ZooKeeper

Scaling Out Tricks
At Launch...
● Scale-out
  ○ Keep spare servers online, partially configured
  ○ Launch with extra database replicas (slave/secondary)
  ○ Monitor usage and remove extra hardware post-launch
  ○ Monitor and adjust capacity
● Scale-up
  ○ Launch with lots of RAM
● Traffic Control
  ○ Launch one region at a time
  ○ Launch with rate limits

Scaling Out Tricks
Application “Kill switches”
● A switch to disable certain app features/functions
● Useful when there is:
  ○ Too much traffic/scale-up
  ○ DDoS
  ○ Maintenance

Scaling Out Tricks
Limiting Graph Structures
● “Friends” / “Followers” features are often graphs
● If Katy Perry or Barack Obama used your “friends” feature…
● Limit the size of graphs, or queue events for fan-out updating

Scaling Out Tricks
Batching and Parallel Work
● Do large queries in parallel
  ○ Modern CPUs have many cores (2, 4, 8+)
  ○ 1 connection = 1 thread = 1 CPU core
● Batch inserts/updates
  ○ 1 x update with 1000 items > 1000 x updates with 1 item

Scaling Up Tricks
● Test provider turn-around time on hardware upgrading
● Test application performance on improved hardware in advance
● Scale up only resources needed

Databases
General
● Monitoring/reviewing slow queries reduces most inefficiencies
● More memory will reduce disk requests
● SSDs will reduce disk request time
● Proper database and kernel tunings will help further
  ○ Linux has very inefficient defaults!
● Try to use real local-disks, not EBS, NFS, etc
Queries
● Don’t try to make MySQL/MongoDB a queue or search engine!
● Decentralizing data and pre-computing answers for reads will take you far
● The best query is no query (cache)

Databases
Sharding

Testing Performance and Capacity
General
● Try to emulate the real user traffic
● Add micro-pauses to simulate reality
● Cloud-based providers are great for running load generation
Applications
● Component testing
  ○ Test the max volume of each component on a single host
  ○ Test the max volume of each component on many hosts
  ○ Calculate host scalability, ie: “+1 host = +80% more traffic”
● Feature capacity
  ○ Test the impact of each feature if not separate

Testing Performance and Capacity
Databases
● Replay real user traffic on real backups
● Load test tools: Linkbench, Sysbench, TPCC, JMeter, etc
● Single feature/query testing
  ○ Understand host capacity per feature, eg: “2000 user login queries/sec per db replica”
● Know your slowest query!

Development-time Questions
General
● What does the app do?
● If I break X, what happens?
● Are connections to data stores “pooled”?
Replicas
● Can the app use replicas (with possible lag)?
  ○ Tip: start early, deploy replication from the start
● Can we add/remove replicas without disruption?
Sharding
● Can the app understand shards/partitions?
● How is data balanced post-sharding?
● Are there cross-shard references?

Development-time Questions
Caching
● What data can be cached?
● Will a change be read immediately?
  ○ Can we pre-cache this change?
● When should the cache delete an item?
  ○ Can we set TTLs on our keys?
● How do we add/remove cache servers easily?
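
The caching questions above (hit/miss lookups, pre-caching a change, TTLs on keys) can be made concrete with a small Python sketch. Everything here is assumed for illustration: `TTLCache` is a tiny in-process stand-in for Memcached or Redis, `database` is a plain dict standing in for the real data store, and `read_product`, `update_product` and `stats` are hypothetical names.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for Memcached/Redis (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock  # injectable so expiry can be exercised in tests

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # TTL expiry trims stale data
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)


# Stand-ins for the real database and its read metrics (assumptions).
database = {"product:1": {"name": "widget", "price": 10}}
stats = {"db_reads": 0}


def read_product(cache, key, ttl=60):
    """Hit/miss caching: on a miss, find the row and add it with a TTL."""
    value = cache.get(key)
    if value is None:
        stats["db_reads"] += 1
        value = database[key]
        cache.set(key, value, ttl)
    return value


def update_product(cache, key, value, ttl=60):
    """Inline/preemptive caching: refresh the cache at change time."""
    database[key] = value
    cache.set(key, value, ttl)
```

In this sketch a change made through `update_product` is readable from the cache straight away, so the first read after a write does not reach the database. The TTL bounds how long an entry can live if some other writer bypasses the cache.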

Knowing Your App
If you see…
● The app is write heavy
  ○ Remove overhead from immediate write path
  ○ Batch writes if possible
● The app is read heavy
  ○ Reduce scans/operations from the read path (index, etc)
  ○ Add as many replicas (slave/secondary) as needed
● The app queries for counts often, ie: # of items, friends, etc
  ○ Move count-queries to incremented in-memory counters
  ○ Or, create an index for the count query
● The app uses references or joins often
  ○ Consider decentralising the data (with fan-out updates)

Themes
● Make all features, apps, databases elastic
● Request Flow
  ○ Make the heavy workload easy / make the light workload hard
  ○ Move graph updates to background (queues, async, etc)
  ○ Move ‘counts’ to counters
● Caching
  ○ Cheaper/faster to access than DB
  ○ Try to cache before anyone reads data
● Queues
  ○ Great for replicating events while simplifying update
  ○ Great for batching changes
● Monitor everything! Try Percona Monitoring and Management!

Join us at Percona Live Europe
When: October 3-5, 2016
Where: Amsterdam, Netherlands
The Percona Live Open Source Database Conference is a great event for users of any level using open source database technologies.
● Get briefed on the hottest topics
● Learn about building and maintaining high-performing deployments
● Listen to technical experts and top industry leaders
Use promo code “WebinarPLAM16” and receive €15 off the current registration price!
Sponsorship opportunities available as well here.

Questions?
Thanks for joining!
Be sure to check out the Percona Blog for more technical blogs and topics!
