Black Friday and Cyber Monday: Best Practices for Your E-Commerce Database
Tim Vaillancourt
Sr. Technical Operations Architect @ Percona

Agenda
● Synchronous versus Asynchronous Applications
● Scaling a Synchronous/Latency-sensitive Application
● Scaling an Asynchronous Application
● Efficient Usage of Data at Scale
  ○ Secondary/Slave Hosts
  ○ Caching
  ○ Queuing
● Efficient Usage of Data at Scale
  ○ Moving Expensive Work
  ○ Caching Techniques
  ○ Counters and In-memory Stores
  ○ Connection Pooling

Agenda
● Scaling Out (Horizontal) Tricks
  ○ Pre-Sharding
  ○ Kill Switches
  ○ Limits and Graphs
● Scaling with Hardware (Vertical Scaling)
● Testing Performance and Capacity
● Knowing Your Application and Questions to Ask at Development Time
● Questions

About Me
● Started at Percona in January 2016
● Experience
  ○ Web Publishing
    ■ Big-scale LAMP-based Websites
  ○ Ecommerce
    ■ Large Inventory SaaS
  ○ Gaming
    ■ DevOps
      ● 50-100 Microservices
      ● 5-7+ x Massive Launches / Year
      ● Design, launch and maintain apps

About Me
    ■ DBA at EA DICE
      ● 2 x New Titles
      ● 5+ x Legacy Titles
  ○ Technologies
    ■ MySQL
    ■ MongoDB
    ■ Cassandra
    ■ Redis and Memcached
    ■ RabbitMQ, Kafka and ActiveMQ
    ■ Solr and Elasticsearch
    ■ (Sort of) AWS, HDFS, HBase, Postgres, etc…

Services
Monolith
● One application that does everything
● Example: Chrome, MySQL, huge Python app
Microservice
● Apps with different purposes, pain points and SLAs are discrete services
● Often easier to scale/troubleshoot
● Reduces risk of outage
● Example: frontend PHP app, messaging app, encoding app, etc
In Practice
● Both can be scaled up and down with the right features
● Microservices offer more flexibility
● Monolith services bring problems at scale

Application Operations
Synchronous
● Blocking operation until success or failure
● Slower requests
● Example: a file uploading app
Asynchronous
● Request and response are separated
● Fast response time back to user/application
● Example: a social media site
Slow Operations
● Can cause pileups in a tiered system
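The difference between the two operation styles above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `slow_operation`, `handle_sync` and `AsyncHandler` are hypothetical names that do not come from the slides, and an in-process `queue.Queue` with one worker thread stands in for a real message queue and worker fleet.

```python
import queue
import threading
import time


def slow_operation(item):
    """Hypothetical stand-in for slow backend work (for example, a large write)."""
    time.sleep(0.05)
    return "processed:" + item


def handle_sync(item):
    """Synchronous: the caller blocks until the work succeeds or fails."""
    return slow_operation(item)


class AsyncHandler:
    """Asynchronous: accept the request now, do the work in the background."""

    def __init__(self):
        self.jobs = queue.Queue()
        self.results = []
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def handle(self, item):
        self.jobs.put(item)   # cheap enqueue, no waiting on the backend
        return "accepted"     # fast response back to the user/application

    def _run(self):
        while True:
            item = self.jobs.get()
            if item is None:  # sentinel value: stop the worker
                break
            self.results.append(slow_operation(item))

    def close(self):
        """Drain the queue and stop the worker."""
        self.jobs.put(None)
        self.worker.join()
```

With `handle_sync`, every caller waits for the full duration of the slow work, which is how pileups form when a lower tier slows down. With `AsyncHandler.handle`, callers return immediately and the result becomes visible later, which is the eventual-consistency trade-off of asynchronous designs.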
Applications
Synchronous
● Pros: less code, always the right answer
● Cons: blocking operations and poorer efficiency
● Example: a file uploading app
Latency/Integrity Sensitive
● Pros: always the right answer
● Cons: fewer scalability tricks available
● Example: a stock trading app that cannot accept “slave lag”
Asynchronous
● Pros: light operations and more scalability
● Cons: eventual consistency (and sometimes more code)
● Example: a social media site

Types of Data Designs
Decentralised
● Data is duplicated in several places
● Pros: lighter to read, decreased locking, easy to shard
● Cons: increased storage space, extra duplication effort
Centralised
● Data is kept in one (or few) places and referenced
● Pros: less storage, one source-of-truth
● Cons: locking, inefficiencies, sharding issues

Balancing Request Impact
Read-focused Apps
● Benefit from
  ○ Values pre-computed at write/change-time
  ○ Indices and/or few “scans” for data
  ○ No/few JOINs/operations to get result
Write-focused Apps
● Benefit from
  ○ No pre-computing of values (compute at read-time)
  ○ No/few indices to update
  ○ Insert/Append > Update
  ○ Reads: compute read summaries with replicas, add indices to secondaries only, etc

Queuing Updates
Event Metadata
● Example: “UserX has the new top score!”
● Without Queue example
  ○ Update Top Score in Database(s)
  ○ Send Email to Friends
  ○ Post to Facebook Page
  ○ Update cache
  ○ ...
● With Queue example
  ○ Add event to queue ‘topscore’
  ○ Apps read queue

Queuing Updates
Update Buffering
● Scenario: there is a high rate of updates to buffer
● Queue-based example
  ○ App adds to update buffer (queue)
  ○ Worker app works from the bottom of buffer
● Queue Operational Benefits
  ○ Spikes in traffic
  ○ Backend downtime
  ○ Communication bus

Scaling Sync./Latency-Sensitive Apps
● Rethink the Flow Using Async
● Use lots of database RAM
● Shard the database
● Reduce impact of request flow
● Apache Cassandra
  ○ Synchronous
  ○ Very write optimized
● Percona XtraDB Cluster, NDB
● Use memory-based storage
  ○ Queue persistence to database

Efficient Usage of Data at Scale
Expensive DB Work
● Focus on lightweight user-facing operations
● Move aggregations/summaries/reporting to background
● Use replicas for expensive jobs
● Avoid or reduce (maybe cache) “JOINs”
● Enable and monitor metrics
  ○ MySQL
    ■ log_queries_not_using_indexes
  ○ MongoDB
    ■ Enable operationProfiling
  ○ Review metrics and improve!
  ○ Percona Monitoring and Management

Efficient Usage of Data at Scale
Caching / In-Memory Stores
● Alleviates load from database
● Very fast lookups
● Low connection overhead
  ○ MySQL connection buffers: ~1MB+
  ○ MongoDB connection buffers: ~1MB
  ○ Redis or Memcache connection buffers: 0-limit/infinity**
● Server-Side
  ○ Hit/Miss Caching
    ■ If something is not in the cache: find + add it.
      TTL expiry
  ○ Inline/Preemptive Caching
    ■ Update/Delete cache data at change time/preemptively

Efficient Usage of Data at Scale
Caching / In-Memory Stores (continued)
  ○ Client-Side
    ■ Cache client data in the client app/browser/etc
  ○ In-memory Stores
    ■ Memcached
    ■ Redis
    ■ Percona Server for MongoDB with Memory Engine :)
  ○ Use TTLs to trim data

Efficient Usage of Data at Scale
Storing Numerical Counters and Stats
● Offload to in-memory stores
  ○ Incremented/decremented counters
  ○ Aggregations, summaries, counts
● Count-style Queries to Counters
  ○ Increment counter at request/change time
  ○ Read counter value at read-request time
  ○ Or, try to use an index

Efficient Usage of Data at Scale
Connection Pooling
● Removes 3-way TCP “handshake” from request (more w/SSL)
● Reduces threading overhead on databases
● Proxies on App server localhost/loopback
  ○ Reduces 1 x TCP ‘hop’, ie: faster connect time
  ○ Can create a LOT of DB connections with many app servers

Efficient Usage of Data at Scale
Connection Pooling (continued)
● MySQL Proxies
  ○ ProxySQL
  ○ HAProxy
  ○ Maxscale
  ○ Others…
● MongoDB Proxies
  ○ Mongos (sharding) process
● Proxy-on-Localhost or direct is fastest

Virtualization, Containers, etc
Virtualization
● Pretends to be a real computer from BIOS up
● OS + Software run under a hypervisor layer
● Pros
  ○ Full hardware-level emulation, eg: CentOS, Redhat, Win 10
  ○ Automation of platform (sometimes)
● Cons
  ○ Emulation overhead
  ○ Slow boot-up time
  ○ Lots of OSs to update

Virtualization, Containers, etc
Containers (cgroups, jails)
● Several can run inside a single operating system and kernel
● Offers controls to limit resources like RAM, CPU time, etc
● Pros
  ○ Low overhead
  ○ Container creation is very fast

Virtualization, Containers, etc
Mesos, Kubernetes, etc
● Make a lot of servers distribute work, containers, etc
● Apache Mesos: “Distributed systems kernel”
  ○ Agent on every host and manager servers give out work
● Kubernetes

Virtualization, Containers, etc
Many Processes per Host
● Run unrelated processes on hosts
● Add/remove from load balancers
● Not advised for disk-bound or high-bandwidth apps

Scaling Out Tricks
Sharding
● Techniques
  ○ Modulus
    ■ Even distribution of keys
    ■ Hard to reshape data
  ○ Map-based
    ■ 1-to-1 shard mapping using another table, config, etc
    ■ Easy to reshape data
● Launch with many shards in advance
  ○ 1-4 MySQL/MongoDB Instance/host
  ○ 1 MySQL/MongoDB Instance/host, 4 x databases as shards
  ○ 1 MySQL/MongoDB Instance/host, small hardware

Scaling Out Tricks
Sharding
[Diagrams not reproduced: Modulus, Mapping]

Scaling Out Tricks
Hardware
● Have a strategy to add/remove capacity quickly
  ○ Cloud Instances
  ○ Mesos/Kubernetes
  ○ Automation
● Use cheap application servers for in-memory stores and apps
● Launch with lots of RAM, scale down post-launch

Scaling Out Tricks
Elasticity
● Ensure there is a way to add/remove hosts, examples:
  ○ Load Balancers
    ■ Good health-checks are important
  ○ Application Configs
    ■ File
    ■ Database
    ■ Zookeeper

Scaling Out Tricks
At Launch...
● Scale-out
  ○ Keep spare servers online, partially configured
  ○ Launch with extra database replicas (slave/secondary)
  ○ Monitor usage and remove extra hardware post-launch
  ○ Monitor and adjust capacity
● Scale-up
  ○ Launch with lots of RAM
● Traffic Control
  ○ Launch one region at a time
  ○ Launch with rate limits

Scaling Out Tricks
Application “Kill switches”
● A switch to disable certain app features/functions
● Useful when there is:
  ○ Too much traffic/scale-up
  ○ DDoS
  ○ Maintenance

Scaling Out Tricks
Limiting Graph Structures
● “Friends” / “Followers” features are often graphs
● If Katy Perry or Barack Obama used your “friends” feature…
● Limit the size of graphs, or queue events for fan-out updating

Scaling Out Tricks
Batching and Parallel Work
● Do large queries in parallel
  ○ Modern CPUs have many cores (2, 4, 8+)
  ○ 1 connection = 1 thread = 1 CPU core
● Batch inserts/updates
  ○ 1 x update with 1000 items > 1000 x updates with 1 item

Scaling Up Tricks
● Test provider turn-around time on hardware upgrading
● Test application performance on improved hardware in advance
● Scale up only resources needed

Databases
General
● Monitoring/reviewing slow queries reduces most inefficiencies
● More memory will reduce disk requests
● SSDs will reduce disk request time
● Proper database and kernel tunings will help further
  ○ Linux has very inefficient defaults!
● Try to use real local-disks, not EBS, NFS, etc
Queries
● Don’t try to make MySQL/MongoDB a queue or search engine!
● Decentralizing data and pre-computing answers for reads will take you far
● The best query is no query (cache)

Databases
Sharding

Testing Performance and Capacity
General
● Try to emulate the real user traffic
● Add micro-pauses to simulate reality
● Cloud-based providers are great for running load generation
Applications
● Component testing
  ○ Test the max volume of each component on a single host
  ○ Test the max volume of each component on many hosts
  ○ Calculate host scalability, ie: “+1 host = +80% more traffic”
● Feature capacity
  ○ Test the impact of each feature if not separate

Testing Performance and Capacity
Databases
● Replay real user traffic on real backups
● Load test tools: Linkbench, Sysbench, TPCC, JMeter, etc
● Single feature/query testing
  ○ Understand host capacity per feature, eg: “2000 user login queries/sec per db replica”
● Know your slowest query!

Development-time Questions
General
● What does the app do?
● If I break X, what happens?
● Are connections to data stores “pooled”?
Replicas
● Can the app use replicas (with possible lag)?
  ○ Tip: start early, deploy replication from the start
● Can we add/remove replicas without disruption?
Sharding
● Can the app understand shards/partitions?
● How is data balanced post-sharding?
● Are there cross-shard references?

Development-time Questions
Caching
● What data can be cached?
● Will a change be read immediately?
  ○ Can we pre-cache this change?
● When should the cache delete an item?
  ○ Can we set TTLs on our keys?
● How do we add/remove cache servers easily?
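The caching questions above (pre-caching a change, deleting an item, setting TTLs) correspond to the hit/miss and inline/preemptive patterns from the caching slides. The following is a minimal sketch, assuming an in-process dictionary in place of Memcached or Redis. `TTLCache`, `loader` and the injectable `clock` are hypothetical names introduced for illustration only.

```python
import time


class TTLCache:
    """Toy hit/miss cache with TTL expiry (a stand-in for Memcached/Redis)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable so expiry can be tested without sleeping
        self.store = {}      # key -> (value, expires_at)

    def get(self, key, loader):
        """Hit/miss caching: on a miss or an expired entry, find + add it."""
        now = self.clock()
        entry = self.store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                        # hit: no database work
        value = loader(key)                        # miss: ask the database
        self.store[key] = (value, now + self.ttl)  # add with a fresh TTL
        return value

    def put(self, key, value):
        """Inline/preemptive caching: write the new value at change time."""
        self.store[key] = (value, self.clock() + self.ttl)

    def delete(self, key):
        """Drop an item at change time so the next read reloads it."""
        self.store.pop(key, None)
```

Calling `put` in the same code path that writes to the database answers "Can we pre-cache this change?", while the TTL bounds how long a stale value can survive if an update or delete is ever missed.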
Knowing Your App
If you see…
● The app is write heavy
  ○ Remove overhead from immediate write path
  ○ Batch writes if possible
● The app is read heavy
  ○ Reduce scans/operations from the read path (index, etc)
  ○ Add as many replicas (slave/secondary) as needed
● The app queries for counts often, ie: # of items, friends, etc
  ○ Move count-queries to incremented in-memory counters
  ○ Or, create an index for the count query
● The app uses references or joins often
  ○ Consider decentralising the data (with fan-out updates)

Themes
● Make all features, apps, databases elastic
● Request Flow
  ○ Make the heavy workload easy / make the light workload hard
  ○ Move graph updates to background (queues, async, etc)
  ○ Move ‘counts’ to counters
● Caching
  ○ Cheaper/faster to access than DB
  ○ Try to cache before anyone reads data
● Queues
  ○ Great for replicating events while simplifying update
  ○ Great for batching changes
● Monitor everything! Try Percona Monitoring and Management!

Join us at Percona Live Europe
When: October 3-5, 2016
Where: Amsterdam, Netherlands
The Percona Live Open Source Database Conference is a great event for users of any level using open source database technologies.
● Get briefed on the hottest topics
● Learn about building and maintaining high-performing deployments
● Listen to technical experts and top industry leaders
Use promo code “WebinarPLAM16” and receive €15 off the current registration price!
Sponsorship opportunities are available as well.

Questions?
Thanks for joining!
Be sure to check out the Percona Blog for more technical blogs and topics!