Black Friday and Cyber Monday:
Best Practices for Your
E-Commerce Database
Tim Vaillancourt
Sr. Technical Operations Architect
@ Percona
Agenda
● Synchronous versus Asynchronous Applications
● Scaling a Synchronous/Latency-sensitive Application
● Scaling an Asynchronous Application
● Efficient Usage of Data at Scale
○ Secondary/Slave Hosts
○ Caching
○ Queuing
● Efficient Usage of Data at Scale
○ Moving Expensive Work
○ Caching Techniques
○ Counters and In-memory Stores
○ Connection Pooling
● Scaling Out (Horizontal) Tricks
○ Pre-Sharding
○ Kill Switches
○ Limits and Graphs
● Scaling with Hardware (Vertical Scaling)
● Testing Performance and Capacity
● Knowing Your Application and Questions to Ask at Development Time
● Questions
About Me
● Started at Percona in January 2016
● Experience
○ Web Publishing
■ Big-scale LAMP-based Websites
○ Ecommerce
■ Large Inventory SaaS
○ Gaming
■ DevOps
● 50-100 Microservices
● 5-7+ x Massive Launches / Year
● Design, launch and maintain apps
■ DBA at EA DICE
● 2 x New Titles
● 5+ x Legacy Titles
○ Technologies
■ MySQL
■ MongoDB
■ Cassandra
■ Redis and Memcached
■ RabbitMQ, Kafka and ActiveMQ
■ Solr and Elasticsearch
■ (Sort of) AWS, HDFS, HBase, Postgres, etc…
Services
Monolith
● One application that does everything
● Example: Chrome, MySQL, huge Python app
Microservice
● Apps with different purposes, pain points or SLAs are discrete services
● Often easier to scale/troubleshoot
● Reduces risk of outage
● Example: frontend PHP app, messaging app, encoding app, etc
In Practice
● Both can be scaled up and down with the right features
● Microservices offer more flexibility
● Monolith services bring problems at scale
Application Operations
Synchronous
● Blocking operation until success or failure
● Slower requests
● Example: a file uploading app
Asynchronous
● Request and response are separated
● Fast response time back to user/application
● Example: a social media site
Slow Operations
● Can cause pileups in a tiered system
Applications
Synchronous
● Pros: less code, always the right answer
● Cons: blocking operations and poorer efficiency
● Example: a file uploading app
Latency/Integrity Sensitive
● Pros: always the right answer
● Cons: fewer scalability tricks available
● Example: a stock trading app that cannot accept “slave lag”
Asynchronous
● Pros: light operations and more scalability
● Cons: eventual consistency (and sometimes more code)
● Example: a social media site
Types of Data Designs
Decentralised
● Data is duplicated in several places
● Pros: lighter to read, decreased locking, easy to shard
● Cons: increased storage space, extra duplication effort
Centralised
● Data is kept in one (or few) places and referenced
● Pros: less storage, one source-of-truth
● Cons: locking, inefficiencies, sharding issues
Balancing Request Impact
Read-focused Apps
● Benefit from
○ Values pre-computed at write/change-time
○ Indices and/or few “scans” for data
○ No/few JOINs/operations to get result
Write-focused Apps
● Benefit from
○ No pre-computing of values (compute at read-time)
○ No/few indices to update
○ Insert/Append > Update
○ Reads: compute read summaries with replicas, add indices to secondaries only, etc
Queuing Updates
Event Metadata
● Example: “UserX has the new top score!”
● Without Queue example
○ Update Top Score in Database(s)
○ Send Email to Friends
○ Post to Facebook Page
○ Update cache
○ ...
● With Queue example
○ Add event to queue ‘topscore’
○ Apps read queue
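A minimal sketch of the "with queue" flow above, using Python's standard-library queue.Queue as an in-process stand-in for a real broker such as RabbitMQ or Kafka. The function names, the event shape and the three handlers are hypothetical and only illustrate the idea.

```python
import queue

# In-process stand-in for a broker queue named 'topscore' (assumption).
topscore_queue = queue.Queue()

def publish_top_score(user, score):
    """Request path: one cheap enqueue instead of several slow side effects."""
    topscore_queue.put({"user": user, "score": score})

# Hypothetical consumers; in production each would be its own app.
def update_database(event, log):
    log.append(f"db: top score is {event['score']} by {event['user']}")

def email_friends(event, log):
    log.append(f"email: {event['user']} has the new top score!")

def update_cache(event, log):
    log.append(f"cache: topscore={event['score']}")

HANDLERS = [update_database, email_friends, update_cache]

def drain(log):
    """Worker side: process every queued event outside the request path."""
    while True:
        try:
            event = topscore_queue.get_nowait()
        except queue.Empty:
            return
        for handler in HANDLERS:
            handler(event, log)

log = []
publish_top_score("UserX", 9001)
drain(log)
print(log)
```

With a real broker, each consuming app would normally get its own copy of the event (for example through a fan-out exchange or a separate consumer group); the single loop here stands in for that.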
Queuing Updates
Update Buffering
● Scenario: there is a high rate of updates to buffer
● Queue-based example
○ App adds to update buffer (queue)
○ Worker app works from the bottom of buffer
● Queue Operational Benefits
○ Spikes in traffic
○ Backend downtime
○ Communication bus
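The update-buffering idea can be sketched as follows, assuming a deque as a stand-in for a durable queue and a list as a stand-in for the database. The key names and the merge-by-sum rule are hypothetical; they suit counter-style updates only.

```python
from collections import deque

update_buffer = deque()   # stand-in for a durable queue (assumption)
applied_batches = []      # stand-in for the database (assumption)

def enqueue_update(key, delta):
    """App side: adding to the buffer is cheap and never blocks on the database."""
    update_buffer.append((key, delta))

def flush(batch_size=100):
    """Worker side: take the oldest updates, merge them per key, apply as one batch."""
    merged = {}
    for _ in range(min(batch_size, len(update_buffer))):
        key, delta = update_buffer.popleft()
        merged[key] = merged.get(key, 0) + delta
    if merged:
        applied_batches.append(merged)  # one write instead of many
    return merged

for _ in range(5):
    enqueue_update("item:42:views", 1)
enqueue_update("item:7:views", 1)
flush()
print(applied_batches)
```

Six buffered updates reach the backend as a single batch, and if the backend is down the buffer simply grows until the worker can flush again.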
Scaling Sync./Latency-Sensitive Apps
● Rethink the Flow Using Async
● Use lots of database RAM
● Shard the database
● Reduce impact of request flow
● Apache Cassandra
○ Synchronous
○ Very write optimized
● Percona XtraDB Cluster, NDB
● Use memory-based storage
○ Queue persistence to database
Efficient Usage of Data at Scale
Expensive DB Work
● Focus on lightweight user-facing operations
● Move aggregations/summaries/reporting to background
● Use replicas for expensive jobs
● Avoid or reduce (maybe cache) “JOINs”
● Enable and monitor metrics
○ MySQL
■ log_queries_not_using_indexes
○ MongoDB
■ Enable operationProfiling
○ Review metrics and improve!
○ Percona Monitoring and Management
Efficient Usage of Data at Scale
Caching / In-Memory Stores
● Alleviates load from database
● Very fast lookups
● Low connection overhead
○ MySQL connection buffers: ~1MB+
○ MongoDB connection buffers: ~1MB
○ Redis or Memcached connection buffers: 0-limit/infinity**
● Server-Side
○ Hit/Miss Caching
■ If something is not in the cache: find + add it. TTL expiry
○ Inline/Preemptive Caching
■ Update/Delete cache data at change time/preemptively
Efficient Usage of Data at Scale
Caching / In-Memory Stores (continued)
○ Client-Side
■ Cache client data in the client app/browser/etc
○ In-memory Stores
■ Memcached
■ Redis
■ Percona Server for MongoDB with Memory Engine :)
○ Use TTLs to trim data
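A minimal sketch of the two server-side patterns above, hit/miss caching and inline caching, with TTL expiry. The TTLCache class is a hypothetical in-process stand-in for Memcached or Redis, and the db dictionary stands in for the database.

```python
import time

class TTLCache:
    """Tiny in-process cache with per-entry expiry (illustrative only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # TTL expiry trims stale data
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)

db = {"sku-1": {"price": 10}}  # stand-in for the database (assumption)
stats = {"db_reads": 0}
cache = TTLCache(ttl_seconds=60)

def read_product(key):
    """Hit/miss caching: on a miss, find the value and add it to the cache."""
    value = cache.get(key)
    if value is None:
        stats["db_reads"] += 1
        value = db[key]
        cache.set(key, value)
    return value

def write_product(key, value):
    """Inline caching: update the cache at change time, before anyone reads."""
    db[key] = value
    cache.set(key, value)

read_product("sku-1")   # miss: reads the database
read_product("sku-1")   # hit: served from the cache
write_product("sku-1", {"price": 8})
print(stats["db_reads"], read_product("sku-1"))
```

The read after the write is served from the cache, so the database is read only once in this run.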
Efficient Usage of Data at Scale
Storing Numerical Counters and Stats
● Offload to in-memory stores
○ Incremented/decremented counters
○ Aggregations, summaries, counts
● Count-style Queries to Counters
○ Increment counter at request/change time
○ Read counter value at read-request time
○ Or, try to use an index
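Turning a count-style query into a counter might look like the sketch below. A collections.Counter stands in for an in-memory store (Redis offers INCR and DECR for this purpose), and the friendships list stands in for a database table; both, and the key format, are assumptions.

```python
from collections import Counter

counters = Counter()   # stand-in for an in-memory store (assumption)
friendships = []       # stand-in for a database table (assumption)

def add_friend(user, friend):
    friendships.append((user, friend))
    counters[f"friends:{user}"] += 1   # increment at change time

def remove_friend(user, friend):
    friendships.remove((user, friend))
    counters[f"friends:{user}"] -= 1   # decrement at change time

def friend_count(user):
    """Read-request time: a single key lookup, no scan of the table."""
    return counters[f"friends:{user}"]

def friend_count_by_scan(user):
    """The count-style query this replaces; cost grows with the table size."""
    return sum(1 for owner, _ in friendships if owner == user)

add_friend("alice", "bob")
add_friend("alice", "carol")
add_friend("alice", "dave")
remove_friend("alice", "bob")
print(friend_count("alice"), friend_count_by_scan("alice"))
```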
Efficient Usage of Data at Scale
Connection Pooling
● Removes 3-way TCP “handshake” from request (more w/SSL)
● Reduces threading overhead on databases
● Proxies on App server localhost/loopback
○ Reduces 1 x TCP ‘hop’, ie: faster connect time
○ Can create a LOT of DB connections with many app servers
Efficient Usage of Data at Scale
Connection Pooling (continued)
● MySQL Proxies
○ ProxySQL
○ HAProxy
○ MaxScale
○ Others…
● MongoDB Proxies
○ Mongos (sharding) process
● Proxy-on-Localhost or direct is fastest
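A minimal sketch of application-side connection pooling, assuming sqlite3 as a stand-in for a MySQL or MongoDB driver so the example runs anywhere. The ConnectionPool class is hypothetical and is not the API of ProxySQL, HAProxy or any driver.

```python
import queue
import sqlite3
from contextlib import contextmanager

class ConnectionPool:
    """Opens a fixed number of connections once and lends them out per request."""

    def __init__(self, size, connect):
        self._idle = queue.Queue(maxsize=size)
        self.opened = 0
        for _ in range(size):
            self._idle.put(connect())  # connection setup cost is paid here, once
            self.opened += 1

    @contextmanager
    def connection(self, timeout=1.0):
        # Waits up to `timeout` seconds if every connection is in use,
        # then raises queue.Empty.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return the connection instead of closing it

pool = ConnectionPool(size=2, connect=lambda: sqlite3.connect(":memory:"))

results = []
for _ in range(10):
    with pool.connection() as conn:
        results.append(conn.execute("SELECT 1").fetchone()[0])

print(pool.opened, len(results))
```

Ten requests are served by two connections. The pool size also caps how many connections each app server opens, which matters when many app servers share one database.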
Virtualization, Containers, etc
Virtualization
● Pretends to be a real computer from BIOS up
● OS + Software run under a hypervisor layer
● Pros
○ Full hardware-level emulation, eg: CentOS, Red Hat, Win 10
○ Automation of platform (sometimes)
● Cons
○ Emulation overhead
○ Slow boot-up time
○ Lots of OSs to update
Virtualization, Containers, etc
Containers (cgroups, jails)
● Several can run inside a single operating system and kernel
● Offers controls to limit resources like RAM, CPU time, etc
● Pros
○ Low overhead
○ Container creation is very fast
Virtualization, Containers, etc
Mesos, Kubernetes, etc
● Make a lot of servers distribute work, containers, etc
● Apache Mesos: “Distributed systems kernel”
○ Agent on every host and manager servers give out work
● Kubernetes
Virtualization, Containers, etc
Many Processes per Host
● Run un-related processes on hosts
● Add/remove from load balancers
● Not advised for disk-bound or high-bandwidth apps
Scaling Out Tricks
Sharding
● Techniques
○ Modulus
■ Even distribution of keys
■ Hard to reshape data
○ Map-based
■ 1-to-1 shard mapping using another table, config, etc
■ Easy to reshape data
● Launch with many shards in advance
○ 1-4 MySQL/MongoDB Instance/host
○ 1 MySQL/MongoDB Instance/host, 4 x databases as shards
○ 1 MySQL/MongoDB Instance/host, small hardware
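The two techniques and the pre-sharding idea can be sketched as below. The choice of CRC32 as the hash, the key names and the host names db1/db2 are assumptions made for illustration.

```python
import zlib

def modulus_shard(key, num_shards):
    """Modulus: hash the key and take the remainder.
    CRC32 is used because Python's built-in hash() of strings varies per process."""
    return zlib.crc32(key.encode("utf-8")) % num_shards

# Map-based: an explicit key -> shard table (could live in a table or config).
shard_map = {"customer-a": 0, "customer-b": 1}

def mapped_shard(key):
    return shard_map[key]

# Pre-sharding: launch with 4 logical shards, all on one host at first.
NUM_LOGICAL_SHARDS = 4
shard_hosts = {0: "db1", 1: "db1", 2: "db1", 3: "db1"}

def host_for(key):
    return shard_hosts[modulus_shard(key, NUM_LOGICAL_SHARDS)]

keys = [f"user-{i}" for i in range(1000)]

# Hard to reshape: changing the modulus from 4 to 5 relocates many keys.
moved = sum(1 for k in keys if modulus_shard(k, 4) != modulus_shard(k, 5))
print(f"{moved} of {len(keys)} keys change shard when going from 4 to 5 shards")

# Easy to reshape: a map entry moves one key without touching the others.
shard_map["customer-a"] = 3

# Pre-sharding payoff: move a whole logical shard to a new host later,
# without rehashing any key.
shard_hosts[3] = "db2"
```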
Scaling Out Tricks
Sharding
[Diagrams not preserved in this transcript: modulus-based sharding and map-based sharding]
Scaling Out Tricks
Hardware
● Have a strategy to add/remove capacity quickly
○ Cloud Instances
○ Mesos/Kubernetes
○ Automation
● Use cheap application servers for in-memory stores and apps
● Launch with lots of RAM, scale down post-launch
Scaling Out Tricks
Elasticity
● Ensure there is a way to add/remove hosts, examples:
○ Load Balancers
■ Good health-checks are important
○ Application Configs
■ File
■ Database
■ Zookeeper
Scaling Out Tricks
At Launch...
● Scale-out
○ Keep spare servers online, partially configured
○ Launch with extra database replicas (slave/secondary)
○ Monitor usage and remove extra hardware post-launch
○ Monitor and adjust capacity
● Scale-up
○ Launch with lots of RAM
● Traffic Control
○ Launch one region at a time
○ Launch with rate limits
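One common way to implement a launch-time rate limit is a token bucket; the slides do not name a specific algorithm, so this choice and every name below are assumptions.

```python
import time

class TokenBucket:
    """Allows `rate_per_sec` requests on average, with bursts up to `burst`."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should reject or queue the request

# A fake clock keeps the demonstration deterministic.
fake_now = [0.0]
bucket = TokenBucket(rate_per_sec=2, burst=3, clock=lambda: fake_now[0])

first_burst = [bucket.allow() for _ in range(4)]
fake_now[0] = 1.0  # one second later, two tokens have been refilled
after_one_second = [bucket.allow() for _ in range(3)]
print(first_burst, after_one_second)
```

Raising rate_per_sec after launch, once capacity has been confirmed, opens the gate gradually.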
Scaling Out Tricks
Application “Kill switches”
● A switch to disable certain app features/functions
● Useful when there is:
○ Too much traffic/scale-up
○ DDoS
○ Maintenance
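A kill switch can be as small as a flag checked before an expensive feature runs. In this sketch the feature_flags dictionary stands in for wherever the app keeps its config (file, database, Zookeeper), and the feature and function names are hypothetical.

```python
feature_flags = {"recommendations": True, "reviews": True}
calls = {"recommendations": 0}

def feature_enabled(name):
    # Unknown features default to off.
    return feature_flags.get(name, False)

def load_recommendations(sku):
    """Stand-in for an expensive query (assumption)."""
    calls["recommendations"] += 1
    return [f"{sku}-related-1", f"{sku}-related-2"]

def render_product_page(sku):
    page = {"sku": sku}
    if feature_enabled("recommendations"):
        page["recommendations"] = load_recommendations(sku)
    return page

normal = render_product_page("sku-1")

# Under too much traffic, flip the switch: the page still renders,
# but the expensive feature is skipped.
feature_flags["recommendations"] = False
degraded = render_product_page("sku-1")
print(normal, degraded, calls)
```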
Scaling Out Tricks
Limiting Graph Structures
● “Friends” / ”Followers” features are often graphs
● If Katy Perry or Barack Obama used your “friends” feature…
● Limit the size of graphs, or queue events for fan-out updating
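The "queue events for fan-out updating" option could be sketched like this: small follower lists are updated inline, large ones are handed to a background worker. The threshold, the chunk size and all names are hypothetical.

```python
from collections import deque

MAX_INLINE_FANOUT = 100   # hypothetical threshold
fanout_queue = deque()    # stand-in for a real queue (assumption)
feeds = {}                # follower -> list of posts

def deliver(follower, post):
    feeds.setdefault(follower, []).append(post)

def publish(post, followers):
    """Request path: fan out inline only when the graph is small."""
    if len(followers) <= MAX_INLINE_FANOUT:
        for follower in followers:
            deliver(follower, post)
        return "inline"
    fanout_queue.append((post, list(followers)))
    return "queued"

def fanout_worker(chunk_size=1000):
    """Background path: work through queued fan-outs in chunks."""
    while fanout_queue:
        post, followers = fanout_queue.popleft()
        for start in range(0, len(followers), chunk_size):
            for follower in followers[start:start + chunk_size]:
                deliver(follower, post)

small = publish("hello", [f"friend-{i}" for i in range(3)])
large = publish("new single!", [f"fan-{i}" for i in range(5000)])
delivered_before_worker = len(feeds)
fanout_worker()
print(small, large, delivered_before_worker, len(feeds))
```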
Scaling Out Tricks
Batching and Parallel Work
● Do large queries in parallel
○ Modern CPUs have many cores (2, 4, 8+)
○ 1 connection = 1 thread = 1 CPU core
● Batch inserts/updates
○ 1 x update with 1000 items > 1000 x updates with 1 item
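A small sketch of the batching point, assuming sqlite3 as a stand-in for MySQL so it runs without a server; the table and column names are hypothetical. It covers batching only, not parallel queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER)")

rows = [(f"sku-{i}", i) for i in range(1000)]

# One batched call and one commit, rather than 1000 separate statements
# each paying its own round trip and commit.
with conn:
    conn.executemany("INSERT INTO stock (sku, qty) VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM stock").fetchone()[0]
print(count)
```

Against a networked database the gain is usually larger than in this in-memory example, because every separate statement adds a network round trip.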
Scaling Up Tricks
● Test provider turn-around time on hardware upgrading
● Test application performance on improved hardware in advance
● Scale up only resources needed
Databases
General
● Monitoring/reviewing slow queries reduces most inefficiencies
● More memory will reduce disk requests
● SSDs will reduce disk request time
● Proper database and kernel tunings will help further
○ Linux has very inefficient defaults!
● Try to use real local-disks, not EBS, NFS, etc
Queries
● Don’t try to make MySQL/MongoDB a queue or search engine!
● Decentralizing data and pre-computing answers for reads will take you far
● The best query is no query (cache)
Databases
Sharding
Testing Performance and Capacity
General
● Try to emulate the real user traffic
● Add micro-pauses to simulate reality
● Cloud-based providers are great for running load generation
Applications
● Component testing
○ Test the max volume of each component on a single host
○ Test the max volume of each component on many hosts
○ Calculate host scalability, ie: “+1 host = +80% more traffic”
● Feature capacity
○ Test the impact of each feature if not separate
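The host-scalability figure can feed a simple capacity estimate. The sketch below assumes a linear model in which every host after the first adds a fixed fraction of single-host capacity; the function name and the traffic numbers are hypothetical.

```python
import math

def hosts_needed(target_rps, single_host_rps, scaling_efficiency):
    """Estimate host count when each extra host adds
    single_host_rps * scaling_efficiency requests/sec (linear model, assumption)."""
    if target_rps <= single_host_rps:
        return 1
    extra_capacity_needed = target_rps - single_host_rps
    per_extra_host = single_host_rps * scaling_efficiency
    return 1 + math.ceil(extra_capacity_needed / per_extra_host)

# Example: one host handles 2000 req/s, "+1 host = +80% more traffic",
# and the launch target is 10000 req/s.
print(hosts_needed(10000, 2000, 0.8))
```

Real systems rarely scale linearly for long, so an estimate like this is best checked against the many-host test results rather than trusted on its own.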
Testing Performance and Capacity
Databases
● Replay real user traffic on real backups
● Load test tools: Linkbench, Sysbench, TPCC, JMeter, etc
● Single feature/query testing
○ Understand host capacity per feature, eg: “2000 user login queries/sec per db replica”
● Know your slowest query!
Development-time Questions
General
● What does the app do?
● If I break X, what happens?
● Are connections to data stores “pooled”?
Replicas
● Can the app use replicas (with possible lag)?
○ Tip: start early, deploy replication from the start
● Can we Add/Remove replicas without disruption?
Sharding
● Can the app understand shards/partitions?
● How is data balanced post-sharding?
● Are there cross-shard references?
Development-time Questions
Caching
● What data can be cached?
● Will a change be read immediately?
○ Can we pre-cache this change?
● When should the cache delete an item?
○ Can we set TTLs on our keys?
● How do we add/remove cache servers easily?
Knowing Your App
If you see…
● The app is write heavy
○ Remove overhead from immediate write path
○ Batch writes if possible
● The app is read heavy
○ Reduce scans/operations from the read path (index, etc)
○ Add as many replicas (slave/secondary) as needed
● The app queries for counts often, ie: # of items, friends, etc
○ Move count-queries to incremented in-memory counters
○ Or, create an index for the count query
● The app uses references or joins often
○ Consider decentralising the data (with fan-out updates)
Themes
● Make all features, apps, databases elastic
● Request Flow
○ Make the heavy workload easy / make the light workload hard
○ Move graph updates to background (queues, async, etc)
○ Move ‘counts’ to counters
● Caching
○ Cheaper/faster to access than DB
○ Try to cache before anyone reads data
● Queues
○ Great for replicating events while simplifying update
○ Great for batching changes
● Monitor everything! Try Percona Monitoring and Management!
Join us at Percona Live Europe
When: October 3-5, 2016
Where: Amsterdam, Netherlands
The Percona Live Open Source Database Conference is a great event for users of any level
using open source database technologies.
● Get briefed on the hottest topics
● Learn about building and maintaining high-performing deployments
● Listen to technical experts and top industry leaders
Use promo code “WebinarPLAM16” and receive €15 off the current registration price!
Sponsorship opportunities are available as well.
Questions?
Thanks for joining!
Be sure to check out the Percona Blog for more technical posts and topics!