On the Data Path Performance of Leaf-Spine Datacenter Fabrics
Mohammad Alizadeh
Joint work with: Tom Edsall

Datacenter Networks
• Must complete transfers or "flows" quickly
– need very high throughput, very low latency
• 1000s of server ports
(Figure: example workloads sharing the fabric — web, app, cache, db, mapreduce, HPC, monitoring.)

Leaf-Spine DC Fabric
• Approximates an ideal output-queued (OQ) switch
(Figure: two-tier topology — Spine switches over Leaf switches — connecting hosts H1–H9, shown alongside an ideal output-queued switch serving the same hosts.)
• How close is Leaf-Spine to an ideal OQ switch?
• What impacts its performance?
– Link speeds, oversubscription, buffering

Methodology
• Widely deployed mechanisms
– TCP-Reno+Sack
– DropTail (10MB shared buffer per switch)
– ECMP (hash-based) load balancing
• OMNET++ simulations
– 100×10Gbps servers (2 tiers)
– Actual Linux 2.6.26 TCP stack
• Realistic workloads
– Bursty query traffic
– All-to-all background traffic: web search, data mining
• Metric: flow completion time (FCT)

Impact of Link Speed
Three non-oversubscribed topologies, each with 20×10Gbps downlinks:
– 20×10Gbps uplinks
– 5×40Gbps uplinks
– 2×100Gbps uplinks

Impact of Link Speed
(Figure: avg FCT of large (10MB, ∞) background flows, normalized to optimal, vs. load (30–80%), for OQ-Switch, 20×10Gbps, 5×40Gbps, and 2×100Gbps; lower is better.)
• 40/100Gbps fabric: ~same FCT as OQ
• 10Gbps fabric: FCT up to 40% worse than OQ

Intuition
Higher speed links improve ECMP efficiency.
Example: 11×10Gbps flows (55% load)
– 20×10Gbps uplinks: Prob of 100% throughput = 3.27%
– 2×100Gbps uplinks: Prob of 100% throughput = 99.95%

Impact of Buffering
Where do queues build up? 2.5:1 oversubscribed fabric with large buffers.
(Figure: CDF of buffer occupancy (MBytes) at 24% and 60% load, for Leaf front-panel ports, Leaf fabric ports, and Spine ports.)
• Leaf fabric port queues ≈ Spine port queues
– 1:1 port correspondence; same speed & load

Impact of Buffering
Where are large buffers more effective for Incast?
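The incast tension can be framed with a back-of-the-envelope buffer check before looking at the measured results. This is a sketch only: the responder count, response size, and the `burst_fits` helper are illustrative assumptions, not the talk's workload; the 2 MB and 100 MB buffer sizes mirror the configurations compared on the next slide.

```python
# Back-of-the-envelope incast check: a query fans out to many workers,
# and their responses converge almost simultaneously on the single
# leaf front-panel (downlink) port of the requester. If the burst
# exceeds that port's buffer, drops and TCP timeouts inflate query
# completion time. All numbers below are illustrative assumptions.

MB = 1_000_000

def burst_fits(n_responders: int, response_bytes: int, buffer_bytes: int) -> bool:
    """True if a synchronized response burst fits in the port buffer."""
    return n_responders * response_bytes <= buffer_bytes

# e.g. 100 responders x 50 KB responses = a 5 MB synchronized burst
print(burst_fits(100, 50_000, 2 * MB))    # shallow 2 MB buffer -> False
print(burst_fits(100, 50_000, 100 * MB))  # deep 100 MB buffer -> True
```

Because the fan-in happens at the leaf port facing the requester, extra buffering at the spine cannot absorb this burst, which is consistent with the measurement that follows.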
(Figure: avg Query Completion Time (QCT), normalized to optimal, vs. load (30/50/70%), for four buffer configurations: 100MB-Leaf/100MB-Spine, 2MB-Leaf/2MB-Spine, 2MB-Leaf/100MB-Spine, 100MB-Leaf/2MB-Spine; lower is better.)
• Larger buffers are better at Leaf than Spine

Summary
• 40/100Gbps fabric + ECMP ≈ OQ-switch; some performance loss with 10Gbps fabric
• Buffering should generally be consistent in Leaf & Spine tiers
• Larger buffers more useful in Leaf than Spine for Incast
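The ECMP efficiency numbers on the Intuition slide, which underlie the first summary bullet, can be reproduced with a short combinatorics sketch. Assumptions: each flow is hashed independently and uniformly onto one uplink, flows are never split, and "100% throughput" means no uplink is offered more than its capacity. The 20×10Gbps case matches the slide's 3.27% exactly; under this model the 2×100Gbps case comes out near 99.9%, slightly below the slide's 99.95% figure.

```python
# Reproducing the ECMP intuition: 11 flows of 10 Gbps each are hashed
# independently and uniformly onto the uplinks (55% of 200 Gbps capacity).
from math import perm

NUM_FLOWS = 11

# 20 x 10 Gbps uplinks: each uplink fits exactly one 10 Gbps flow, so
# full throughput requires all 11 flows to hash to distinct uplinks
# (a birthday-problem event).
p_10g = perm(20, NUM_FLOWS) / 20**NUM_FLOWS
print(f"20x10G uplinks: P(100% throughput) = {p_10g:.2%}")   # 3.27%

# 2 x 100 Gbps uplinks: each uplink fits ten 10 Gbps flows, so the only
# losing outcome is all 11 flows hashing to the same uplink.
p_100g = 1 - 2 * (1 / 2) ** NUM_FLOWS
print(f"2x100G uplinks: P(100% throughput) = {p_100g:.2%}")  # 99.90%
```

With the same offered load, the fat-link fabric almost never suffers an ECMP hash collision that matters, which is why it tracks the ideal OQ switch so closely.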