On the Data Path Performance of Leaf-Spine Datacenter Fabrics
Mohammad Alizadeh
Joint with: Tom Edsall
Datacenter Networks
• Must complete transfers or “flows” quickly
  – Need very high throughput, very low latency
[Figure: datacenter fabric with 1000s of server ports carrying diverse workloads: web, app, cache, db, MapReduce, HPC, monitoring]
Leaf-Spine DC Fabric
Approximates ideal output-queued switch
[Figure: Leaf-Spine fabric (Spine switches above Leaf switches, hosts H1–H9) ≈ ideal output-queued switch]
• How close is Leaf-Spine to an ideal OQ switch?
• What impacts its performance?
  – Link speeds, oversubscription, buffering
Methodology
• Widely deployed mechanisms
  – TCP-Reno+SACK
  – DropTail (10MB shared buffer per switch)
  – ECMP (hash-based) load balancing; see the sketch below
• OMNeT++ simulations
  – 100×10Gbps servers (2 tiers)
  – Actual Linux 2.6.26 TCP stack
• Realistic workloads
  – Bursty query traffic
  – All-to-all background traffic: web search, data mining
• Metric: flow completion time (FCT)
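For concreteness, here is a minimal sketch of hash-based ECMP path selection, assuming a 5-tuple hash; the function and field names are illustrative, not from the paper:

import hashlib

# Minimal sketch of hash-based ECMP: the switch hashes a flow's
# 5-tuple and uses the result to pick an uplink, so every packet
# of a flow follows the same path.
def ecmp_uplink(src_ip, dst_ip, src_port, dst_port, proto, n_uplinks):
    key = f"{src_ip},{dst_ip},{src_port},{dst_port},{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_uplinks

# A flow is pinned to one of 20 uplinks for its whole lifetime.
print(ecmp_uplink("10.0.0.1", "10.0.1.2", 40512, 80, "tcp", 20))

Because the flow, not the packet, is the unit of load balancing, a single large flow can never be split across paths; this is the root of the ECMP inefficiency quantified on the "Intuition" slide below.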
Impact of Link Speed
Three non-oversubscribed topologies, each with 20×10Gbps downlinks:
  – 20×10Gbps uplinks
  – 5×40Gbps uplinks
  – 2×100Gbps uplinks
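A quick sanity check of "non-oversubscribed": in each topology the leaf's total uplink capacity matches its 200Gbps of downlink capacity (the 2.5:1 fabric later in the talk breaks this balance). A short calculation, with the per-topology figures taken from the list above:

# Oversubscription ratio = total downlink capacity / total uplink capacity.
downlink_gbps = 20 * 10  # 20x10Gbps downlinks in every topology

for label, uplink_gbps in [("20x10Gbps", 20 * 10),
                           ("5x40Gbps", 5 * 40),
                           ("2x100Gbps", 2 * 100)]:
    print(f"{label}: {downlink_gbps / uplink_gbps:.1f}:1")  # 1.0:1 for all three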
Impact of Link Speed
Avg FCT: large (10MB,∞) background flows
[Plot: FCT (normalized to optimal) vs. load (30–80%) for an ideal OQ-Switch and the 20×10Gbps, 5×40Gbps, and 2×100Gbps fabrics; lower is better]
• 40/100Gbps fabrics: ~same FCT as OQ
• 10Gbps fabric: FCT up to 40% worse than OQ
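"Normalized to optimal" plausibly means dividing a measured FCT by the flow's best-case completion time on an idle path; the helper below is an assumed reconstruction of that normalization, not the paper's exact definition:

def ideal_fct_seconds(flow_bytes, line_rate_gbps, rtt_seconds=0.0):
    # Best case: the flow transmits at full line rate, plus one RTT of latency.
    return flow_bytes * 8 / (line_rate_gbps * 1e9) + rtt_seconds

# A 10MB flow on a 10Gbps link needs ~8ms at line rate, so a measured
# FCT of 11.2ms would plot as 1.4, i.e. 40% worse than optimal.
opt = ideal_fct_seconds(10e6, 10)
print(opt, 11.2e-3 / opt)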
Intuition
Higher speed links improve ECMP efficiency
[Figure: 11×10Gbps flows (55% load) ECMP-hashed onto 20×10Gbps uplinks vs. 2×100Gbps uplinks]
• 20×10Gbps uplinks: Prob of 100% throughput = 3.27%
• 2×100Gbps uplinks: Prob of 100% throughput = 99.95%
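A back-of-the-envelope check of those probabilities, assuming each flow is hashed independently and uniformly onto an uplink: with 20×10Gbps uplinks, 100% throughput requires all 11 flows to land on distinct links (a birthday-problem product), whereas with 2×100Gbps uplinks it fails only if all 11 flows pile onto one link. This toy model reproduces the 3.27% figure and gives ≈99.9% for the two-link case:

n_flows = 11  # 11 flows of 10Gbps each (55% load)

# 20x10Gbps uplinks: each link fits exactly one 10Gbps flow, so full
# throughput requires all flows to hash to distinct links.
p_20x10 = 1.0
for i in range(n_flows):
    p_20x10 *= (20 - i) / 20
print(f"20x10G: {p_20x10:.2%}")   # 3.27%

# 2x100Gbps uplinks: each link fits 10 of the flows, so throughput only
# drops if all 11 flows hash to the same one of the 2 links.
p_2x100 = 1 - 2 * (0.5 ** n_flows)
print(f"2x100G: {p_2x100:.2%}")   # ~99.90%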
Impact of Buffering
Where do queues build up?
2.5:1 oversubscribed fabric with large buffers
[Plot: CDF of buffer occupancy (MBytes, 0–8) for Leaf front-panel ports, Leaf fabric ports, and Spine ports, at 24% and 60% load]
• Leaf fabric port queues ≈ Spine port queues
  – 1:1 correspondence between Leaf fabric ports and Spine ports; same speed & load
Impact of Buffering
Where are large buffers more effective for Incast?
[Plot: avg query completion time (QCT, normalized to optimal) vs. load (30/50/70%) for buffer configurations 2MB-Leaf/2MB-Spine, 2MB-Leaf/100MB-Spine, 100MB-Leaf/2MB-Spine, and 100MB-Leaf/100MB-Spine; lower is better]
• Larger buffers better at Leaf than at Spine
Summary
• 40/100Gbps fabric + ECMP ≈ OQ-switch; some performance loss with 10Gbps fabric
• Buffering should generally be consistent in Leaf & Spine tiers
• Larger buffers more useful in Leaf than in Spine for Incast