CamCube: Rethinking the Data Center Cluster
Paolo Costa ([email protected])
Joint work with Austin Donnelly, Greg O’Shea, and Antony Rowstron (MSRC); Hussam Abu-Libdeh (Intern, Cornell); Simon Schubert (Intern, EPFL)

A New Software Stack
• Data centers run a new software stack (Dryad/DryadLINQ, Dremel, MapReduce, Dynamo, Databus, …), and the network is a critical component of it.
• Focus of this talk: how to make it easy to design and deploy efficient data center applications.

Building Data Center Applications is Hard!
• Abstraction: applications are designed against logical topologies (e.g., the trees of Dremel, the ring of Dynamo, the all-to-all pattern of MapReduce).
• Reality: they run on the data center’s physical topology, a tree of switches and routers.

Abstraction & Reality Mismatch
• One logical hop is mapped to multiple physical hops.
• Two disjoint logical paths may share some physical links.

Issue #1: Oversubscription
• Bandwidth gets scarce as you move up the tree, so locality is key to performance.

Issue #2: Path collision
• The network allocates paths independently, and applications cannot modify the way packets are routed.

Addressing These Issues…
• Oversubscription: Fat-tree [SIGCOMM’08], VL2 [SIGCOMM’09], …
• Path collision:
Hedera [NSDI’10], MPTCP [SIGCOMM’11], SPAIN [NSDI’10], …
• TCP incast: DCTCP [SIGCOMM’10], ICTCP [CoNEXT’10], FDS [OSDI’12], …
• Traffic prioritization: Orchestra [SIGCOMM’11], D2TCP [SIGCOMM’11], …
• Fair sharing: Seawall [NSDI’11], FairCloud [SIGCOMM’12], …

Applications & Network Gap
The network is a black box for applications (and vice versa).
• Application perspective:
− Applications only see IP addresses (10.0.2.3, 10.0.1.4, …), which makes it hard to infer locality and congestion (“why is this slow?”).
− There is no control over packet routing: communication is point-to-point only.
− Applications need to reverse-engineer the network.
• Network perspective:
− The network only sees packets and has no insight into application behaviour.
− It has to infer application patterns: long vs. short flows? Are these flows related?
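Issue #2 above can be made concrete with a toy simulation (not from the talk): if the fabric hashes each flow independently onto one of a handful of equal-cost paths, collisions follow the birthday problem.

```python
import random

def ecmp_collisions(num_flows, num_paths, trials=1000, seed=0):
    """Fraction of trials in which at least two flows land on the same
    core path, assuming (as a simplification of ECMP-style hashing) that
    the network picks one of `num_paths` paths uniformly at random per
    flow, with no application input."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        choices = [rng.randrange(num_paths) for _ in range(num_flows)]
        if len(set(choices)) < num_flows:  # some path carries >1 flow
            hits += 1
    return hits / trials
```

With 8 flows and 16 equal-cost paths, most trials see a collision even though capacity would allow all flows to be disjoint, which is exactly why applications would like a say in path selection.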
Internet & Data Centers
• This gap is due to how the Internet was designed… but data centers are not mini-Internets.
• Internet: multiple administration domains; heterogeneous hardware and network; strict layer isolation; topology not known; malicious software must be assumed.
• Data centers: a single administration domain; homogeneous hardware and network (x86 and Ethernet); the topology is known and can be customised (e.g., using virtualization); components are trusted.
• How can we exploit this flexibility to improve efficiency and reduce complexity?

CamCube
• Goal: design a data center closer to what a distributed systems builder expects.
• Today: the network is a given, and applications adapt to it.
• CamCube: adapt the network to the applications’ needs.

Direct-Connect Topology
• Servers are directly interconnected to each other
(no switches / routers); each link is a physical Ethernet cable.
• A fully connected mesh topology would be ideal: all logical topologies (e.g., the Dynamo ring) could be mapped onto it perfectly.
• But a full mesh is not very scalable: node degree grows linearly with N, leading to high server load and cabling complexity.
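The scalability argument above is easy to quantify; a back-of-the-envelope sketch (illustrative, not from the talk):

```python
def full_mesh_links(n):
    """Cables needed to connect every server to every other: n(n-1)/2.
    Per-server degree is n-1, so it grows linearly with the cluster."""
    return n * (n - 1) // 2

def torus_3d_links(n):
    """Cables in a 3D torus: every server has exactly 6 links, and each
    link is shared by two servers, so 6n/2 = 3n links at any scale."""
    return 3 * n
```

For a 27-server cluster like the testbed later in the talk, a full mesh would need 351 cables (degree 26 per server) against the torus’s 81 (degree 6), and the gap widens linearly with N.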
Which topology?
• Various options are available: trees, rings, hypercubes, tori, …
• CamCube uses a 3D torus:
− Scalable: node degree is constant (= 6).
− Fault-tolerant: high degree of multi-path.
− Easy to wire: only short links are needed.
− Trade-off: increased hop count.

Network Visibility
• Today: limited visibility; applications see only IP addresses (10.0.2.3, 10.0.1.4), so it is hard to infer server location or congestion.
• CamCube: nodes have (x, y, z) coordinates (e.g., (1,2,1) and (1,2,2)), which makes locality easy to understand, and servers have full visibility of the status of network links.

Packet Routing
• Today: a single routing protocol, point-to-point only.
• CamCube: servers can intercept, process, and forward packets, enabling multiple custom routing protocols (e.g., multicast, multipath).

Packet Processing
• Today: application-agnostic packet processing, typically on headers only (e.g., OpenFlow).
• CamCube: application-specific packet processing; servers understand the application semantics (e.g., caching, aggregation).

CamCube Services
• Several services have been implemented on top of CamCube, including:
− CamKey: a key-value store.
− Camdoop: a MapReduce-like system.
− CamGraph: a graph processing engine.
− A TCP/IP service, which enables running unmodified TCP applications.
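Hop counts on the torus described above follow directly from the (x, y, z) coordinates. A minimal sketch (illustrative, not CamCube’s actual routing code), assuming a k × k × k torus:

```python
def torus_hops(a, b, k):
    """Minimum number of hops between servers a and b, each an (x, y, z)
    coordinate on a k x k x k 3D torus: along each axis, take the shorter
    of the direct distance and the wraparound distance."""
    return sum(min(abs(p - q), k - abs(p - q)) for p, q in zip(a, b))
```

This is the "increased hop count" trade-off in numbers: neighbouring servers are one hop apart, while the farthest server on a 3 × 3 × 3 cube is three hops away thanks to wraparound links.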
Key-based Routing
• Packets are routed based on a key rather than a server address, a design inspired by Distributed Hash Tables (DHTs): the (x, y, z) coordinates define a key-space.
• 160-bit keys are expressed as (x, y, z, w) coordinates:
− If it is alive, server (x, y, z) is responsible for the key.
− Otherwise, the key is re-mapped to one of that server’s 1-hop neighbours, based on w.
• Example: (2,2,0,27) -> (2,2,0), (2,1,0), (1,2,0), …

CamKey
• A reliable, high-performance key-value store: a combination of BigTable and memcached.
• Two components:
− A replicated store, which ensures fault tolerance.
− A caching service, which provides high performance.

Replicated Store
• Data object IDs are hashed using SHA-1, and the result is interpreted as 4D coordinates: hash(ID) = e689eb3… = (2,2,0,27).
• The primary replica is stored at the server responsible for the key, (2,2,0).
• The first secondary replica is stored at the server that would become responsible for the key if the primary failed, (2,1,0); the second secondary replica is stored on the next server on the list, (1,2,0); and so on.
• High locality: secondary replicas are 1-hop neighbours, so disjoint paths can be used to reach them.
• Client transparency: clients do not need to know the replica identity; a client simply routes to (2,2,0,27), and key-based routing delivers the packet to whichever server currently holds the key.
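The key-to-server mapping and failover chain described above can be sketched as follows. The talk does not specify the SHA-1 bit layout or how w orders the 1-hop neighbours, so both are assumptions here (rotating the neighbour list by w is a stand-in for whatever function CamCube actually uses):

```python
import hashlib

K = 3  # side length of the torus; a 3 x 3 x 3 CamCube as in the testbed

def key_for(obj_id):
    """Hash an object ID with SHA-1 and interpret leading bytes as an
    (x, y, z, w) key. The exact byte-to-coordinate layout is an
    assumption; the talk only says 160-bit keys become (x, y, z, w)."""
    d = hashlib.sha1(obj_id.encode()).digest()
    return (d[0] % K, d[1] % K, d[2] % K, d[3])

def replica_chain(key, alive):
    """Servers responsible for a key, in order: the primary (x, y, z)
    first, then its 1-hop torus neighbours, ordered deterministically
    by w (hypothetical ordering). Dead servers are skipped."""
    x, y, z, w = key
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
               (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    neighbours = [((x + dx) % K, (y + dy) % K, (z + dz) % K)
                  for dx, dy, dz in offsets]
    rot = w % len(neighbours)
    ordered = neighbours[rot:] + neighbours[:rot]
    return [s for s in [(x, y, z)] + ordered if s in alive]
```

The replicated store falls out of the chain: the primary lives at the head, secondaries at the following entries, and if the primary dies, the next server in the chain transparently takes over the key.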
Caching Service
• For each key, we generate c additional keys that represent the locations of caches: f(2,2,0,27) -> (1,1,0,27), (3,1,0,27), (1,3,0,27), (3,3,0,27), …
• These cache keys are assigned to servers using the usual mapping function.
• When a server looks up a key, the path is chosen so as to pass through the closest cache.
• On a cache miss, the lookup request is forwarded on to the primary replica, and the response is cached on the way back.
• Subsequent requests for the same key are intercepted on-path, and the associated value is returned directly from the cache.
• Write operations always go to the primary replica, and the caches are invalidated.
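The caching scheme above can be sketched roughly as follows; the talk does not define the mapping function f, so placing the c caches at the four diagonal (x, y) offsets from the primary is an assumption, as is the tie-breaking in the cache choice:

```python
def cache_keys(key, k=3, offsets=((-1, -1), (-1, 1), (1, -1), (1, 1))):
    """Cache locations for a key. The talk writes f(2,2,0,27) ->
    (1,1,0,27), (3,1,0,27), ... without defining f; the diagonal
    offsets used here (wrapping around the torus) are a stand-in."""
    x, y, z, w = key
    return [((x + dx) % k, (y + dy) % k, z, w) for dx, dy in offsets]

def closest_cache(client, key, k=3):
    """Choose the cache that minimises torus hop count from the client,
    so the lookup path can pass through it on the way to the primary."""
    def hops(a, b):
        return sum(min(abs(p - q), k - abs(p - q)) for p, q in zip(a, b))
    return min(cache_keys(key, k), key=lambda c: hops(client, c[:3]))
```

Because cache keys are ordinary keys, the usual mapping function assigns them to servers and re-maps them on failure, and the lookup path simply detours through whichever cache is nearest.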
Evaluation
• Testbed: a 27-server CamCube (3 × 3 × 3); quad-core 2.27 GHz CPUs with 12 GB RAM and six 1 Gbps ports per server; the runtime and services are implemented in user space (C#).
• Workload: an image store driven by 9 external servers issuing up to 150 concurrent requests; inserts average 1.47 MB (full images), lookups average 3.55 KB (thumbnails).

Insert Throughput
• [Figure: insert throughput (Gbps) vs. number of concurrent insert requests (0–150), for switch, CamKey, switch (no disk), and CamKey (no disk).]
• CamKey exploits disjoint paths to create replicas; with disks enabled the systems are disk-I/O bound, while without disks they are server-bandwidth bound.

Lookup Throughput
• [Figure: lookup rate (requests/s, up to 160,000) vs. number of concurrent lookup requests (0–150), for switch, CamKey, and CamKey with the cache disabled.]
• Caches reduce the hop count: with caching, latency is 0.83 ms (median) and 1.70 ms (95th percentile); with the cache disabled, the higher hop count gives 0.97 ms (median) and 2.13 ms (95th percentile).

Failures
• [Figure: CamKey insert throughput (Gbps) over time (0–140 s) while a random server fails every 10 s.]
• By the end of the run only 18 servers are left, yet CamKey keeps serving inserts as servers fail.

MapReduce
• Map: processes the input data (splits of the input file) and generates (key, value) pairs.
• Shuffle: distributes the intermediate pairs to the reduce tasks.
• Reduce: aggregates all the values associated with each key into the final results.

Shuffle Phase
• The shuffle phase is challenging for data center networks: an all-to-all traffic pattern with O(N²) flows.
• It is often a bottleneck for MapReduce jobs, and has motivated proposals for full-bisection-bandwidth networks.

Data Reduction
• The final results are typically much smaller than the intermediate results (e.g., WordCount): in most Facebook jobs the final size is 5.4% of the intermediate size, and in most Yahoo jobs the ratio is 8.2%.
• How can we exploit this to reduce the traffic and improve the performance of the shuffle phase?
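The reduction figures above are easy to reproduce in miniature with a WordCount-style combiner (an illustrative sketch, not Camdoop code):

```python
from collections import Counter

def map_word_counts(chunk):
    """Map phase of WordCount: one (word, 1) pair per token."""
    return [(word, 1) for word in chunk.split()]

def combine(pairs):
    """Partial aggregation before (or during) the shuffle: collapse the
    pairs to one (word, count) entry per distinct word."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())
```

On any chunk with repeated words, the combined output is a fraction of the raw map output; that shrinkage, applied at every hop rather than only at the reducers, is the headroom Camdoop exploits.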
Aggregation Tree
• We could use aggregation trees to perform multiple steps of aggregation and reduce inter-rack traffic (e.g., rack-level aggregation).

Mapping a tree…
• …on a traditional topology: there is a mismatch between the logical and physical topology, and the link to the rack switch is shared by all of a node’s children.
• …on CamCube: there is a 1:1 mapping between the logical and physical topology, with only one child per link, and packets are aggregated on-path (=> less traffic).
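On-path aggregation over a tree like the one above amounts to a fold from the leaves to the root; a minimal sketch (illustrative; Camdoop actually aggregates packets inside the CamCube runtime):

```python
def aggregate_up(tree, values, node, combine=lambda a, b: a + b):
    """Aggregate partial results from the leaves towards the root: each
    node folds its children's totals into its own before forwarding, so
    a physical link carries one aggregated packet rather than one packet
    per leaf. `tree` maps a node to its children; `values` holds each
    node's own input."""
    total = values.get(node, 0)
    for child in tree.get(node, ()):
        total = combine(total, aggregate_up(tree, values, child, combine))
    return total
```

With the 1:1 tree-to-torus mapping, every edge of this recursion is a single physical link, so the traffic on each link is one combined value instead of the whole subtree’s output.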
Camdoop
• Camdoop improves the performance of the shuffle phase by reducing the traffic, rather than by increasing the bandwidth.

Workload Parameter
• S = output size / intermediate size:
− S = 1 (no aggregation): all map outputs have disjoint sets of keys.
− S = 1/N ≈ 0 (full aggregation): all map outputs share the same set of keys.
• We use synthetic workloads to explore different values of S; the intermediate data size is 22.2 GB (843 MB/server).

Evaluation
• [Figure: completion time (s, log scale) vs. S, for the baseline (running on the switch using TCP), Camdoop (no agg.), and Camdoop.]
• The gap between the baseline and Camdoop (no agg.) shows the impact of running on CamCube; the gap between Camdoop (no agg.) and Camdoop shows the impact of in-network aggregation.
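The two extremes of S defined above can be checked directly, assuming aggregation keeps one pair per distinct key (a sketch based on the definitions in the talk, not the benchmark code):

```python
def output_ratio(num_mappers, keys_per_mapper, shared):
    """S = output size / intermediate size for the two extremes:
    disjoint key sets across mappers give S = 1 (nothing aggregates),
    while identical key sets give S = 1/N (full aggregation), assuming
    one (key, value) pair per distinct key survives aggregation."""
    intermediate = num_mappers * keys_per_mapper
    output = keys_per_mapper if shared else intermediate
    return output / intermediate
```

Real jobs fall in between; the Facebook and Yahoo ratios quoted earlier (5.4% and 8.2%) sit near the full-aggregation end of this range.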
• The aggregation ratio reported by Facebook is also marked on the S axis.

Summary
• Data centers present both unique challenges and opportunities to network designers.
• This is a good time to revisit previous assumptions and rethink application and protocol design.
• CamCube:
− Enables applications to “control” the network.
− Removes the distinction between computation devices and network devices.