Download slides ppt

The Internet’s architecture for managing congestion Damon Wischik, UCL www.wischik.com/damon Some Internet History • 1974: First draft of TCP/IP [“A protocol for packet network interconnection”, Vint Cerf and Robert Kahn] • 1983: ARPANET switches on TCP/IP • 1986: Congestion collapse • 1988: Congestion control for TCP [“Congestion avoidance and control”, Van Jacobson] “A Brief History of the Internet”, the Internet Society End-to-end control Internet congestion is controlled by the end-systems. The network operates as a dumb pipe. [“End-to-end arguments in system design” by Saltzer, Reed, Clark, 1981] request User Server (TCP) End-to-end control Internet congestion is controlled by the end-systems. The network operates as a dumb pipe. [“End-to-end arguments in system design” by Saltzer, Reed, Clark, 1981] Server (TCP) User data End-to-end control Internet congestion is controlled by the end-systems. The network operates as a dumb pipe. [“End-to-end arguments in system design” by Saltzer, Reed, Clark, 1981] acknowledgements Server (TCP) User data The TCP algorithm, running on the server, decides how fast to send data. traffic rate [0-100 kB/sec] TCP time [0-8 sec] if (seqno > _last_acked) { if (!_in_fast_recovery) { _last_acked = seqno; _dupacks = 0; inflate_window(); send_packets(now); _last_sent_time = now; return; } if (seqno < _recover) { uint32_t new_data = seqno - _last_acked; _last_acked = seqno; if (new_data < _cwnd) _cwnd -= new_data; else _cwnd=0; _cwnd += _mss; retransmit_packet(now); send_packets(now); return; } uint32_t flightsize = _highest_sent - seqno; _cwnd = min(_ssthresh, flightsize + _mss); _last_acked = seqno; _dupacks = 0; _in_fast_recovery = false; send_packets(now); return; } if (_in_fast_recovery) { _cwnd += _mss; send_packets(now); return; } _dupacks++; if (_dupacks!=3) { send_packets(now); return; } _ssthresh = max(_cwnd/2, (uint32_t)(2 * _mss)); retransmit_packet(now); _cwnd = _ssthresh + 3 * _mss; _in_fast_recovery = true; _recover = _highest_sent; } How TCP shares capacity individual flow bandwidths available bandwidth sum of flow bandwidths time Motivation: buffer size • Internet routers have buffers, to accomodate bursts in traffic. • How big do the buffers need to be? – 3 GByte? Rule of thumb—what Cisco does today – 300 MByte? [Appenzeller, Keslassy, McKeown, 2004 ] – 30 kByte? • Large buffers are unsustainable: – Data volumes double every 10 months – CPU speeds double every 18 months – Memory access speeds double every 10 years Motivation: TCP’s teleology [Kelly, Maulloo, Tan, 1998] U(x) P(y,C) • Consider several TCP flows sharing a single link • Let xr be the mean bandwidth of flow r [pkts/sec] Let y be the total bandwidth of all flows [pkts/sec] Let C be the total available capacity [pkts/sec] • TCP and the network act so as to solve maximise r U(xr) - P(y,C) over xr0 where y=r xr x C y U(x) Bad teleology severe penalty for allocating too little bandwidth little extra valued attached to highbandwidth flows x Bad teleology U(x) flows with large RTT are satisfied with little bandwidth flows with small RTT want more bandwidth x P(y,C) Bad teleology no penalty unless links are overloaded C y TCP’s teleology U(x) P(y,C) • The network acts as if it’s trying to solve an optimization problem • Is this what we want the Internet to optimize? • Does it even succeed in performing the optimization? x C y Synchronization Desynchronized TCP flows: Synchronized TCP flows: aggregate traffic rate individual flow rates – aggregate traffic is smooth – network solves the optimization + + = – aggregate traffic is bursty – network oscillates about the optimum + + = time Synchronization Desynchronized TCP flows: Synchronized TCP flows: aggregate traffic rate individual flow rates – aggregate traffic is smooth – network solves the optimization + + = – aggregate traffic is bursty – network oscillates about the optimum + + = time TCP traffic model • When there are many TCP flows, the aggregate traffic rate xt varies smoothly, according to a differential equation [Misra, Gong, Towsley, 2000] aggregate traffic rate • The equation involves – pt, the packet loss probability at time t, – RTT, the average round trip time desynchronized time synchronized traffic rate [0-100 kB/sec] TCP time [0-8 sec] if (seqno > _last_acked) { if (!_in_fast_recovery) { _last_acked = seqno; _dupacks = 0; inflate_window(); send_packets(now); _last_sent_time = now; return; } if (seqno < _recover) { uint32_t new_data = seqno - _last_acked; _last_acked = seqno; if (new_data < _cwnd) _cwnd -= new_data; else _cwnd=0; _cwnd += _mss; retransmit_packet(now); send_packets(now); return; } uint32_t flightsize = _highest_sent - seqno; _cwnd = min(_ssthresh, flightsize + _mss); _last_acked = seqno; _dupacks = 0; _in_fast_recovery = false; send_packets(now); return; } if (_in_fast_recovery) { _cwnd += _mss; send_packets(now); return; } _dupacks++; if (_dupacks!=3) { send_packets(now); return; } _ssthresh = max(_cwnd/2, (uint32_t)(2 * _mss)); retransmit_packet(now); _cwnd = _ssthresh + 3 * _mss; _in_fast_recovery = true; _recover = _highest_sent; } Queue model • How does packet loss probability pt depend on buffer size? • There are two families of answers, depending on queueing delay: – Small buffers (queueing delay « RTT) – Large buffers (queueing delay  RTT) 2000 20 20 200 10 20 5 5 20 20 200 15 time 0 0 time [0-5 sec] 42 43 44 45 20 41 15 40 queueing 2000 delay 0.19 ms 15 20 15 200 delay queueing 1.9 ms 10 val1 15 10 5 0 queue size [0-15 pkt] 20 queueing 20 delay 19 ms 10 0 0 5 5 10 val1 15 15 Small buffers val1 10 5 0 20 15 10 5 0 0 5 10 As the optical fibre’s line rate increases – queue size fluctuates more and more rapidly – queue size distribution does not change 20 40 41 42 43 44 45 time (it depends only on link utilization, not on line rate) 40 41 42 43 time 44 45 Large buffers (queueing delay 200 ms) queue val1size 0 50 100 150 pkt] [0-160 queueapprox 40 42 44 46 time 50 time 48[0-10 sec] • When xt<C the queue size is small (C=line rate) • No packet drops, so TCP increases xt Large buffers (queueing delay 200 ms) queue val1size 0 50 100 150 pkt] [0-160 queueapprox 40 42 44 46 time 50 time 48[0-10 sec] • When xt<C the queue size is small (C=line rate) • No packet drops, so TCP increases xt • When xt>C the queue fills up and packets begin to get dropped Large buffers (queueing delay 200 ms) queue val1size 0 50 100 150 pkt] [0-160 queueapprox 40 42 44 46 time 50 time 48[0-10 sec] • When xt<C the queue size is small (C=line rate) • No packet drops, so TCPs increases xt • When xt>C the queue fills up and packets begin to get dropped • TCPs may ‘overshoot’, leading to synchronization Large buffers (queueing delay 200 ms) queue val1size 0 50 100 150 pkt] [0-160 queueapprox 40 42 44 46 time 50 time 48[0-10 sec] • Drop probability depends on both traffic rate xt and queue size qt Analysis • Write down differential equations – for aggregate TCP traffic rate xt – for queue dynamics and loss prob pt taking account of buffer size • Calculate – average link utilization – average queue occupancy/delay – extent of synchronization and consequent loss of utilization, and jitter [Gaurav Raina, PhD thesis, 2005] Stability/instability analysis traffic rate xt/C 1.4 1.2 0.8 0.6 20 40 60 80 100 20 40 60 80 100 1.4 1.2 0.8 0.6 • For some values of C*RTT, the dynamical system is stable • For others it is unstable and there are oscillations (i.e. the flows are partially synchronized) • When it is unstable, we can calculate the amplitude of the oscillations time Instability plot traffic intensity x/C 0.5 -1 log10 of pkt loss probability p 1 extent of oscillations in x/C -2 -3 -4 queue equation 1.5 2 TCP throughput equation Instability plot traffic intensity x/C 0.5 1 1.5 2 -1 C*RTT=4 pkts log10 of pkt loss probability p -2 -3 C*RTT=20 pkts -4 C*RTT=100 pkts Alternative buffer-sizing rules Intermediate buffers buffer = bandwidth*delay / sqrt(#flows) or Large buffers buffer = bandwidth*delay b25 b100 b400 -1 -2 Large buffers with AQM -3 buffer=bandwidth*delay*{¼,1,4} -4 0.5 b10 1 1.5 b20 b50 -1 -2 Small buffers -3 buffer={10,20,50} pkts -4 0.5 b50 1 1.5 b1000 -1 -2 Small buffers, ScalableTCP p -3 -4 buffer={50,1000} pkts [Vinnicombe 2002] [T.Kelly 2002] -5 -6 0.5 1 1.5 Conclusion • The network acts to solve an optimization problem. – We can choose which optimization problem, by choosing the right buffer size & by changing TCP’s code. • It may or may not attain the solution – In order to make sure the network is stable, we need to choose the buffer size & TCP code carefully. Prescription • ScalableTCP in end-systems need to persuade Microsoft, Linus • Much smaller buffers in routers need to persuade BT/AT&T ScalableTCP gives more weight to highbandwidth flows. And it’s been shown to be stable. U(x) P(y,C) With small buffers, the network likes to run with slightly lower utilization, hence lower delay x C y

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download slides ppt