Congestion control
Lecture 6
CS 653
Why congestion control?
Causes/costs of congestion: scenario 1
 two senders, two
receivers
 one router, infinite
buffers
 no retransmission
Host A
Host B
lout
lin : original data
unlimited shared
output link buffers
 large delays
when congested
 throughput
staurates
Causes/costs of congestion: scenario 2
 one router, finite buffers
 sender retransmission of lost
packet
Host A
Host B
lin : original
data
l'in : original data, plus
retransmitted data
finite shared output
link buffers
lout
Causes/costs of congestion: scenario 2
• always: λin = λout (goodput)
• "perfect" retransmission only when loss: λ'in > λout
• retransmission of a delayed (not lost) packet makes λ'in larger (than the perfect case) for the same λout
[figure: three plots (a, b, c) of λout vs. λin, with R/2 marked on each axis; delivered goodput saturates at R/2 in the ideal case and falls toward R/3 and R/4 as retransmissions and duplicate retransmissions consume capacity]
"costs" of congestion:
• more work (retransmissions) for a given "goodput"
• unneeded retransmissions: link carries multiple copies of a packet
Causes/costs of congestion: scenario 3
 four senders
 multihop paths
 timeout/retransmit
Q: what happens as l
in
and l increase ?
in
Host A
lin : original data
l'in : original data, plus
retransmitted data
finite shared output
link buffers
Host B
lout
Causes/costs of congestion: scenario 3
H
o
s
t
A
l
o
u
t
H
o
s
t
B
Another “cost” of congestion:
 when packet dropped, any “upstream
transmission capacity used for that packet
was wasted!
Two broad approaches towards congestion control
End-end congestion control:
• no explicit feedback from the network
• congestion inferred from end-system observed loss and delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end hosts
  • single bit indicating congestion (SNA, DECbit, ATM, TCP/IP ECN)
  • explicit rate at which the sender should send
• recent proposals [XCP] [RCP] revisit ATM ideas
TCP congestion control
Components of TCP congestion control
• Slow start: multiplicatively increase (double) the window each RTT
• Congestion avoidance: additively increase the window (by 1 MSS per RTT)
• Loss: multiplicatively decrease (halve) the window
• Timeout: set cwnd to 1 MSS, and multiplicatively increase (double) the retransmission timeout upon each further consecutive loss
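The sketch below pulls these four rules together in one place. It is a minimal Python illustration under the usual simplifications (window counted in segments, no actual packet handling); the class and method names are ours, not from any TCP implementation.

```python
# Minimal sketch of the four window rules above (illustrative, not a
# faithful implementation of any real TCP variant).

MSS = 1  # count the window in segments for simplicity

class TcpWindow:
    def __init__(self):
        self.cwnd = 1 * MSS        # congestion window
        self.ssthresh = 64 * MSS   # slow-start / congestion-avoidance boundary
        self.rto = 1.0             # retransmission timeout, seconds

    def on_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += MSS                    # slow start: +1 MSS per ACK, doubles per RTT
        else:
            self.cwnd += MSS * MSS / self.cwnd  # congestion avoidance: ~+1 MSS per RTT
        self.rto = 1.0                          # progress resets the backoff

    def on_triple_dup_ack(self):
        self.ssthresh = max(self.cwnd / 2, 2 * MSS)
        self.cwnd = self.ssthresh               # loss: halve the window

    def on_timeout(self):
        self.ssthresh = max(self.cwnd / 2, 2 * MSS)
        self.cwnd = 1 * MSS                     # timeout: back to 1 MSS
        self.rto *= 2                           # double RTO on each consecutive loss
```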
Retransmission timeout estimation
• Calculate EstimatedRTT using an exponentially weighted moving average:
  EstimatedRTT_i = (1 − α)·EstimatedRTT_{i−1} + α·SampleRTT_i
• Calculate the deviation with respect to the moving average:
  DevRTT_i = (1 − β)·DevRTT_{i−1} + β·|SampleRTT_i − EstimatedRTT_{i−1}|
• Timeout = EstimatedRTT + 4·DevRTT
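These two moving averages translate directly into code. The slide leaves the gains unspecified, so the standard values α = 1/8 and β = 1/4 are assumed here:

```python
# EWMA estimator for the retransmission timeout, per the equations above.
# alpha = 1/8 and beta = 1/4 are the conventional gains (assumed here).

class RttEstimator:
    def __init__(self, first_sample, alpha=1/8, beta=1/4):
        self.alpha, self.beta = alpha, beta
        self.est = first_sample        # EstimatedRTT
        self.dev = first_sample / 2    # DevRTT

    def update(self, sample):
        # deviation is measured against the previous estimate
        self.dev = (1 - self.beta) * self.dev + self.beta * abs(sample - self.est)
        self.est = (1 - self.alpha) * self.est + self.alpha * sample
        return self.est + 4 * self.dev  # Timeout
```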
TCP Throughput
TCP throughput: a very very simple model
• What is the average throughput of TCP as a function of window size W and RTT T?
• Ignore slow start
• Let W be the window size when loss occurs
• When the window is W, throughput is W/T
• Just after a loss, the window drops to W/2, and throughput to W/(2T)
• Average throughput: 3W/(4T)
TCP throughput: a very simple model
• But what is W when loss occurs?
• Let C = link capacity in packets/sec and Q = queue capacity in packets
• When the window is w and the queue holds q packets, TCP is sending at rate w/(T + q/C)
• For maintaining full utilization in steady state:
  • just before loss, rate = W/(T + Q/C) = C
  • just after loss, rate = (W/2)/T = C
• For Q = CT (a common rule of thumb for sizing router buffers), both conditions give W = 2CT, and a loss occurs every (3W/4)·(W/2) = 3W²/8 packets
Deriving TCP throughput/loss relationship
[figure: TCP window size sawtooth, oscillating between W/2 and W over one "period" of W/2 RTTs]
Number of packets sent per "period":

\[
\frac{W}{2} + \left(\frac{W}{2}+1\right) + \cdots + W
= \sum_{n=0}^{W/2}\left(\frac{W}{2}+n\right)
= \left(\frac{W}{2}+1\right)\frac{W}{2} + \frac{(W/2)(W/2+1)}{2}
= \frac{3}{8}W^2 + \frac{3}{4}W
\approx \frac{3}{8}W^2
\]
Deriving TCP throughput/loss relationship
[figure: TCP window size sawtooth between W/2 and W over one period (time in RTTs)]
Number of packets sent per "period" ≈ (3/8)W². One packet lost per "period" implies:

\[
p_{\text{loss}} = \frac{8}{3W^2}
\qquad\text{or}\qquad
W = \sqrt{\frac{8}{3\,p_{\text{loss}}}}
\]

\[
B = \text{avg. throughput} = \frac{3}{4}\,\frac{W}{\text{RTT}}
= \frac{1.22}{\text{RTT}\,\sqrt{p_{\text{loss}}}}\ \text{packets/sec}
\]
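The final expression is easy to check numerically; a small helper (names ours):

```python
import math

# B = (3/4)*W/RTT with W = sqrt(8/(3p)), i.e. B ~ 1.22/(RTT*sqrt(p)).

def tcp_throughput_pps(p_loss, rtt):
    """Average TCP throughput in packets/sec for loss rate p_loss and RTT in seconds."""
    w = math.sqrt(8 / (3 * p_loss))  # window size at which loss occurs
    return (3 / 4) * w / rtt

# Example: p = 1e-4, RTT = 100 ms  ->  ~1225 packets/sec (about 1.22/(0.1*0.01))
print(tcp_throughput_pps(1e-4, 0.1))
```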
Alternate fluid model
• Rate of change of the sending rate = a term inversely proportional to the current rate, with probability (1 − p), minus a term proportional to the current rate, with probability p
• In steady state, this balance again yields a rate proportional to 1/√p, matching the sawtooth result above
TCP throughput: a better loss-rate-based "simple" model [PFTK]
• With many flows, the loss rate and delay are not affected much by a single TCP flow
• TCP behavior is then completely specified by the loss and delay pattern along the path (bounded by the bottleneck capacity)
• Given loss rate p and delay T, what is TCP's throughput B packets/sec, taking timeouts into account?
What is PFTK modeling?
• Independent loss probability p across rounds
• Loss detected via triple duplicate ACKs
• Bursty loss within a round: if some packet is lost, all following packets in that round are also lost
• Timeout if fewer than three duplicate ACKs are received
PFTK empirical validation: Low loss
PFTK empirical validation: High loss
Loss-based TCP
• Evolution of loss-based TCP:
  • Tahoe (without fast retransmit)
  • Reno (triple duplicate ACKs + fast retransmit)
  • NewReno (Reno + better handling of multiple losses)
  • SACK (selective acknowledgment), common today
• Q: what if loss is not due to congestion?
Delay-based TCP: Vegas
• Uses delay as a signal of congestion
• Idea: try to keep a small constant number of packets queued at the bottleneck
• Expected = W/BaseRTT
• Actual = W/CurRTT
• Diff = Expected − Actual
• Try to keep Diff (scaled to packets) between fixed thresholds α = 1 and β = 3 (sketched below)
• More recent FAST TCP is based on Vegas
• Delay-based TCP is not widely used today
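A per-RTT sketch of this decision, assuming the usual Vegas convention that Diff is scaled by BaseRTT into packets and that the window moves by one segment at a time:

```python
# Vegas-style window decision (illustrative; alpha=1, beta=3 packets as above).

def vegas_update(cwnd, base_rtt, cur_rtt, alpha=1, beta=3):
    expected = cwnd / base_rtt             # rate with no queueing
    actual = cwnd / cur_rtt                # measured rate
    diff = (expected - actual) * base_rtt  # ~ packets queued at the bottleneck
    if diff < alpha:
        return cwnd + 1                    # too little queued: speed up
    if diff > beta:
        return cwnd - 1                    # too much queued: slow down
    return cwnd                            # in the sweet spot: hold
```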
TCP-Friendliness
• Can we try MyFavNew TCP?
  • Well, is it TCP-friendly?
• Any alternative congestion control scheme needs to coexist with TCP in FIFO queues in the best-effort Internet, or be isolated from TCP
• To coexist with TCP, it must impose the same long-term load on the network:
  • no greater long-term throughput as a function of packet loss and delay, so TCP doesn't suffer
  • not significantly less long-term throughput, or it's not too useful
TCP friendly rate control (TFRC)
• Use a model of TCP's throughput as a function of the loss rate and RTT directly in a congestion control algorithm (sketched below):
  • if the transmission rate is higher than that given by the model, reduce the transmission rate to the model's rate
  • otherwise, increase the transmission rate
• E.g., DCCP (Datagram Congestion Control Protocol), for unreliable congestion control
• Q: how to measure/use the loss rate and RTT?
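A sketch of that control loop, using the simple 1.22/(RTT·√p) model derived earlier rather than the fuller PFTK equation that deployed TFRC uses; the 5% probing factor is an arbitrary placeholder:

```python
import math

# TFRC-style rate control: never exceed what the TCP model says a TCP flow
# would get under the same loss rate and RTT.

def tfrc_rate(current_rate, loss_rate, rtt, packet_size=1500):
    model_rate = 1.22 * packet_size / (rtt * math.sqrt(loss_rate))  # bytes/sec
    if current_rate > model_rate:
        return model_rate          # clamp down to the model's TCP-fair rate
    return current_rate * 1.05     # otherwise probe upward gently (factor assumed)
```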
High speed TCP
TCP in high speed networks
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate: B = 1.22·MSS/(RTT·√p)
• ➜ p = 2·10⁻¹⁰, or equivalently at most one drop every couple of hours!
• New versions of TCP for high-speed networks needed!
TCP's long recovery delay
• More than an hour to recover from a loss or timeout
[figure: sawtooth recovery after a loss at W ≈ 83,333: ~41,000 packets to regain, ~60,000 RTTs, ~100 minutes]
High-speed TCP
• Proposals: Scalable TCP, HSTCP, FAST, CUBIC
• General idea is to use superlinear window increase
• Particularly useful in high bandwidth-delay product regimes
Alternate choices of response functions
[figure: response functions (sending rate vs. loss rate); Scalable TCP: S = 0.15/p]
• Q: whatever happened to TCP-friendly?
High speed TCP [Floyd]
• additive increase, multiplicative decrease
• increments and decrements depend on the window size
Scalable TCP (STCP) [T. Kelly]
• multiplicative increase, multiplicative decrease (sketched below):
  • W ← W + a per ACK
  • W ← W − b·W per window with loss
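A minimal rendering of the STCP rules; a = 0.01 and b = 0.125 are the constants commonly quoted for Scalable TCP and should be treated as assumptions here:

```python
# Scalable TCP response rules (sketch). Since a window of w generates ~w ACKs
# per RTT, the +a per ACK rule grows the window by ~a*w per RTT: multiplicative.

def stcp_on_ack(w, a=0.01):
    return w + a

def stcp_on_loss(w, b=0.125):
    return w - b * w   # multiplicative decrease, milder than TCP's halving
```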
STCP dynamics
[figure: from 1st PFLDnet Workshop, Tom Kelly]
Active Queue Management
Router Queue Management
• normally, packets are dropped only when the queue overflows: "drop-tail" queueing
[figure: packets P6…P1 queued at an ISP router's FCFS scheduler between the ISP and the Internet]
The case against drop-tail queue management
[figure: packets P6…P1 at a FCFS scheduler]
• Large queues in routers are "a bad thing"
  • Delay: end-to-end latency dominated by the length of queues at switches in the network
• Allowing queues to overflow is "a bad thing"
  • Fairness: connections transmitting at high rates can starve connections transmitting at low rates
  • Utilization: connections can synchronize their response to congestion
Idea: early random packet drop
[figure: packets P6…P1 at a FCFS scheduler]
• When the queue length exceeds a threshold, drop packets with a queue-length-dependent probability
• probabilistic packet drop: flows see the same loss rate
• problem: bursty traffic (a burst arriving when the queue is near the threshold) can be over-penalized
Random early detection (RED) packet drop
[figure: average queue length over time, with min threshold, max threshold, and max queue length marking the no-drop, probabilistic-early-drop, and forced-drop regions]
• Use an exponential average of the queue length to determine when to drop
  • avoids overly penalizing short-term bursts
  • reacts to longer-term trends
• Tie the drop probability to the weighted average queue length
  • avoids over-reaction to mild overload conditions
Random early detection (RED) packet drop
[figure: same queue-length regions as above, plus the drop-probability curve: 0 below the min threshold, rising linearly to maxp at the max threshold, then jumping to 100%]
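A sketch of the drop decision the two figures describe, with illustrative parameter values; full RED's count-based spacing of drops is omitted:

```python
import random

# RED drop decision: no drops below min_th, probabilistic drops between the
# thresholds (linear in the averaged queue), forced drop above max_th.

class Red:
    def __init__(self, min_th=5, max_th=15, max_p=0.1, wq=0.002):
        self.min_th, self.max_th, self.max_p, self.wq = min_th, max_th, max_p, wq
        self.avg = 0.0   # weighted (exponential) average queue length

    def should_drop(self, qlen):
        self.avg = (1 - self.wq) * self.avg + self.wq * qlen
        if self.avg < self.min_th:
            return False             # no drop
        if self.avg >= self.max_th:
            return True              # forced drop
        p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
        return random.random() < p   # probabilistic early drop
```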
RED summary: why random drop?
• Provides a gentle transition from no-drop to all-drop
• Provides a "gentle" early warning
• Avoids synchronized loss bursts among sources
• Provides the same loss rate to all sessions:
  • with tail-drop, low-sending-rate sessions can be completely starved
Random early detection (RED) today
• Many (5) parameters: nontrivial to tune (at least for HTTP traffic)
• Gains over drop-tail FCFS not that significant
• Still not widely deployed…
Why is randomization important?
• Synchronization of periodic routing updates
• Periodic losses observed in end-to-end Internet traffic (source: Floyd and Jacobson 1994)
Router update operation: the time spent in each state depends on the messages received from others (weak coupling between routers' processing):
[state machine: on a timeout or link failure, prepare own routing update (time: TC); while preparing, an update received from a neighbor is processed (time: TC2); when <ready>, send the update (time: Td to arrive at the destination) and start_timer (uniform: Tp +/- Tr); then wait, processing updates received from neighbors]
Router synchronization
• 20 (simulated) routers broadcasting updates to each other
• x-axis: time until the routing update is sent, relative to the start of the round
• By t = 100,000, all router rounds are of length 120!
• synchronization, or lack thereof, depends on system parameters
Avoiding synchronization
• Choose the random timer component Tr large (e.g., several multiples of TC)
• Add enough randomization to avoid synchronization
[state machine: same as above; prepare own routing update (time: TC), processing neighbor updates (time: TC2) as they arrive; when <ready>, send the update (time: Td to arrive) and start_timer (uniform: Tp +/- Tr); then wait]
Randomization
• Takeaway message: randomization makes a system simple and robust
Background transport: TCP Nice
What are background transfers?
• Data that humans are not waiting for
• Non-deadline-critical
• Unlimited demand
• Examples:
  • Prefetched traffic on the Web
  • File system backup
  • Large-scale data distribution services
  • Background software updates
  • Media file sharing
Desired properties
• Utilization of spare network capacity
• No interference with regular transfers
  • Self-interference: applications hurt their own performance
  • Cross-interference: applications hurt other applications' performance
TCP Nice
• Goal: abstraction of free infinite bandwidth
  • Applications say what they want
  • OS manages resources and scheduling
• Self-tuning transport layer
  • Reduces the risk of interference with foreground traffic
  • Significant utilization of spare capacity by background traffic
  • Simplifies application design
Why change TCP?
• TCP does network resource management
  • Need flow prioritization
• Alternative: router prioritization
  + more responsive, simple one-bit priority
  - hard to deploy
• Question: can end-to-end congestion control achieve non-interference and utilization?
TCP Nice
• Proactively detects congestion
• Uses increasing RTT as the congestion signal
  • congestion → increased queue lengths → increased RTT
• Aggressive responsiveness to congestion
• Only modifies sender-side congestion control
  • receiver and network unchanged
• TCP friendly
TCP Nice
• Basic algorithm (made runnable below):
  1. Early detection: threshold on the queue-length-induced increase in RTT
  2. Multiplicative decrease on early congestion
  3. Allow cwnd < 1.0 (despite no loss)
• per-ack operation:
  if (curRTT > minRTT + threshold*(maxRTT - minRTT))
      numCong++;
• per-round operation:
  if (numCong > f*W)
      W = W/2
  else { … AIMD congestion control }
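A runnable rendering of the per-ack and per-round rules above; threshold and the fraction f are Nice parameters, and the values used here are assumptions:

```python
# TCP Nice window control (sketch). Unlike standard TCP, the window is a
# float and is allowed to fall below 1.0.

class NiceWindow:
    def __init__(self, threshold=0.1, f=0.5):
        self.w = 1.0
        self.threshold, self.f = threshold, f
        self.num_cong = 0

    def per_ack(self, cur_rtt, min_rtt, max_rtt):
        # early congestion detection from RTT inflation
        if cur_rtt > min_rtt + self.threshold * (max_rtt - min_rtt):
            self.num_cong += 1

    def per_round(self):
        if self.num_cong > self.f * self.w:
            self.w /= 2        # multiplicative decrease on early congestion
        else:
            self.w += 1        # stand-in for the usual AIMD rules
        self.num_cong = 0
```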
Nice: the works
[figure: Reno vs. Nice window dynamics over a bottleneck with buffer B and service rate m pkts/sec; minRTT = t, maxRTT = t + B/m; Reno additively increases until the buffer fills and then multiplicatively decreases, while Nice backs off early]
• Non-interference: getting out of the way in time
• Utilization: maintaining a small queue
Network conditions
[figure: foreground document latency (sec, log scale 0.1 to 1e3) vs. spare capacity (1 to 100) for V0, Reno, Vegas, Nice, and Router Prio]
• Nice causes low interference to foreground Web traffic even when there isn't much spare capacity
Scalability
[figure: foreground document latency (sec, log scale 0.1 to 1e3) vs. number of background flows (1 to 100) for Vegas, Reno, V0, Nice, and Router Prio]
• W < 1 allows Nice to scale to any number of background flows
Utilization
[figure: background throughput (KB, 0 to 8e4) vs. number of background flows (1 to 100) for Vegas, Reno, Router Prio, V0, and Nice]
• Nice utilizes 50-80% of spare capacity without stealing any bandwidth from foreground traffic
Wide-area network experiments
What is TCP optimizing?
How does TCP allocate network resources?
• Problem: given a network and some number of long-lived TCP connections between different source-destination routes, can we model the resulting resource allocation?
• How to model the interaction between TCP and the network?
  • Recall: PFTK-like models assumed network conditions are not affected by a (single) TCP flow
Optimization-based approach towards congestion control
Resource allocation as an optimization problem:
• how to allocate resources (e.g., bandwidth) to optimize some objective function
• maybe not possible to obtain exact optimality, but:
  • the optimization framework is a means to explicitly steer the network towards a desirable operating point
  • practical congestion control as distributed asynchronous implementations of an optimization algorithm
  • a systematic approach towards protocol design
Model
• Network: links l, each of capacity c_l
• Sources s: (L(s), U_s(x_s))
  • L(s): links used by source s
  • U_s(x_s): utility if the source rate is x_s
[figure: example utility function U_s(x_s) for an elastic application, concave and increasing in x_s]
[figure: two-link example: x_1 traverses both links, x_2 only link 1, x_3 only link 2, so x_1 + x_2 ≤ c_1 and x_1 + x_3 ≤ c_2]
Q: what are the possible allocations with, say, unit-capacity links?
Optimization problem

\[
\max_{x_s \ge 0} \sum_s U_s(x_s)
\quad\text{subject to}\quad
\sum_{s \in S(l)} x_s \le c_l, \ \forall l \in L
\]

the "system" problem:
• maximize system utility (note: all sources are "equal")
• constraint: bandwidth used is less than capacity
• a centralized solution to the optimization is impractical:
  • must know all utility functions
  • impractical for a large number of sources
• can we view congestion control as a distributed asynchronous algorithm to solve this problem?
The user view
• User can choose the amount to pay per unit time, w_s
• Would like the allocated bandwidth x_s in proportion to w_s:

\[
x_s = \frac{w_s}{p_s}
\]

• p_s can be viewed as the charge per unit flow for user s
• the "user" problem: maximize utility minus cost

\[
\max_{w_s \ge 0} \; U_s\!\left(\frac{w_s}{p_s}\right) - w_s
\]
The network view
• Suppose the network knows the vector {w_s} chosen by users
• The network wants to maximize a logarithmic utility function:

\[
\max_{x_s \ge 0} \sum_s w_s \log x_s
\quad\text{subject to}\quad
\sum_{s \in S(l)} x_s \le c_l
\]

the "network" problem
Solution existence
• There exist prices p_s, source rates x_s, and amounts-to-pay-per-unit-time w_s = p_s·x_s such that:
  • {w_s} solves the user problem
  • {x_s} solves the network problem
  • {x_s} is the unique solution to the system problem

\[
\text{user:}\quad \max_{w_s \ge 0} \; U_s\!\left(\frac{w_s}{p_s}\right) - w_s
\]
\[
\text{network:}\quad \max_{x_s \ge 0} \sum_s w_s \log x_s
\quad\text{s.t.}\quad \sum_{s \in S(l)} x_s \le c_l
\]
\[
\text{system:}\quad \max_{x_s \ge 0} \sum_s U_s(x_s)
\quad\text{s.t.}\quad \sum_{s \in S(l)} x_s \le c_l, \ \forall l \in L
\]
Proportional fairness
• A vector of rates {x_s} is proportionally fair if it is feasible and, for any other feasible vector {x_s*}:

\[
\sum_{s \in S} \frac{x_s^* - x_s}{x_s} \le 0
\]

• Result: if w_r = 1, then {x_s} solves the network problem iff it is proportionally fair
• A similar result exists for the case where w_r ≠ 1 (weighted proportional fairness)
Max-min fairness
• Rates {x_r} are max-min fair if, for any other feasible rates {y_r}: whenever y_s > x_s, there exists p such that x_p ≤ x_s and y_p < x_p
Minimum potential delay fairness
• Rates {x_r} are minimum potential delay fair if U_r(x_r) = −w_r/x_r
• Interpretation: if w_r is a file size, then w_r/x_r is its transfer time; the optimization problem is to minimize the sum of transfer delays
Max-min fairness
• rates {x_r} are max-min fair if, for any other feasible rates {y_r}: whenever y_s > x_s, there exists p such that x_p ≤ x_s and y_p < x_p
• What is the corresponding utility function?

\[
U_r(x_r) = \lim_{\alpha \to \infty} \frac{x_r^{1-\alpha}}{1-\alpha}
\]
Solving the network problem
• Results so far: existence; a solution exists with the given properties
• How to compute the solution?
  • ideally: a distributed solution easily embodied in a protocol
  • should reveal insight into existing protocols
Solving the network problem

\[
\frac{d}{dt} x_s(t) = k\left( w_s - x_s(t) \sum_{l \in L(s)} p_l(t) \right)
\]

the change in bandwidth allocation at s: a linear increase term minus a multiplicative decrease term, where

\[
p_l(t) = g_l\!\left( \sum_{s:\, l \in L(s)} x_s(t) \right)
\]

is a congestion "signal": a function of the aggregate rate at link l, fed back to s.
Solving the network problem

\[
\frac{d}{dt} x_s(t) = k\left( w_s - x_s(t) \sum_{l \in L(s)} p_l(t) \right)
\]

• Results:
  • x(t) converges to the solution of a relaxation of the network problem
  • x_s(t)·Σ_l p_l(t) converges to w_s
• Interpretation: a TCP-like algorithm iteratively solves the optimal rate allocation! (see the sketch below)
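A toy simulation of this primal algorithm on the two-link example from the model slide (x_1 crosses both unit-capacity links; x_2 and x_3 cross one each). The penalty function g_l, the gains, and the step size are all our choices; any increasing g_l works:

```python
# Primal algorithm: dx_s/dt = k(w_s - x_s * sum of link prices on s's path),
# with price p_l an increasing function of the aggregate rate at link l.

def g(y, c=1.0):
    return 10.0 * max(0.0, y - c)          # congestion signal once load exceeds c

routes = {0: [0, 1], 1: [0], 2: [1]}       # L(s): links used by each source
x = [0.1, 0.1, 0.1]                        # initial rates
w = [1.0, 1.0, 1.0]                        # willingness to pay
k, dt = 1.0, 0.001

for _ in range(50000):
    load = [sum(x[s] for s in routes if l in routes[s]) for l in (0, 1)]
    price = [g(y) for y in load]
    for s in routes:
        q = sum(price[l] for l in routes[s])
        x[s] += dt * k * (w[s] - x[s] * q)  # linear increase, multiplicative decrease

# Settles near the proportionally fair split (1/3, 2/3, 2/3), slightly above it
# because the penalty g only approximates the hard capacity constraint.
print([round(v, 3) for v in x])
```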
Source algorithm
• A source needs only its path price q_r:

\[
\dot{x}_r = k_r(x_r)\,\big(U_r'(x_r) - q_r\big)
\]

• k_r(·) is a nonnegative, nondecreasing function
• The above algorithm converges to the unique solution for any initial condition
• q_r is interpreted as a loss/marking probability
Proportionally-fair controller
• If the utility function is U_r(x_r) = w_r log x_r, then U_r'(x_r) = w_r/x_r, and a controller that implements it is

\[
\dot{x}_r = k_r(x_r)\left( \frac{w_r}{x_r} - q_r \right)
\]
Pricing interpretation
• Can the network choose a pricing scheme to achieve fair resource allocation?
• Suppose the network charges price q_r ($/bit), where q_r = Σ_{l ∈ r} p_l
• User's strategy: spend w_r ($/sec) to maximize U_r(w_r/q_r) − w_r
Optimal user strategy
• equivalently: choose the rate x_r = w_r/q_r at which U_r'(x_r) = q_r
Simplified TCP-Reno
• suppose

\[
x = \frac{\sqrt{2(1-p)}}{T\sqrt{p}} \approx \frac{\sqrt{2}}{T\sqrt{p}}
\]

• then p ≈ 2/(T²x²), and the corresponding utility function (up to a constant factor) is

\[
U(x) = -\frac{1}{T^2 x}
\]

• interpretation: minimize (weighted) delay
Is AIMD special?
• Consider a window control as follows (evaluated below):
  • cwnd += a*cwnd^n when no loss
  • cwnd -= b*cwnd^m when loss
  • where n < m
• Expected change in the congestion window per update, with loss probability p:

\[
E[\Delta w] \approx a\,w^n\,(1-p) - b\,w^m\,p
\]

• The expected change in rate per unit time follows by dividing by the RTT
MIMD (n, m)
• Consider the controller above; setting the expected change to zero, at equilibrium:

\[
w = \left( \frac{a(1-p)}{b\,p} \right)^{1/\alpha}
\]

• where α = m − n; for stability we need α > 0, i.e., n < m
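Evaluating that equilibrium expression numerically (the constants below are illustrative):

```python
# w = (a(1-p) / (b p))^(1/alpha) with alpha = m - n, from setting the
# expected window change a*w^n*(1-p) - b*w^m*p to zero.

def equilibrium_window(a, b, n, m, p):
    alpha = m - n
    return (a * (1 - p) / (b * p)) ** (1 / alpha)

# AIMD corresponds to (n, m) = (-1, 1): w ~ sqrt(a(1-p)/(b p)),
# recovering the 1/sqrt(p) law from earlier.
print(equilibrium_window(1.0, 0.5, -1, 1, 1e-4))   # ~141 packets
```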
Motivation
• Congestion control: maximize user utility. Given routing R_li, how should the end rates x_i adapt?
• Traffic engineering: minimize network congestion. Given traffic x_i, how should the routing R_li be performed?
Congestion control model
• Users are indexed by i
[figure: utility U_i(x_i), increasing and concave in the source rate x_i]

max. ∑i Ui(xi)        (aggregate utility)
s.t. ∑i Rlixi ≤ cl    (capacity constraints)
var. x

Congestion control provides fair rate allocation amongst users
Traffic engineering model
• Links are indexed by l
[figure: cost f(ul), increasing steeply as link utilization ul approaches 1]

min. ∑l f(ul)         (aggregate cost)
s.t. ul = ∑i Rlixi/cl
var. R

Traffic engineering avoids bottlenecks in the network
Model of Internet reality
• Congestion control (adapts the rates xi): max ∑i Ui(xi), s.t. ∑i Rlixi ≤ cl
• Traffic engineering (adapts the routing Rli): min ∑l f(ul), s.t. ul = ∑i Rlixi/cl
[figure: the two optimizations form a feedback loop, coupled through xi and Rli]
System properties
• Convergence
• Does it achieve some objective?
• Benchmark:

max. ∑i Ui(xi)
s.t. Rx ≤ c
var. x, R

• Utility gap between the joint system and the benchmark
Multipath TCP
Joint routing and congestion control
• Multipath TCP controller