Chapter 3 outline
 Transport-layer services
 Principles of reliable data transfer
 Connectionless transport: UDP (self-study assignment)
 3.5 Connection-oriented transport: TCP
 3.6 Principles of congestion control
 3.7 TCP congestion control
 Summary
Ch3-2-1
TCP: Overview
RFCs: 793, 1122, 1323, 2018, 2581
 point-to-point: one sender, one receiver (i.e., no multicast)
 full duplex data:
   bi-directional data flow in same connection
   MSS: maximum segment size
 pipelined, with send & receive buffers
 connection-oriented:
   handshaking (exchange of control msgs) initializes sender & receiver state before data exchange
 reliable, in-order byte stream:
   no "message boundaries"
 flow controlled:
   sender will not overwhelm receiver
 congestion controlled:
   sender sets its window size for congestion control (and flow control)
[diagram: application writes data into the TCP send buffer through the socket door; TCP carries it in segments to the receiver's TCP receive buffer, where the application reads it]
TCP segment structure
[segment layout, 32 bits per row:]
 source port # | dest port #
 sequence number (counted in bytes of data, not segments)
 acknowledgement number (also counted in bytes)
 head len | not used | flags U A P R S F | receive window (# bytes rcvr willing to accept)
 checksum | urgent data pointer
 options (variable length)
 application data (variable length)
Flag meanings:
 URG: urgent data (generally not used)
 ACK: ACK # valid
 PSH: push data now (generally not used)
 RST, SYN, FIN: connection establishment (setup, teardown commands)
 checksum: Internet checksum (as in UDP)
TCP seq. #'s and ACKs
Seq. #'s:
 byte-stream "number" of first byte in segment's data
ACKs:
 seq # of next byte expected from other side
 cumulative ACK
Q: how does the receiver handle out-of-order segments?
 A: TCP spec doesn't say - up to implementor
[simple telnet scenario: user at Host A types 'C'; Host B ACKs receipt of 'C' and echoes back 'C'; Host A ACKs receipt of the echoed 'C']
TCP: retransmission scenarios
[lost ACK scenario: Host A sends Seq=92; Host B's ACK is lost; A's Seq=92 timer expires and A retransmits the segment]
[premature timeout scenario: A's Seq=92 timer expires before the ACK arrives, causing an unneeded retransmission; the cumulative ACK covers both segments]
In each case the sender tracks Expected (or NextByteToSend), moving it from 100 to 120 once the data is ACKed.
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
 longer than RTT
   but RTT varies
 too short: premature timeout
   unnecessary retransmissions
 too long: slow reaction to segment loss
Q: how to estimate RTT?
 SampleRTT: measured time from segment transmission until ACK receipt
   ignore retransmissions
 SampleRTT will vary; want estimated RTT "smoother"
   average several recent measurements, not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1-α)*EstimatedRTT + α*SampleRTT
 exponential weighted moving average
 influence of past samples decreases exponentially fast
 typical value: α = 0.125
Example RTT estimation:
[plot: SampleRTT and EstimatedRTT (milliseconds, roughly 100-350 ms) vs. time (seconds) for gaia.cs.umass.edu to fantasia.eurecom.fr; EstimatedRTT tracks SampleRTT but is visibly smoother]
TCP Round Trip Time and Timeout
Setting the timeout
 EstimatedRTT plus "safety margin"
   large variation in EstimatedRTT -> larger safety margin
 first estimate how much SampleRTT deviates from EstimatedRTT:
DevRTT = (1-β)*DevRTT + β*|SampleRTT-EstimatedRTT|
(typically, β = 0.25)
Then set the timeout interval:
TimeoutInterval = EstimatedRTT + 4*DevRTT
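The two update rules above can be sketched together; this is a minimal model, with ALPHA and BETA set to the typical values from the slides.

```python
ALPHA = 0.125  # weight of new SampleRTT in EstimatedRTT
BETA = 0.25    # weight of new deviation in DevRTT

def update_timeout(estimated_rtt, dev_rtt, sample_rtt):
    """One update step, applied per (non-retransmitted) segment ACKed."""
    estimated_rtt = (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt
    dev_rtt = (1 - BETA) * dev_rtt + BETA * abs(sample_rtt - estimated_rtt)
    timeout = estimated_rtt + 4 * dev_rtt
    return estimated_rtt, dev_rtt, timeout

# With steady 100 ms samples, DevRTT shrinks and the timeout
# approaches EstimatedRTT (the safety margin vanishes).
est, dev = 100.0, 25.0
for _ in range(20):
    est, dev, to = update_timeout(est, dev, 100.0)
print(round(est), round(dev, 2), round(to, 1))
```

Note how a burst of varying samples would re-inflate DevRTT and hence the safety margin.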
Fast Retransmit
 Time-out period often relatively long:
   long delay before resending lost packet
 Detect lost segments via duplicate ACKs:
   sender often sends many segments back-to-back
   if a segment is lost, there will likely be many duplicate ACKs
 If the sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost:
   fast retransmit: resend segment before timer expires
[figure: Host A sends segments with seq #'s x1-x5; x2 is lost; Host B returns ACK x1 four times; the triple duplicate ACKs trigger retransmission of x2 before the timeout expires]
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
        start timer
}
else { /* a duplicate ACK for already-ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3) {
        resend segment with sequence number y   /* fast retransmit */
    }
}
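The pseudocode above can be made runnable as a small sketch; `resend` and `start_timer` are stand-in callbacks (hypothetical names, not from any real TCP stack).

```python
class FastRetransmitSender:
    def __init__(self, send_base, resend, start_timer):
        self.send_base = send_base   # seq # of oldest unACKed byte
        self.dup_acks = 0            # dup ACK count for send_base
        self.resend = resend
        self.start_timer = start_timer

    def on_ack(self, y, bytes_in_flight):
        if y > self.send_base:           # new ACK: window advances
            self.send_base = y
            self.dup_acks = 0
            if bytes_in_flight > 0:      # still unACKed segments
                self.start_timer()
        else:                            # duplicate ACK
            self.dup_acks += 1
            if self.dup_acks == 3:       # triple duplicate ACK:
                self.resend(y)           # fast retransmit before timeout

resent = []
s = FastRetransmitSender(100, resent.append, lambda: None)
for ack in (100, 100, 100):              # three duplicate ACKs for byte 100
    s.on_ack(ack, bytes_in_flight=500)
print(resent)                            # -> [100]
```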
TCP Flow Control
 receive side of TCP connection has a receive buffer:
   app process may be slow at reading from buffer
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
 speed-matching service: matching the send rate to the receiving app's drain rate
TCP Flow control: how it works
(Suppose the TCP receiver discards out-of-order segments)
 spare room in buffer
   = RcvWindow ( = rwnd)
   = RcvBuffer - [LastByteRcvd - LastByteRead]
 Rcvr advertises spare room by including value of RcvWindow in segments
 Sender limits unACKed data to RcvWindow
   guarantees receive buffer doesn't overflow
 The amount of unACKed data will be the smaller of RcvWindow and the congestion window (to be discussed later)
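A minimal sketch of the spare-room computation above, with variable names following the slide (RcvBuffer, LastByteRcvd, LastByteRead):

```python
def advertised_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """rwnd = RcvBuffer - (LastByteRcvd - LastByteRead)."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

# 64 KB buffer; 20 KB received but only 8 KB read by the app so far:
rwnd = advertised_window(64_000, 20_000, 8_000)
print(rwnd)  # -> 52000

# The sender limits unACKed data to min(rwnd, congestion window):
cong_win = 30_000
usable = min(rwnd, cong_win)
print(usable)  # -> 30000
```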
TCP Connection Management
Recall: TCP sender and receiver establish a "connection" before exchanging data segments
 initialize TCP variables:
   seq. #s
   buffers, flow control info (e.g. RcvWindow)
 client: connection initiator
   Socket clientSocket = new Socket("hostname","port number");
 server: contacted by client
   Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
 specifies initial seq #
 no data
Step 2: server host receives SYN, replies with SYNACK segment
 server allocates buffers
 specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
TCP Connection Establishment (3-way)
Establishing a connection:
Step 1: client sends TCP SYN control segment to server
Step 2: server receives SYN, replies with SYN and ACK (in one segment)
Step 3: client receives SYN+ACK, replies with ACK and possible data; connection established
[diagram: client and server exchanging segments 1 (SYN), 2 (SYN+ACK), 3 (ACK)]
TCP Connection Close
client closes socket: clientSocket.close();
Step 1: client sends FIN (client closing)
Step 2: server receives FIN, replies with ACK; sends its own FIN (server closing, waiting to close)
Step 3: client receives FIN, replies with ACK
 enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK; connection closed
[diagram: after the client's timed wait expires, both sides are closed]
TCP Connection Management (cont)
[state diagrams: TCP server lifecycle; TCP client lifecycle]
Difficulty with Symmetric Release
 Two-army problem: when can Blue army #1 be sure that Blue army #2 will attack at the same time? (never)
Disconnection Request (DR) = Attack
• 3-way handshake usually works
• host 1 needs to retransmit the DR several times
• no perfect solution: e.g., we are left with a half-open connection in case (d) if the initial DR and all subsequent DRs are lost!
Principles of Congestion Control
Congestion:
 informally: "too many sources sending too much data too fast for network to handle"
 different from flow control!
 manifestations:
   lost packets (buffer overflow at routers)
   long delays (queueing in router buffers)
 a top-10 problem!
Causes/costs of congestion: scenario 1
 two senders, two receivers
 one router, infinite buffers
 no retransmission
[figure: Hosts A and B send λin (original data) through one router of link capacity C with unlimited shared output link buffers, producing λout]
 large delays when congested
 maximum achievable throughput
Causes/costs of congestion: scenario 2
 one router, finite buffers
 sender retransmission of "lost" packet upon timeout
[figure: Hosts A and B offer λin (original data); λ'in is original plus retransmitted data; the router has finite shared output link buffers; λout is delivered]
Causes/costs of congestion: scenario 2
 Early timeout: every packet retransmitted once (fig a)
 "Perfect" retransmission: one retransmission for every two packets (fig b)
"costs" of congestion:
 more work (retransmissions) for given "goodput"
 unneeded retransmissions: link carries multiple copies of a pkt
Causes/costs of congestion: scenario 3
 four senders
 multihop paths
 timeout/retransmit
Q: what happens as λin and λ'in increase?
[figure: Hosts A and B send λin (original data) and λ'in (original plus retransmitted data) over multihop paths through routers with finite shared output link buffers; λout is delivered]
Causes/costs of congestion: scenario 3
[figure: λout at Host B collapses as offered load increases]
Another "cost" of congestion:
 when a packet is dropped, any upstream transmission capacity used for that packet was wasted!
Approaches towards congestion control
 Open-Loop: good design (e.g., admission control) prevents congestion from occurring
 Closed-Loop: monitors and reacts to congestion via feedback
Two broad closed-loop approaches:
End-end congestion control:
 no explicit feedback from network
 congestion inferred from end-system (sender) observed loss, delay (e.g., TCP timeout)
Network-assisted congestion control:
 routers provide feedback to end systems
   single bit indicating congestion (in ATM, and new in TCP/IP)
   explicit rate sender should send at
Network Assisted Congestion Control
• Network feedback via receiver (in ATM and TCP/IP's ECN bit)
• Direct network (router) feedback: also called choke packets
TCP Congestion Control (Tahoe and Reno)
 end-end control (no network assistance)
 sender limits transmission:
   LastByteSent - LastByteAcked <= min{CongWin, RcvWindow}
 roughly: rate = CongWin/RTT Bytes/sec
 CongWin is a dynamic function of perceived network congestion
How does the sender perceive congestion?
 loss event = timeout or 3 duplicate ACKs
 TCP sender reduces rate (CongWin) after a loss event
Main mechanisms:
 AIMD (congestion avoidance and fast recovery)
 slow start
 fast retransmit
TCP AIMD
additive increase: increase CongWin by 1 MSS every RTT in the absence of loss (congestion avoidance)
multiplicative decrease: cut CongWin in half after a 3-dup-ACK loss in TCP Reno (not in TCP Tahoe, which uses slow start instead)
[figure: long-lived TCP connection; congestion window sawtooths between 8 and 24 Kbytes, crossing 16 Kbytes]
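The sawtooth above can be reproduced with a few lines; the loss threshold of 24 KB and the 1000-byte MSS are illustrative assumptions matching the figure, not protocol constants.

```python
MSS = 1_000  # bytes (assumed)

def aimd(rounds, loss_threshold=24_000, cwnd=16_000):
    """One value of CongWin per RTT: +1 MSS per RTT, halve on loss."""
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd >= loss_threshold:
            cwnd //= 2           # multiplicative decrease (3 dup ACKs)
        else:
            cwnd += MSS          # additive increase: 1 MSS per RTT
    return trace

trace = aimd(20)
print(min(trace), max(trace))    # sawtooth between 12 KB and 24 KB
```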
TCP Slow Start
 When connection begins, CongWin = 1 MSS
   Example: MSS = 500 bytes & RTT = 200 msec
   initial rate = 20 kbps
 available bandwidth may be >> MSS/RTT
   desirable to quickly ramp up to a respectable rate
 When connection begins, increase (ramp up) rate exponentially fast until first loss event
   indicated by a triple-duplicate (TD) ACK or a timeout (TO)
 Slow at start, but grows fast!
TCP Slow Start (more)
 When connection begins, increase rate exponentially until first loss event:
   double CongWin every RTT
   done by incrementing CongWin by 1 MSS for every MSS ACKed
 TCP Tahoe (earliest version): slow start after either a TO or a TD loss
   i.e., fast retransmit too, but no fast recovery
[diagram: Host A sends one segment, then two, then four to Host B, the window doubling each RTT]
Fast Retransmit (Reno)
 After a TD loss:
   CongWin cut in half
   window then grows linearly (congestion avoidance)
 But after a TO loss:
   CongWin set to 1 MSS (slow start)
   window then grows exponentially to a new threshold, then grows linearly (as in TD)
[plot: congestion window size (segments) vs. transmission round; after a TD loss, TCP Reno halves the window and grows linearly, while TCP Tahoe restarts from 1 segment]
Philosophy:
• 3 dup ACKs indicate the network is capable of delivering some segments
• a timeout before 3 dup ACKs is "more alarming"
TCP Sender Congestion Control
(see Table 3.3 for TCP Reno)
 When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
 When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
 When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
 When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
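The four rules above can be sketched as a per-event update, tracking CongWin and Threshold in MSS units; this is a pure Reno-style model (slow start advances 1 MSS per ACK, congestion avoidance 1/cwnd per ACK), not a real TCP stack.

```python
def on_event(cwnd, ssthresh, event):
    """Return (cwnd, ssthresh) in MSS units after one event."""
    if event == "new_ack":
        if cwnd < ssthresh:
            cwnd += 1            # slow start: exponential growth
        else:
            cwnd += 1 / cwnd     # congestion avoidance: ~1 MSS per RTT
    elif event == "triple_dup_ack":
        ssthresh = cwnd / 2      # Threshold = CongWin/2,
        cwnd = ssthresh          # CongWin = Threshold
    elif event == "timeout":
        ssthresh = cwnd / 2      # Threshold = CongWin/2,
        cwnd = 1                 # CongWin = 1 MSS
    return cwnd, ssthresh

cwnd, ssthresh = 1, 8
for _ in range(10):              # exponential growth, then linear
    cwnd, ssthresh = on_event(cwnd, ssthresh, "new_ack")
print(cwnd > 8)                  # grew past the threshold -> True
cwnd, ssthresh = on_event(cwnd, ssthresh, "timeout")
print(cwnd, ssthresh)            # back to 1 MSS, threshold halved
```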
TCP mechanisms Illustrated
[figure: slow start, congestion avoidance, fast retransmit, and fast recovery combined on one cwnd timeline]
TCP congestion control FSM: details
slow start (initial state; on entry: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0):
 new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
 duplicate ACK: dupACKcount++
 timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
 cwnd > ssthresh: go to congestion avoidance
 dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
congestion avoidance:
 new ACK: cwnd = cwnd + MSS*(MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
 duplicate ACK: dupACKcount++
 timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
 dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
fast recovery:
 duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
 new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
 timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
Other TCP Variants
 Inefficiency in high-speed networks:
   it takes a long time for the sender to recover to its original cwnd after it is halved due to a packet loss
   HSTCP, TCP-Westwood, FAST, Quick-Start, Explicit Transport Error Notification, eXplicit Control Protocol (XCP)...
 Inefficiency in wireless networks:
   current TCP implementations are unable to distinguish buffer-overflow loss (congestion in wired networks) from random loss (in wireless networks)
 Inefficiency in satellite networks:
   long propagation delay and large RTT imply low throughput
 TCP variants for wireless/satellite networks:
   TCP-Peach, Indirect-TCP (Split TCP), SNOOP, Explicit Loss Notification (ELN)...
TCP throughput
 What's the average throughput of TCP as a function of window size and RTT?
   ignore slow start
 Let W be the window size when loss occurs.
 When the window is W, throughput is W/RTT
 Just after loss, window drops to W/2, throughput to W/2RTT.
 Average throughput: (roughly) 0.75 W/RTT
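A quick numeric check of the 0.75 W/RTT claim: between losses the window ramps linearly from W/2 to W (one MSS per RTT), so averaging the throughput over one sawtooth cycle should give 3/4 of the peak. W and RTT below are arbitrary illustrative values.

```python
W, RTT = 1000, 0.1                 # segments, seconds (assumed)

# Throughput sample at each RTT of one additive-increase cycle:
samples = [w / RTT for w in range(W // 2, W + 1)]
avg = sum(samples) / len(samples)

print(avg / (W / RTT))             # -> ~0.75 of peak throughput
```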
TCP Futures
 Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
 Requires window size W = 83,333 in-flight segments even if no loss
 Throughput in terms of loss rate: Throughput = 1.22 * MSS / (RTT * sqrt(L))
   ➜ L = 2*10^-10 Wow
 New versions of TCP for high-speed needed!
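The slide's two numbers can be re-derived; this assumes the standard loss-throughput relation Throughput ≈ 1.22·MSS/(RTT·√L) quoted above.

```python
MSS = 1500 * 8          # segment size in bits
RTT = 0.1               # seconds
target = 10e9           # 10 Gbps target throughput

# In-flight segments needed: target * RTT bits outstanding, / bits per segment
W = target * RTT / MSS

# Tolerable loss rate from Throughput = 1.22*MSS/(RTT*sqrt(L)):
L = (1.22 * MSS / (RTT * target)) ** 2

print(round(W))         # ~ 83333 segments
print(L)                # ~ 2e-10
```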
High-Speed TCP (HSTCP)
 Like standard TCP when cwnd is small
 More aggressive than standard TCP when cwnd is large
 Increase window: cwnd = cwnd + a(cwnd)/cwnd for every segment ACKed
 Decrease window: cwnd = (1 - b(cwnd)) * cwnd for every loss
 For standard TCP, a(cwnd) = 1 and b(cwnd) = 0.5.
 HSTCP: e.g., with cwnd = 83000, b(83000) = 0.1 means decreasing only 10% after a congestion event, and a(83000) = 72 means an increase of 72/cwnd segments per ACK, i.e., about 72 segments per RTT.
HSTCP
[plot: HSTCP response function compared with standard TCP]
TCP Fairness
Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K, regardless of the initial window size.
[figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
Two competing sessions:
 additive increase in throughput from (x, y), with slope 1
 multiplicative decrease in throughput from (x+i, y+i) to ((x+i)/2, (y+i)/2)
[figure: connection 1 vs. connection 2 throughput, both bounded by R; repeated slope-1 increases followed by halvings pull the operating point toward the equal-bandwidth-share line]
Fairness (more)
Fairness and UDP
 Multimedia apps often do not use TCP
   do not want rate throttled by congestion control
 Instead use UDP:
   pump audio/video at constant rate, tolerate packet loss
 Research area: make them TCP friendly
Fairness and parallel TCP connections
 Can open more than one parallel connection between 2 hosts (NetAnts).
 Web browsers do this
 Example: link of rate R supporting 9 connections;
   new app asks for 1 TCP, gets rate R/10
   new app asks for 9 TCPs, gets R/2!
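The arithmetic behind the example above: if a link of rate R is shared fairly among n TCP connections, each gets ~R/n, so an app opening k parallel connections grabs k/(existing + k) of the link.

```python
def app_share(existing, k):
    """Fraction of the link an app gets by opening k of (existing+k) connections."""
    return k / (existing + k)

R = 1.0  # normalize the link rate
print(app_share(9, 1) * R)   # 1 connection among 10 -> R/10 = 0.1
print(app_share(9, 9) * R)   # 9 connections among 18 -> R/2 = 0.5
```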
Delay modeling
Q: How long does it take to receive an object from a Web server after sending a request?
Ignoring congestion, delay is influenced by:
 TCP connection establishment
 data transmission delay
 slow start
Notation, assumptions:
 assume one link between client and server, of rate R
 S: MSS (bits)
 O: object/file size (bits)
 no retransmissions (no loss, no corruption)
Window size:
 first assume: fixed congestion window, W segments
 then dynamic window, modeling slow start
Fixed congestion window (1)
First case:
WS/R > RTT + S/R: ACK for first segment in window returns before a window's worth of data is sent
delay = 2RTT + O/R
Fixed congestion window (2)
Second case:
 WS/R < RTT + S/R: wait for ACK after sending a window's worth of data
 "gap" between two "rounds" is S/R + RTT - WS/R
 let K = O/WS be the number of rounds
 there are K-1 gaps
delay = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]
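The two fixed-window cases can be combined into one small function; the slide's K = O/WS assumes the object is a whole number of windows, so a ceiling is used here for general inputs (my assumption), and the numeric parameters below are purely hypothetical.

```python
from math import ceil

def fixed_window_delay(O, W, S, R, RTT):
    """Delay to fetch an O-bit object with a fixed window of W segments."""
    base = 2 * RTT + O / R
    gap = S / R + RTT - W * S / R     # idle time per round, if positive
    if gap <= 0:                      # case 1: WS/R >= RTT + S/R, no stalls
        return base
    K = ceil(O / (W * S))             # number of windows covering the object
    return base + (K - 1) * gap       # case 2: K-1 stalls of length `gap`

# Hypothetical numbers: 100 KB object, 1 KB segments, W=4, R=1 Mbps, RTT=100 ms
d = fixed_window_delay(O=800_000, W=4, S=8_000, R=1e6, RTT=0.1)
print(round(d, 3))  # ~ 2.8 seconds
```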
TCP Delay Modeling: Slow Start (1)
Now suppose the window grows according to slow start,
and P is the number of rounds TCP idles at the server (i.e., # of gaps).
Will show that the delay for one object is:

delay = O/R + 2 RTT + sum_{p=1..P} idleTime_p
      = O/R + 2 RTT + sum_{p=1..P} [S/R + RTT - 2^(p-1) S/R]
      = O/R + 2 RTT + P [S/R + RTT] - (2^P - 1) S/R

where 2^(p-1) is the window size during the p-th round of slow start, so 2^(p-1) S/R is the time to transmit the p-th window.

1) Let K be the number of sending rounds/windows for the object. (calculate K?)
2) If the object is large, then once the sending window is large enough, the idleTime becomes 0 (after slow start ends, as assumed earlier)
3) Let Q be the last round during which idleTime is still > 0; then P = min{K-1, Q}
TCP Delay Modeling: Slow Start (2)
Delay components:
• 2 RTT for connection estab and request
• O/R to transmit object
• time server idles due to slow start
Server idles: P = min{K-1, Q} times
Example:
• O/S = 15 segments
• K = 4 windows
• Q = 2
• P = min{K-1, Q} = 2
Server idles P = 2 times, with a decreasing amount of idleTime
[timing diagram: client initiates the TCP connection and requests the object; server sends first window = S/R, second window = 2S/R, third window = 4S/R, fourth window = 8S/R, idling between the early rounds; object delivered when transmission completes]
TCP Delay Modeling (3)
2^(p-1) S/R = time to transmit the p-th window
[S/R + RTT - 2^(p-1) S/R]+ = idle time after the p-th window
[timing diagram: same slow-start exchange as before, with windows S/R, 2S/R, 4S/R, 8S/R annotated with transmit and idle times]
TCP Delay Modeling (4)
Recall K = number of windows that cover the object.
How do we calculate K?
Calculation of Q, the number of idles for an infinite-size object, is similar: Q is the largest integer such that
S/R + RTT >= 2^(Q-1) S/R
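Putting the model together: K windows cover the object (K is the smallest k with 2^0 + ... + 2^(k-1) >= O/S), Q bounds how many rounds can idle, and the server idles P = min(K-1, Q) times. The closed forms for K and Q below follow from the slide's inequalities; treat edge cases (exact powers of two) with care.

```python
from math import ceil, floor, log2

def slow_start_delay(O, S, R, RTT):
    """Return (delay, K, Q, P) per the slide's slow-start model."""
    K = ceil(log2(O / S + 1))               # smallest k with 2^k - 1 >= O/S
    Q = floor(log2(1 + RTT * R / S)) + 1    # largest Q with S/R+RTT >= 2^(Q-1) S/R
    P = min(K - 1, Q)                       # rounds the server idles
    idle = P * (S / R + RTT) - (2 ** P - 1) * (S / R)
    return 2 * RTT + O / R + idle, K, Q, P

# Example mirroring the earlier slide: O/S = 15 segments, parameters
# chosen (assumed) so that Q = 2, giving K = 4 and P = 2.
d, K, Q, P = slow_start_delay(O=15_000, S=1_000, R=1e6, RTT=0.002)
print(K, P)           # -> 4 2
print(round(d, 3))    # total delay in seconds
```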
Effect of Slow Start
 Delay is affected by
   file/object size O
   transmission rate R
   fixed window size or MSS -> idle time
   RTT
 Extra delay due to slow start:
   with a large O, and small R and RTT, slow start does not hurt much
   with a small O and a large R*RTT, slow start hurts significantly (percentage-wise)
Food For Thought
 Assume a Web page consists of:
   1 base HTML page (of size O bits)
   M images (each of size O bits)
 Non-persistent HTTP:
   M+1 TCP connections in series
   Response time = (M+1)O/R + (M+1)2RTT + sum of idle times
 What about persistent HTTP?
 What about non-persistent HTTP with X parallel connections? (is this X times faster than having 1 non-persistent connection?)
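The non-persistent formula above can be checked numerically and set against persistent HTTP; the persistent (non-pipelined) formula here, one handshake then one RTT plus transmission per object, is my simplified assumption for comparison, not from the slide, and both ignore slow-start idle time.

```python
def non_persistent(M, O, R, RTT):
    # (M+1) serial connections: 2 RTTs + transmission time each
    return (M + 1) * (2 * RTT + O / R)

def persistent(M, O, R, RTT):
    # one handshake RTT, then one request RTT + transmission per object
    return RTT + (M + 1) * (RTT + O / R)

O = 5 * 8_000                  # 5 KB object, in bits
M, RTT, R = 10, 0.1, 1e6       # values from the next slide; R assumed 1 Mbps
print(round(non_persistent(M, O, R, RTT), 2))  # -> 2.64 s
print(round(persistent(M, O, R, RTT), 2))      # -> 1.64 s
```

The RTT term dominates the gap: persistent HTTP saves one handshake RTT per object.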
HTTP Response time (in seconds)
RTT = 100 msec, O = 5 Kbytes, M = 10 and X = 5
[bar chart: response time (0-20 s) for non-persistent, persistent, and parallel non-persistent HTTP at link rates of 28 Kbps, 100 Kbps, 1 Mbps and 10 Mbps]
For low bandwidth, connection & response time are dominated by transmission time.
Persistent connections give only minor improvement over parallel connections.
HTTP Response time (in seconds)
RTT = 1 sec, O = 5 Kbytes, M = 10 and X = 5
[bar chart: response time (0-70 s) for non-persistent, persistent, and parallel non-persistent HTTP at link rates of 28 Kbps, 100 Kbps, 1 Mbps and 10 Mbps]
For larger RTT, response time is dominated by TCP establishment & slow-start delays. Persistent connections now give important improvement, particularly in high delay*bandwidth networks.
Chapter 3: Summary
 principles behind transport layer services:
   reliable data transfer
   flow control
   congestion control
 instantiation and implementation in the Internet
   UDP
   TCP
Next:
 leaving the network "edge" (application, transport layers)
 into the network "core"