Survey
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
Chapter 3 outline
* Transport-layer services
* Principles of reliable data transfer
* Connectionless transport: UDP (self-study assignment)
* 3.5 Connection-oriented transport: TCP
* 3.6 Principles of congestion control
* 3.7 TCP congestion control
* Summary

TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)
* point-to-point: one sender, one receiver (i.e., no multicast)
* reliable, in-order byte stream: no "message boundaries"
* pipelined, with send & receive buffers on each side (the application writes data into the TCP send buffer; the application reads data from the TCP receive buffer); MSS: maximum segment size
* full-duplex data: bi-directional data flow in the same connection
* connection-oriented: handshaking (exchange of control msgs) initializes sender & receiver state before data exchange
* flow controlled: sender will not overwhelm the receiver
* congestion controlled: sender sets its window size for congestion control (and flow control)

TCP segment structure (32-bit rows)
* source port #, dest port #
* sequence number and acknowledgement number (counted in bytes of data)
* header length, unused bits, and flag bits: URG (urgent data, generally not used), ACK (ACK # valid), PSH (push data now, generally not used), RST, SYN, FIN (connection setup/teardown commands)
* receive window: # bytes the receiver is willing to accept
* checksum (Internet checksum, as in UDP), urgent data pointer
* options (variable length)
* application data (variable length)

TCP seq. #'s and ACKs
* seq. #'s: byte-stream "number" of the first byte in the segment's data
* ACKs: seq # of the next byte expected from the other side (cumulative ACK)
* Q: how does the receiver handle out-of-order segments? A: the TCP spec doesn't say; it is up to the implementor
* simple telnet scenario: the user on Host A types 'C'; Host B ACKs receipt of 'C' and echoes back 'C'; Host A ACKs receipt of the echoed 'C'

TCP: retransmission scenarios
* lost ACK scenario: Host A sends a segment (e.g., Seq=92); the segment arrives, but its ACK is lost; Host A's timeout expires and it retransmits the segment
* premature timeout: Host A's timer expires before the ACK arrives, so already-delivered data is retransmitted; the cumulative ACK (e.g., ACK=120, meaning "next byte expected is 120") lets Host A avoid retransmitting earlier segments once it arrives

TCP round trip time and timeout
* Q: how to set the TCP timeout value?
  - longer than RTT, but RTT varies
  - too short: premature timeout, unnecessary retransmissions
  - too long: slow reaction to segment loss
* Q: how to estimate RTT?
  - SampleRTT: measured time from segment transmission until ACK receipt (ignore retransmissions)
  - SampleRTT will vary; we want the estimated RTT to be "smoother": average several recent measurements, not just the current SampleRTT
* EstimatedRTT = (1 - a)*EstimatedRTT + a*SampleRTT
  - an exponential weighted moving average: the influence of a past sample decreases exponentially fast
  - typical value: a = 0.125
* example RTT estimation (gaia.cs.umass.edu to fantasia.eurecom.fr): SampleRTT fluctuates between roughly 100 and 350 ms over time, while EstimatedRTT follows it much more smoothly

Setting the timeout
* EstimatedRTT plus a "safety margin": large variation in EstimatedRTT -> larger safety margin
* first estimate how much SampleRTT deviates from EstimatedRTT:
  DevRTT = (1 - b)*DevRTT + b*|SampleRTT - EstimatedRTT|   (typically b = 0.25)
* then set the timeout interval:
  TimeoutInterval = EstimatedRTT + 4*DevRTT

Fast retransmit
* the timeout period is often relatively long: long delay before resending a lost packet
* detect lost segments via duplicate
ACKs.
* the sender often sends many segments back-to-back; if a segment is lost, there will likely be many duplicate ACKs
* if the sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost
* fast retransmit: resend the segment before the timer expires
* example: Host A sends segments x1, x2, x3, x4, x5 and x2 is lost; Host B replies ACK x1 to each later segment; the triple duplicate ACK triggers retransmission of the missing segment before the timeout

Fast retransmit algorithm:

event: ACK received, with ACK field value of y
if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
        start timer
}
else {   /* a duplicate ACK for an already-ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3)
        resend segment with sequence number y   /* fast retransmit */
}

TCP flow control
* the receive side of a TCP connection has a receive buffer; the app process may be slow at reading from the buffer
* flow control: the sender won't overflow the receiver's buffer by transmitting too much, too fast
* speed-matching service: matching the send rate to the receiving app's drain rate

TCP flow control: how it works
* (suppose the TCP receiver discards out-of-order segments)
* spare room in buffer = RcvWindow (= rwnd) = RcvBuffer - [LastByteRcvd - LastByteRead]
* the receiver advertises the spare room by including the value of RcvWindow in segments
* the sender limits unACKed data to RcvWindow: this guarantees the receive buffer doesn't overflow
* the amount of unACKed data will actually be the smaller of RcvWindow and the congestion window (to be discussed later)

TCP connection management
* recall: TCP sender and receiver establish a "connection" before exchanging data segments
* they initialize TCP variables: seq. #s, buffers, flow control info (e.g. RcvWindow)
* client (connection initiator): Socket clientSocket = new Socket("hostname","port number");
* server (contacted by client): Socket connectionSocket = welcomeSocket.accept();

Three-way handshake (connection establishment):
* Step 1: the client host sends a TCP SYN segment to the server; it specifies the client's initial seq # and carries no data
* Step 2: the server host receives the SYN and replies with a SYNACK segment (SYN and ACK in one segment); the server allocates buffers and specifies the server's initial seq. #
* Step 3: the client receives the SYNACK and replies with an ACK segment, which may contain data; the connection is established

TCP connection close
* the client closes its socket: clientSocket.close();
* Step 1: the client sends a FIN control segment to the server
* Step 2: the server receives the FIN and replies with an ACK; it closes the connection and sends its own FIN
* Step 3: the client receives the FIN, replies with an ACK, and enters a "timed wait", during which it will respond with an ACK to any received FINs
* Step 4: the server receives the ACK; the connection is closed
* (the slides also show the full TCP server and client lifecycles)

Difficulty with symmetric release
* two-army problem: when can Blue army #1 be sure that Blue army #2 will attack at the same time? (never)
* disconnection request (DR) = attack: the 3-way handshake usually works, and host 1 needs to retransmit its DR several times; there is no perfect solution, e.g., a half-open connection remains (case (d)) if the initial DR and all subsequent DRs are lost!

Principles of congestion control
* congestion, informally: "too many sources sending too much data too fast for the network to handle"
* different from flow control!
* manifestations: lost packets (buffer overflow at routers); long delays (queueing in router buffers)
* a top-10 problem!
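The fast-retransmit algorithm shown earlier can be sketched as runnable Python. This is only an illustrative model of the duplicate-ACK bookkeeping; the class and attribute names (FastRetransmitSender, send_base, dup_acks) are invented for this sketch and are not part of any real TCP implementation.

```python
# Illustrative sketch of the fast-retransmit ACK handler described above.
class FastRetransmitSender:
    def __init__(self, initial_send_base=0):
        self.send_base = initial_send_base   # oldest unACKed byte (SendBase)
        self.dup_acks = {}                   # count of dup ACKs received, keyed by ACK value y
        self.retransmitted = []              # record of fast retransmits (for inspection)

    def on_ack(self, y):
        if y > self.send_base:
            # cumulative ACK for new data: slide SendBase forward
            self.send_base = y
            self.dup_acks.clear()
            # a real sender would (re)start the retransmission timer here
        else:
            # a duplicate ACK for already-ACKed data
            self.dup_acks[y] = self.dup_acks.get(y, 0) + 1
            if self.dup_acks[y] == 3:
                # fast retransmit: resend segment y before the timer expires
                self.retransmitted.append(y)

sender = FastRetransmitSender(initial_send_base=100)
for ack in [100, 100, 100, 100]:   # repeated ACKs of 100 (a lost segment downstream)
    sender.on_ack(ack)
print(sender.retransmitted)   # -> [100]: the 3rd duplicate triggers one resend
```

Note how the counter is keyed by the ACK value, so a later cumulative ACK for new data resets the duplicate count, matching the pseudocode's two branches.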
Causes/costs of congestion: scenario 1
* two senders, two receivers; one router with infinite buffers; no retransmission
* Host A and Host B each send original data at rate lambda_in into a shared output link of capacity C (unlimited shared output link buffers); lambda_out is the per-connection throughput
* costs: there is a maximum achievable throughput, and delays grow large when the link is congested

Causes/costs of congestion: scenario 2
* one router, finite buffers; the sender retransmits a "lost" packet upon timeout
* lambda'_in: original data plus retransmitted data; lambda_in: original data
* early timeouts: every packet is retransmitted once (fig. a); "perfect" retransmission: 1 retransmission for every 2 packets (fig. b)
* "costs" of congestion: more work (retransmissions) for a given "goodput"; unneeded retransmissions mean the link carries multiple copies of a packet

Causes/costs of congestion: scenario 3
* four senders, multihop paths, timeout/retransmit; finite shared output link buffers
* Q: what happens as lambda_in and lambda'_in increase?
* another "cost" of congestion: when a packet is dropped, any upstream transmission capacity used for that packet was wasted!
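To make the scenario-2 "cost of congestion" concrete, here is a toy calculation (with made-up normalized numbers, not from the slides) of goodput under the two retransmission patterns described above: with early timeouts every packet crosses the link twice, and with "perfect" retransmission 3 transmissions deliver 2 packets (1.5 copies each).

```python
# Toy model: goodput of a link when each useful packet must be sent
# copies_per_packet times (retransmission overhead). Numbers are illustrative.
def goodput(offered_load, capacity, copies_per_packet):
    """Useful throughput when each packet crosses the link copies_per_packet times."""
    link_load = min(offered_load * copies_per_packet, capacity)
    return link_load / copies_per_packet

C = 1.0  # normalized link capacity
# "perfect" retransmission: 1 retransmission per 2 packets -> 1.5 copies each
print(goodput(offered_load=1.0, capacity=C, copies_per_packet=1.5))  # -> 0.666... (2/3 of C)
# early timeouts: every packet retransmitted once -> 2 copies each
print(goodput(offered_load=1.0, capacity=C, copies_per_packet=2.0))  # -> 0.5 (half of C)
```

The point of the sketch is the direction of the effect: every extra copy the link must carry lowers the ceiling on useful throughput.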
Approaches towards congestion control
* open-loop: admission control prevents congestion; closed-loop: monitor the network and deal with congestion as it arises
* two broad approaches to feedback-based congestion control:
  - end-end congestion control: no explicit feedback from the network; congestion is inferred from end-system (sender) observed loss and delay (e.g., TCP timeout)
  - network-assisted congestion control: routers provide feedback to end systems, either a single bit indicating congestion (in ATM, and new in TCP/IP) or an explicit rate the sender should send at

Network-assisted congestion control
* network feedback via the receiver (as in ATM and TCP/IP's ECN bit)
* direct network (router) feedback: also called choke packets

TCP congestion control (Tahoe and Reno)
* end-end control (no network assistance)
* the sender limits transmission: LastByteSent - LastByteAcked <= min{CongWin, RcvWindow}
* roughly: rate = CongWin/RTT bytes/sec
* CongWin is a dynamic function of perceived network congestion
* how does the sender perceive congestion? loss event = timeout or 3 duplicate ACKs; the TCP sender reduces its rate (CongWin) after a loss event
* main mechanisms: AIMD (congestion avoidance and fast recovery), slow start, fast retransmit

TCP AIMD
* additive increase: increase CongWin by 1 MSS every RTT in the absence of loss (congestion avoidance)
* multiplicative decrease: cut CongWin in half after a 3-dup-ACK loss in TCP Reno (not in TCP Tahoe, which uses slow start instead)
* (figure: the sawtooth congestion window of a long-lived TCP connection over time)

TCP slow start
* when the connection begins, CongWin = 1 MSS
  - example: MSS = 500 bytes & RTT = 200 msec -> initial rate = 20 kbps
* the available bandwidth may be >> MSS/RTT: it is desirable to quickly ramp up to a respectable rate
* when the connection begins, increase (ramp up) the rate exponentially fast until the first loss event, indicated by a triple duplicate (TD) ACK or a timeout (TO)
* slow at the start, but it grows fast!
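The slow-start arithmetic above (MSS = 500 bytes, RTT = 200 msec gives an initial rate of 20 kbps) can be checked directly; the doubling schedule shown in the loop follows from CongWin doubling every RTT.

```python
# Verify the slow-start example: with CongWin = 1 MSS, the sender pushes
# one MSS per RTT, so the initial rate is MSS/RTT.
MSS_bytes = 500
RTT_sec = 0.200

initial_rate_bps = MSS_bytes * 8 / RTT_sec   # bits per second
print(initial_rate_bps / 1e3)                # -> 20.0 (kbps), as on the slide

# Doubling CongWin every RTT, the rate after n RTTs is 2**n * MSS/RTT:
for n in range(4):
    print(2**n * initial_rate_bps / 1e3)     # -> 20.0, 40.0, 80.0, 160.0 kbps
```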
TCP slow start (more)
* when the connection begins, double CongWin every RTT, done by incrementing CongWin by 1 MSS for every MSS ACKed
* the rate increases exponentially until the first loss event
* TCP Tahoe (the earliest version): slow start after either a TO or a TD loss; i.e., it has fast retransmit too, but no fast recovery

Fast retransmit (Reno)
* after a TD loss: cut the window in half; the window then grows linearly (congestion avoidance)
* but after a TO loss: CongWin is set to 1 MSS (slow start); the window then grows exponentially up to a new threshold, then grows linearly (as after a TD loss)
* (figure: congestion window size in segments vs. transmission round; after a TD loss, TCP Reno resumes from half the window while TCP Tahoe restarts from 1 MSS)
* philosophy: 3 dup ACKs indicate the network is still capable of delivering some segments; a timeout before 3 dup ACKs is "more alarming"

TCP sender congestion control (see Table 3.3 for TCP Reno)
* when CongWin is below Threshold, the sender is in the slow-start phase: the window grows exponentially
* when CongWin is above Threshold, the sender is in the congestion-avoidance phase: the window grows linearly
* when a triple duplicate ACK occurs: Threshold is set to CongWin/2, and CongWin is set to Threshold
* when a timeout occurs: Threshold is set to CongWin/2, and CongWin is set to 1 MSS

TCP mechanisms illustrated (figure)

TCP congestion control FSM: details
* slow start (entry: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0)
  - new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
  - duplicate ACK: dupACKcount++
  - timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
  - cwnd >= ssthresh: move to congestion avoidance
  - dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
* congestion avoidance
  - new ACK: cwnd = cwnd + MSS*(MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
  - duplicate ACK: dupACKcount++
  - timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start
  - dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; move to fast recovery
* fast recovery
  - duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
  - new ACK: cwnd = ssthresh; dupACKcount = 0; move to congestion avoidance
  - timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; move to slow start

Other TCP variants
* inefficiency in high-speed networks: it takes a long time for the sender to recover to its original cwnd after it is halved by a packet loss; variants: HSTCP, TCP-Westwood, FAST, Quick-Start, Explicit Transport Error Notification, eXplicit Control Protocol (XCP), ...
* inefficiency in wireless networks: current TCP implementations are unable to distinguish buffer-overflow loss (congestion, in wired networks) from random loss (in wireless networks)
* inefficiency in satellite networks: long propagation delay and a large RTT imply low throughput
* TCP variants for wireless/satellite networks: TCP-Peach, Indirect-TCP (Split TCP), SNOOP, Explicit Loss Notification (ELN), ...

TCP throughput
* what is the average throughput of TCP as a function of window size and RTT? (ignore slow start)
* let W be the window size when a loss occurs: when the window is W, throughput is W/RTT
* just after the loss, the window drops to W/2 and throughput to W/(2*RTT)
* average throughput: roughly 0.75*W/RTT

TCP futures
* example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
* requires a window size of W = 83,333 in-flight segments, even if there is no loss
* throughput in terms of loss rate: throughput = 1.22*MSS / (RTT*sqrt(L)), so 10 Gbps requires a loss rate of L = 2*10^-10 (wow!)
* new versions of TCP are needed for high speed!

High-Speed TCP (HSTCP)
* like standard TCP when cwnd is small; more aggressive than standard TCP when cwnd is large
* increase window: cwnd = cwnd + a(cwnd)/cwnd for every segment ACKed
* decrease window: cwnd = (1 - b(cwnd)) * cwnd for every loss
* for standard TCP, a(cwnd) = 1 and b(cwnd) = 0.5
* HSTCP example: at cwnd = 83000, b(83000) = 0.1 means decreasing by 10% after a congestion event, and a(83000) = 72 means an increase of about 72 segments per window of ACKs (i.e., per RTT)
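The window-size and loss-rate numbers in the "TCP futures" example, together with the 0.75*W/RTT average-throughput estimate from the throughput slide, can be checked numerically. The 1.22 constant comes from the standard loss-rate throughput model quoted above; everything else is arithmetic on the slide's own figures.

```python
import math

# TCP futures example: 1500-byte segments, 100 ms RTT, target 10 Gbps
MSS_bits = 1500 * 8
RTT = 0.100
target = 10e9  # bits per second

# Required in-flight window: throughput = W * MSS / RTT
W = target * RTT / MSS_bits
print(round(W))     # -> 83333 segments, matching the slide

# Loss-rate model: throughput = 1.22 * MSS / (RTT * sqrt(L)), solved for L
L = (1.22 * MSS_bits / (RTT * target)) ** 2
print(L)            # roughly 2.1e-10, i.e., the slide's L = 2*10^-10

# AIMD sawtooth average: 0.75 * W / RTT (in bits/sec), with W the peak window
print(0.75 * W * MSS_bits / RTT / 1e9)   # ~7.5 Gbps between losses
```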
HSTCP (figure)

TCP fairness
* fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K, regardless of its initial window size
* (figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R)

Why is TCP fair?
* consider two competing sessions, plotted as a point (connection 1 throughput, connection 2 throughput) against the equal-bandwidth-share line and the full-utilization line at R
* additive increase moves the throughput point from (x, y) along a line of slope 1
* multiplicative decrease halves both throughputs, e.g., from (x+i, y+i) to ((x+i)/2, (y+i)/2)
* halving moves the point closer to the equal-share line, while additive increase keeps the difference constant, so the two throughputs converge to an equal bandwidth share

Fairness (more)
* Fairness and UDP: multimedia apps often do not use TCP, since they do not want their rate throttled by congestion control. Instead they use UDP: pump audio/video at a constant rate and tolerate packet loss. Research area: make them TCP-friendly.
* Fairness and parallel TCP connections: an application can open more than one parallel connection between 2 hosts (e.g., NetAnts); web browsers do this. Example: a link of rate R supports 9 connections; a new app that asks for 1 TCP connection gets rate R/10, but a new app that asks for 9 TCP connections gets R/2!

Delay modeling
* Q: How long does it take to receive an object from a Web server after sending a request?
* Ignoring congestion, delay is influenced by: TCP connection establishment, data transmission delay, and slow start
* Notation, assumptions:
  - one link between client and server, of rate R
  - S: MSS (bits); O: object/file size (bits)
  - no retransmissions (no loss, no corruption)
* Window size: first assume a fixed congestion window of W segments; then a dynamic window, modeling slow start

Fixed congestion window (1)
* first case: W*S/R > RTT + S/R: the ACK for the first segment in the window returns before a window's worth of data has been sent
* delay = 2*RTT + O/R

Fixed congestion window (2)
* second case: W*S/R < RTT + S/R: the sender must wait for an ACK after sending a window's worth of data
* the "gap" between two "rounds" is S/R + RTT - W*S/R
* let K = O/(W*S) be the number of rounds; there are K-1 gaps
* delay = 2*RTT + O/R + (K-1)*[S/R + RTT - W*S/R]

TCP delay modeling: slow start (1)
* now suppose the window grows according to slow start, and let P be the number of rounds TCP idles at the server (i.e., the number of gaps); the delay for one object is:

  delay = 2*RTT + O/R + sum_{p=1..P} idleTime_p
        = 2*RTT + O/R + sum_{p=1..P} [S/R + RTT - 2^(p-1)*S/R]
        = 2*RTT + O/R + P*[RTT + S/R] - (2^P - 1)*S/R

  (2^(p-1)*S/R is the time to transmit the window of the p-th round in slow start)
* 1) let K be the number of sending rounds/windows needed for the object (how do we calculate K?)
* 2) if the object is large, then once the maximum sending window is large enough, the idleTime eventually becomes 0 (after slow start ends, as assumed earlier)
* 3) let Q be the last round during which idleTime is still > 0; then the server idles P = min{K-1, Q} times

TCP delay modeling: slow start (2)
* delay components: 2*RTT for connection establishment and request; O/R to transmit the object; plus the time the server idles due to slow start
* the server idles P = min{K-1, Q} times, with a decreasing amount of idleTime after each round (windows of S/R, 2S/R, 4S/R, 8S/R, ...)
* example: O/S = 15 segments, K = 4 windows, Q = 2, so P = min{K-1, Q} = 2; the server idles P = 2 times

TCP delay modeling (3)
* time to transmit the p-th window: 2^(p-1)*S/R
* idle time after the p-th window: S/R + RTT - 2^(p-1)*S/R

TCP delay modeling (4)
* recall K = the number of windows that cover the object; K is the smallest k such that 2^0*S + 2^1*S + ... + 2^(k-1)*S = (2^k - 1)*S >= O, i.e., K = ceil(log2(O/S + 1))
* calculation of Q, the number of idles for an infinite-size object, is similar: Q is the largest p such that S/R + RTT >= 2^(p-1)*S/R

Effect of slow start
* delay is affected by: file/object size O; transmission rate R; fixed window size or MSS; RTT (idle time)
* extra delay due to slow start:
  - with a large O, and small R and RTT, slow start does not hurt much
  - with a small O and a large R*RTT, slow start hurts significantly (percentage-wise)

Food for thought
* assume a Web page consists of: 1 base HTML page (of size O bits) and M images (each of size O bits)
* non-persistent HTTP: M+1 TCP connections in series; response time = (M+1)*O/R + (M+1)*2*RTT + sum of idle times
* what about persistent HTTP?
* what about non-persistent HTTP with X parallel connections? (is this X times faster than having 1 non-persistent connection?)
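A small script can check the slow-start delay model against the worked example (O/S = 15 segments, K = 4, Q = 2, P = 2). The values S/R = 1 and RTT = 2 time units below are assumptions chosen only so that Q = 2 as in the slide; the K, Q, and delay formulas are the ones derived above.

```python
import math

def slowstart_delay(O, S, R, RTT):
    """Latency of one object under TCP slow start with no loss, per the model above."""
    segments = math.ceil(O / S)
    # K: number of slow-start windows (sizes 1, 2, 4, ...) needed to cover the object
    K = math.ceil(math.log2(segments + 1))
    # Q: last round whose idle time is still positive (for an infinite object)
    Q = math.floor(math.log2(1 + RTT * R / S)) + 1
    P = min(K - 1, Q)
    delay = 2 * RTT + O / R + P * (RTT + S / R) - (2**P - 1) * S / R
    return K, Q, P, delay

# Slide example: O/S = 15 segments; S/R = 1 and RTT = 2 are assumed time units
K, Q, P, delay = slowstart_delay(O=15.0, S=1.0, R=1.0, RTT=2.0)
print(K, Q, P)   # -> 4 2 2, matching the slide's K = 4, Q = 2, P = min{K-1, Q} = 2
print(delay)     # -> 22.0 (= 2*2 + 15 + 2*(2+1) - 3*1)
```

The closed form agrees with summing the two positive idle times (2 and 1 units) directly, which is a useful sanity check on the algebra.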
HTTP response time (in seconds): RTT = 100 msec, O = 5 Kbytes, M = 10, X = 5
* (figure: response times of non-persistent, persistent, and parallel non-persistent HTTP at link rates of 28 Kbps, 100 Kbps, 1 Mbps, and 10 Mbps)
* for low bandwidth, connection & response time are dominated by transmission time
* persistent connections give only a minor improvement over parallel connections

HTTP response time (in seconds): RTT = 1 sec, O = 5 Kbytes, M = 10, X = 5
* (figure: the same comparison at the larger RTT)
* for larger RTTs, response time is dominated by TCP establishment & slow-start delays
* persistent connections now give an important improvement, particularly in high delay-bandwidth networks

Chapter 3: Summary
* principles behind transport-layer services: reliable data transfer, flow control, congestion control
* instantiation and implementation in the Internet: UDP, TCP
* Next: leaving the network "edge" (application, transport layers) into the network "core"