Lecture 04: Transport Layer
• Transport layer protocols in the Internet:
– UDP: connectionless transport
– TCP: connection-oriented transport
– TCP congestion control
Provides end-to-end connectivity, but not necessarily good performance.
Internet transport-layer protocols
• reliable, in-order delivery (TCP)
– congestion control
– flow control
– connection setup
• unreliable, unordered delivery (UDP)
– no-frills extension of “best-effort” IP
• services not available:
– delay guarantees
– bandwidth guarantees
[Figure: the five-layer stack (application, transport, network, data link, physical) runs end-to-end on the two hosts; routers along the path implement only the network, data link, and physical layers.]
Two Basic Transport Features
• Demultiplexing: port numbers
[Figure: a client host sends a service request to server host 128.2.194.242:80 (i.e., the Web server); the server OS uses the destination port to hand it to the Web server (port 80) rather than, say, the Echo server (port 7).]
• Error detection: checksums
[Figure: a checksum carried with the IP payload lets the receiver detect corruption.]
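The checksum idea can be made concrete with a short sketch of the 16-bit one’s-complement Internet checksum that UDP and TCP use (simplified for illustration: the real protocols also cover a pseudo-header):

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum (RFC 1071 style)."""
    if len(data) % 2:                 # pad odd-length input with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return ~total & 0xFFFF

payload = b"data"
cs = internet_checksum(payload)
# the receiver's check: summing the data together with its checksum yields zero
assert internet_checksum(payload + cs.to_bytes(2, "big")) == 0
```

The receiver recomputes the sum over everything, checksum included; any non-zero result means the payload was corrupted in transit.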
User Datagram Protocol (UDP)
• Datagram messaging service
– Demultiplexing: port numbers
– Detecting corruption: checksum
• Lightweight communication between processes
– Send and receive messages
– Avoid overhead of ordered, reliable delivery
[Figure: the 8-byte UDP header — source port, destination port, length, checksum — followed by the data.]
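The 8-byte header can be packed with Python’s `struct` module; a sketch (the checksum field is left as zero, which IPv4 UDP permits to mean “no checksum computed”):

```python
import struct

def build_udp_header(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Pack the 8-byte UDP header: source port, destination port,
    length (header + payload), and checksum (left as 0 here)."""
    length = 8 + len(payload)
    return struct.pack("!HHHH", src_port, dst_port, length, 0)

hdr = build_udp_header(53000, 53, b"query")   # e.g., a DNS-style query to port 53
assert len(hdr) == 8                          # UDP header is always 8 bytes
assert struct.unpack("!HHHH", hdr)[:3] == (53000, 53, 13)
```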
Advantages of UDP
• Fine-grain control
– UDP sends as soon as the application writes
• No connection set-up delay
– UDP sends without establishing a connection
• No connection state
– No buffers, parameters, sequence #s, etc.
• Small header overhead
– UDP header is only eight bytes long
Popular Applications That Use UDP
• Multimedia streaming
– Retransmitting packets is not always worthwhile
– E.g., phone calls, video conferencing, gaming, IPTV
• Simple query-response protocols
– Overhead of connection establishment is overkill
– E.g., Domain Name System (DNS), DHCP, etc.
[Figure: a DNS exchange — “Address for www.cnn.com?” / “12.3.4.15”]
Transmission Control Protocol (TCP)
• Stream-of-bytes service
– Sends and receives a stream of bytes
• Reliable, in-order delivery
– Corruption: checksums
– Detect loss/reordering: sequence numbers
– Reliable delivery: acknowledgments and retransmissions
• Connection oriented
– Explicit set-up and tear-down of TCP connection
• Flow control
– Prevent overflow of the receiver’s buffer space
• Congestion control
– Adapt to network congestion for the greater good
Breaking a Stream of Bytes into TCP Segments

TCP “Stream of Bytes” Service
[Figure: Host A writes a stream of bytes that Host B reads.]

…Emulated Using TCP “Segments”
[Figure: the stream between Host A and Host B is carried as discrete TCP data segments.]
Segment sent when:
1. Segment full (Max Segment Size),
2. Not full, but times out, or
3. “Pushed” by application.
TCP Segment
• IP packet
– No bigger than Maximum Transmission Unit (MTU)
– E.g., up to 1500 bytes on an Ethernet link
• TCP packet
– IP packet with a TCP header and data inside
– TCP header is typically 20 bytes long
• TCP segment
– No more than Maximum Segment Size (MSS) bytes
– E.g., up to 1460 consecutive bytes from the stream
[Figure: IP header, then TCP header, then TCP data (the segment), all carried as the IP packet’s data.]
Sequence Number
[Figure: Host A chooses an ISN (initial sequence number); each TCP data segment’s sequence number is the stream position of its first byte.]
Reliable Delivery on a Lossy Channel With Bit Errors

Challenges of Reliable Data Transfer
• Over a perfectly reliable channel
– Easy: sender sends, and receiver receives
• Over a channel with bit errors
– Receiver detects errors and requests retransmission
• Over a lossy channel with bit errors
– Some data are missing, and others corrupted
– Receiver cannot always detect loss
• Over a channel that may reorder packets
– Receiver cannot distinguish loss from out-of-order delivery
An Analogy
• Alice and Bob are talking
– What if Bob couldn’t understand Alice?
– Bob asks Alice to repeat what she said
• What if Bob hasn’t heard Alice for a while?
– Is Alice just being quiet? Has she lost reception?
– How long should Bob just keep on talking?
– Maybe Alice should periodically say “uh huh”
– … or Bob should ask “Can you hear me now?”
Take-Aways from the Example
• Acknowledgments from receiver
– Positive: “okay” or “uh huh” or “ACK”
– Negative: “please repeat that” or “NACK”
• Retransmission by the sender
– After not receiving an “ACK”
– After receiving a “NACK”
• Timeout by the sender (“stop and wait”)
– Don’t wait forever without some acknowledgment
TCP Support for Reliable Delivery
• Detect bit errors: checksum
– Used to detect corrupted data at the receiver
– … leading the receiver to drop the packet
• Detect missing data: sequence number
– Used to detect a gap in the stream of bytes
– … and for putting the data back in order
• Recover from lost data: retransmission
– Sender retransmits lost or corrupted data
– Two main ways to detect lost packets: timeouts and duplicate ACKs
TCP Acknowledgments
[Figure: Host A chooses an ISN (initial sequence number); each data segment’s sequence number is the position of its first byte, and Host B’s ACK carries the sequence number of the next expected byte.]
Automatic Repeat reQuest (ARQ)
• ACKs and timeouts
– Receiver sends ACK when it receives packet
– Sender waits for ACK and times out
• Simplest ARQ protocol: stop and wait
– Send a packet, stop and wait until ACK arrives
[Figure: the sender transmits a packet, starts a timeout, and waits for the receiver’s ACK before sending the next packet.]
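The stop-and-wait sender loop can be sketched as follows; the `channel` callback is a hypothetical stand-in for “transmit the packet, then wait for an ACK or a timeout”:

```python
import random

def stop_and_wait_send(packets, channel, max_attempts=10):
    """Send each packet in turn; retransmit on 'timeout' until ACKed."""
    for seq, pkt in enumerate(packets):
        for _attempt in range(max_attempts):
            ack = channel(seq, pkt)      # returns seq on ACK, None on timeout
            if ack == seq:
                break                    # ACK received: move to next packet
        else:
            raise TimeoutError(f"packet {seq} never acknowledged")
    return True

# a lossy channel that drops roughly 30% of transmissions
random.seed(42)
lossy = lambda seq, pkt: seq if random.random() > 0.3 else None
assert stop_and_wait_send([b"a", b"b", b"c"], lossy)
```

Note that exactly one packet is ever outstanding: the sender makes no progress on packet `seq+1` until packet `seq` is acknowledged, which is the inefficiency the sliding window fixes below.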
Flow Control: TCP Sliding Window

Motivation for Sliding Window
• Stop-and-wait is inefficient
– Only one TCP segment is “in flight” at a time
– Especially bad for paths with a high “delay-bandwidth product”
[Figure: the pipe between sender and receiver holds delay × bandwidth bits.]
Numerical Example
• 1.5 Mbps link with 45 msec round-trip time (RTT)
– Delay-bandwidth product is 67.5 Kbits (or 8 KBytes)
• Sender can send at most one packet per RTT
– Assuming a segment size of 1 KB (8 Kbits)
– 8 Kbits/segment at 45 msec/segment → 182 Kbps
– That’s just one-eighth of the 1.5 Mbps link capacity
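A quick check of the arithmetic (using 1 KB = 1024 bytes, as the slide does):

```python
link_bps = 1.5e6            # 1.5 Mbps link
rtt = 0.045                 # 45 ms round-trip time
segment_bits = 1024 * 8     # 1 KByte segment = 8 Kbits

delay_bw = link_bps * rtt                  # bits needed to fill the pipe
throughput = segment_bits / rtt            # stop-and-wait: one segment per RTT

assert delay_bw == 67500                   # 67.5 Kbits (about 8 KBytes)
assert round(throughput / 1000) == 182     # about 182 Kbps
assert round(link_bps / throughput) == 8   # roughly 1/8 of link capacity
```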
Pipelined protocols
• Pipelining: sender allows multiple “in-flight,” yet-to-be-acknowledged packets
– range of sequence numbers must be increased
– buffering at sender and/or receiver
• Pipelined protocols: concurrent logical channels, sliding window protocol
Sliding Window Protocol
• Consider an infinite array, Source, at the sender, and an infinite array, Sink, at the receiver.
[Figure: at sender P1, positions 0..a–1 are acknowledged and a..s–1 are unacknowledged; the send window covers a..s–1. At receiver P2, positions 0..r–1 have been delivered; the receive window covers r..r+RW–1, where r is the next expected data unit.]
• RW = receive window size
• SW = send window size (s – a ≤ SW)
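The sender-side bookkeeping can be sketched with the slide’s variable names (`a` = oldest unacknowledged unit, `s` = next sequence number to use); a simplified, packet-granularity sketch:

```python
class SlidingWindowSender:
    """Tracks the send window: units a..s-1 are in flight, with s - a <= SW."""
    def __init__(self, sw):
        self.sw = sw        # send window size (SW)
        self.a = 0          # oldest unacknowledged sequence number
        self.s = 0          # next sequence number to send

    def can_send(self):
        return self.s - self.a < self.sw

    def send(self):
        assert self.can_send()
        self.s += 1

    def on_cumulative_ack(self, next_expected):
        # cumulative ACK: everything below next_expected is acknowledged
        self.a = max(self.a, next_expected)

tx = SlidingWindowSender(sw=4)
while tx.can_send():
    tx.send()               # fills the window: sequence numbers 0..3
assert (tx.a, tx.s) == (0, 4)
tx.on_cumulative_ack(3)     # receiver expects 3 next: units 0, 1, 2 acknowledged
assert tx.can_send()        # window slid forward; there is room again
```

Unlike stop-and-wait, up to SW units are in flight at once, so the pipe can stay full when SW is at least the delay-bandwidth product.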
Sliding Windows in Action
• Data unit r has just been received by P2
– Receive window slides forward
• P2 sends cumulative ack with the sequence number it expects to receive next (r+3)
[Figure: the receive window now starts at r+3; the sender’s window is unchanged until the ack arrives.]
Sliding Windows in Action
• P1 has just received cumulative ack with r+3 as next expected sequence number
– Send window slides forward
[Figure: the send window now begins at r+3; everything below r+3 is acknowledged.]
Sliding Window protocol
• Functions provided
– error control (reliable delivery)
– in-order delivery
– flow and congestion control (by varying send window size)
• TCP uses only cumulative acks
• Other kinds of acks
– selective nack
– selective ack (TCP SACK)
– bit-vector representing entire state of receive window (in addition to first sequence number of window)
Sliding Window Protocol
• At the sender, a will be pointed to by SendBase, and s by NextSeqNum
[Figure: the same sender/receiver window diagram as before, with SendBase = a and NextSeqNum = s; RW = receive window size, SW = send window size (s – a ≤ SW).]
TCP Flow Control
• flow control: sender won’t overrun receiver’s buffers by transmitting too much, too fast
• receiver: explicitly informs sender of (dynamically changing) amount of free buffer space
– RcvWindow field in TCP segment
• sender: keeps amount of transmitted, unACKed data less than most recently received RcvWindow value
[Figure: the receive side of a TCP connection buffers incoming data until the application reads it; RcvWindow is the free space in that buffer.]
Optimizing Retransmissions

Reasons for Retransmission
[Figure: three timelines. (1) Packet lost: the timeout fires and the sender retransmits. (2) ACK lost: the timeout fires and the retransmission produces a DUPLICATE PACKET at the receiver. (3) Early timeout: the timer expires before the ACK arrives, producing DUPLICATE PACKETS.]
How Long Should Sender Wait?
• Sender sets a timeout to wait for an ACK
– Too short: wasted retransmissions
– Too long: excessive delays when packet lost
• TCP sets timeout as a function of the RTT
– Expect ACK to arrive after a “round-trip time”
– … plus a fudge factor to account for queuing
• But, how does the sender know the RTT?
– Running average of delay to receive an ACK
TCP Round Trip Time and Timeout
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
– ignore retransmissions
• SampleRTT will vary; want estimated RTT “smoother”
– average several recent measurements, not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1 – α)·EstimatedRTT + α·SampleRTT
• Exponential weighted moving average
– influence of past sample decreases exponentially fast
– typical value: α = 0.125
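The estimator, extended with the deviation term that real TCP adds when computing the timeout (a sketch in the style of RFC 6298, where TimeoutInterval = EstimatedRTT + 4·DevRTT):

```python
def make_rtt_estimator(alpha=0.125, beta=0.25):
    """EWMA estimator for RTT and its deviation (RFC 6298 style)."""
    state = {"est": None, "dev": None}
    def update(sample):
        if state["est"] is None:                  # first sample initializes state
            state["est"], state["dev"] = sample, sample / 2
        else:
            state["dev"] = (1 - beta) * state["dev"] + beta * abs(sample - state["est"])
            state["est"] = (1 - alpha) * state["est"] + alpha * sample
        return state["est"] + 4 * state["dev"]    # timeout interval
    return update

update = make_rtt_estimator()
update(0.100)                  # first SampleRTT: 100 ms
timeout = update(0.120)        # a later, slower sample
assert 0.100 < timeout < 0.400
```

The 4·DevRTT term is the “fudge factor” from the previous slide: the more the samples fluctuate, the larger the safety margin on the timeout.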
Example RTT estimation:
[Figure: SampleRTT and EstimatedRTT for gaia.cs.umass.edu to fantasia.eurecom.fr — RTT (milliseconds, roughly 100–350) plotted over about 106 seconds; EstimatedRTT tracks the fluctuating SampleRTT as a smoothed curve.]
TCP: retransmission scenarios
[Figure: two Host A / Host B timelines. Lost ACK scenario: the Seq=92 segment arrives but its ACK is lost; A’s timer expires and it retransmits, and SendBase advances to 100 when the next ACK arrives. Premature timeout scenario: A’s Seq=92 timer expires before the ACK returns, so A retransmits unnecessarily; the cumulative ACKs then move SendBase from 100 to 120.]
TCP retransmission scenarios (more)
[Figure: cumulative ACK scenario — Host A sends two segments; the first ACK is lost, but the second, cumulative ACK arrives before the timeout, so SendBase advances to 120 with no retransmission.]
Fast Retransmit
• Time-out period often relatively long:
– long delay before resending lost packet
• Detect lost segments via duplicate ACKs
– Sender often sends many segments back-to-back
– If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 ACKs for the same data, it supposes that the segment after the ACKed data was lost:
– fast retransmit: resend segment before timer expires
Figure 3.37 Resending a segment after triple duplicate ACK
Fast retransmit algorithm:

event: ACK received, with ACK field value of y
if (y > SendBase) {
    SendBase = y
    if (there remains a not-yet-acknowledged segment)
        start timer
}
else {
    // a duplicate ACK for an already-ACKed segment
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3) {
        // fast retransmit
        resend segment with sequence number y
        reset timer for y
    }
}
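The same logic as a runnable sketch (the returned strings stand in for the real sender actions, which are omitted here for illustration):

```python
def make_ack_handler():
    """Fast-retransmit ACK handling; reports the action taken for each ACK."""
    state = {"send_base": 0, "dup_acks": 0}
    def on_ack(y):
        if y > state["send_base"]:            # new cumulative ACK
            state["send_base"] = y
            state["dup_acks"] = 0
            return "advance"
        state["dup_acks"] += 1                # duplicate ACK for old data
        if state["dup_acks"] == 3:            # triple duplicate: fast retransmit
            return f"fast-retransmit {y}"
        return "dup"
    return on_ack

on_ack = make_ack_handler()
assert on_ack(100) == "advance"
assert on_ack(100) == "dup"                   # 1st duplicate
assert on_ack(100) == "dup"                   # 2nd duplicate
assert on_ack(100) == "fast-retransmit 100"   # 3rd duplicate triggers resend
```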
Effectiveness of Fast Retransmit
• When does Fast Retransmit work best?
– High likelihood of many packets in flight
– Long data transfers, large window size, …
• Implications for Web traffic
– Most Web transfers are short (e.g., 10 packets), so often there aren’t many packets in flight
– … making fast retransmit less likely to “kick in”
– … and forcing users to click “reload” more often
Starting and Ending a Connection: TCP Handshakes

Establishing a TCP Connection
[Figure: hosts A and B exchange SYN, SYN ACK, and ACK; each host tells its ISN to the other host.]
• Three-way handshake to establish connection
– Host A sends a SYN (open) to host B
– Host B returns a SYN acknowledgment (SYN ACK)
– Host A sends an ACK to acknowledge the SYN ACK
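From the application’s point of view, the entire handshake happens inside `connect()`; a minimal sketch using Python sockets on the loopback interface:

```python
import socket

def demo_handshake():
    """Set up a listener (host B), then connect to it (host A); the kernel
    performs the SYN / SYN-ACK / ACK exchange inside connect()."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(("127.0.0.1", port))  # three-way handshake happens here
    conn, addr = server.accept()         # connection is already established

    conn.close(); client.close(); server.close()
    return addr

print(demo_handshake())  # e.g. ('127.0.0.1', <ephemeral client port>)
```

Note that `connect()` can return before the server calls `accept()`: the kernel completes the handshake and parks the established connection on the listen queue.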
What if the SYN Packet Gets Lost?
• Suppose the SYN packet gets lost
– Packet is lost inside the network, or
– Server rejects the packet (e.g., listen queue is full)
• Eventually, no SYN-ACK arrives
– Sender sets a timer and waits for the SYN-ACK
– … and retransmits the SYN if needed
• How should the TCP sender set the timer?
– Sender has no idea how far away the receiver is
– Some TCPs use a default of 3 or 6 seconds
SYN Loss and Web Downloads
• User clicks on a hypertext link
– Browser creates a socket and does a “connect”
– The “connect” triggers the OS to transmit a SYN
• If the SYN is lost…
– The 3-6 seconds of delay is very long
– The impatient user may click “reload”
• User triggers an “abort” of the “connect”
– Browser “connects” on a new socket
– Essentially, forces a fast send of a new SYN!
Lecture 04: Transport Layer
• Transport layer protocols in the Internet:
– UDP: connectionless transport
– TCP: connection-oriented transport
– TCP congestion control
Principles of Congestion Control
Congestion:
• informally: “too many sources sending too much data too fast for network to handle”
• different from flow control!
• manifestations:
– lost packets (buffer overflow at routers)
– long delays (queueing in router buffers)
• a top-10 problem!
Receiver Window vs. Congestion Window
• Flow control
– Keep a fast sender from overwhelming a slow receiver
• Congestion control
– Keep a set of senders from overloading the network
• Different concepts, but similar mechanisms
– TCP flow control: receiver window
– TCP congestion control: congestion window
– Sender TCP window = min { congestion window, receiver window }
How it Looks to the End Host
• Delay: packet experiences high delay
• Loss: packet gets dropped along path
• How does TCP sender learn this?
– Delay: round-trip time estimate
– Loss: timeout and/or duplicate acknowledgments
Congestion Collapse
• Easily leads to congestion collapse
– Senders retransmit the lost packets
– Leading to even greater load
– … and even more packet loss
[Figure: goodput vs. load — past the knee, an increase in load results in a decrease in useful work done: “congestion collapse.”]
Approaches towards congestion control
End-to-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system’s observed loss and/or delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
– single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
– explicit sending rate for sender
TCP Congestion control
• end-to-end control (no network assistance)
• Tradeoff
– Pro: avoids needing explicit network feedback
– Con: continually under- and over-shoots “right” rate
TCP Congestion control
• Each TCP sender maintains a congestion window
– Max number of bytes to have in transit (not yet ACK’d)
• Adapting the congestion window
– Decrease upon losing a packet: backing off
– Increase upon success: optimistically exploring
– Always struggling to find the right transfer rate
TCP Congestion Control
How does sender determine CongWin?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces CongWin after loss event
Three mechanisms:
• slow start
• AIMD
• reduce to 1 segment after timeout event
TCP Slow Start
• Probing for usable bandwidth
• When connection begins, CongWin = 1 MSS
– Example: MSS = 500 bytes & RTT = 200 msec
– initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
– desirable to quickly ramp up to a higher rate
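The slide’s numbers check out, and the doubling is easy to simulate:

```python
mss_bytes, rtt = 500, 0.200              # the slide's example values
initial_rate = mss_bytes * 8 / rtt       # one MSS per RTT to start
assert initial_rate == 20_000            # 20 kbps, matching the slide

# slow start doubles CongWin every RTT until a loss or the threshold
congwin_mss = 1
for _ in range(4):
    congwin_mss *= 2
assert congwin_mss == 16                 # 16 MSS in flight after 4 RTTs
```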
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event or “threshold”
– double CongWin every RTT
– done by incrementing CongWin by 1 MSS for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A sends one segment, then two, then four per RTT as Host B’s ACKs arrive.]
Congestion avoidance state & responses to loss events
Q: If no loss, when should the exponential increase switch to linear?
A: When CongWin gets to current value of threshold
Implementation:
• For initial slow start, threshold is set to a very large value (e.g., 65 Kbytes)
• At loss event, threshold is set to 1/2 of CongWin just before loss event
[Figure: congestion window size (segments) vs. transmission round — after a loss event, TCP Tahoe drops to 1 segment and slow-starts back up, while TCP Reno halves the window to the new threshold; both grow linearly once above the threshold.]
Rationale for Reno’s Fast Recovery
• After 3 dup ACKs:
– CongWin is cut in half
– window then grows linearly
• But after timeout event:
– CongWin is set to 1 MSS instead;
– window then grows exponentially to a threshold, then grows linearly
• Rationale: 3 dup ACKs indicates network capable of delivering some segments; a timeout occurring before 3 dup ACKs is “more alarming”
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender in slow-start phase, window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.
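These four rules can be condensed into one update function (a simplified, MSS-granularity sketch; it treats one “ack” event as a full RTT of ACKs and omits fast recovery’s window-inflation details):

```python
def reno_update(congwin, threshold, event):
    """Return (congwin, threshold) after one event, in MSS units."""
    if event == "ack":
        if congwin < threshold:
            congwin *= 2              # slow start: double per RTT
        else:
            congwin += 1              # congestion avoidance: +1 MSS per RTT
    elif event == "triple-dup-ack":
        threshold = congwin // 2      # Reno: halve, then grow linearly
        congwin = threshold
    elif event == "timeout":
        threshold = congwin // 2
        congwin = 1                   # back to slow start
    return congwin, threshold

w, t = 1, 8
for _ in range(3):
    w, t = reno_update(w, t, "ack")   # slow start: 2, 4, 8
assert (w, t) == (8, 8)
w, t = reno_update(w, t, "ack")       # at threshold: linear growth
assert (w, t) == (9, 8)
w, t = reno_update(w, t, "triple-dup-ack")
assert (w, t) == (4, 4)
w, t = reno_update(w, t, "timeout")
assert (w, t) == (1, 2)
```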
AIMD in steady state
• additive increase: increase CongWin by 1 MSS every RTT in the absence of any loss event: probing
• multiplicative decrease: cut CongWin in half after loss event (3 dup acks)
[Figure: long-lived TCP connection — the congestion window traces a sawtooth over time, climbing additively toward about 24 Kbytes and then being cut in half at each loss event.]
Why is TCP fair?
Two competing sessions:
[Figure: Connection 2’s window size plotted against Connection 1’s, both limited by link capacity R. Additive increase moves the operating point along a 45° line; each loss halves both windows. Repeated rounds of “loss: decrease window by factor of 2” and “congestion avoidance: additive increase” push the two sessions toward equal window size.]
TCP Fairness
Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K (AIMD only provides convergence to same window size, not necessarily same throughput rate)
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
– do not want rate throttled by congestion control
• Instead use UDP:
– pump audio/video at constant rate, tolerate packet loss
• TCP-friendly congestion control for apps that prefer UDP, e.g., Datagram Congestion Control Protocol (DCCP)
End of Lecture04