Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
CSC/ECE 573 Internet Protocols Transmission Control Protocols Let’s Design TCP Or rather, a transport protocol with the following characteristics Reliability (no erroneous packets + Guaranteed in-order delivery) Flow control Congestion control Copyright Rudra Dutta, NCSU, Spring, 2005 2 What Reliable Transport Protocols Provide Goal Description Addressing Delivery of data from sending application to correct receiving application Reliable Delivery Receiver must get all the data the sender put on the network In-Order Delivery Data segments must be delivered to the receiver in the same order they left the sender Flow Control Sender’s transmission rate should adapt to the receiver's rate Congestion Control Transmission rate should not exceed the speed of slowest link Segmentation Copyright Rudra Dutta, NCSU, Spring, 2005 Data should be sent in segments whose size provides the highest throughput 3 Transport vs. Data-Link Protocols Both protocols have to deal with error/flow control, etc. Data Link Layer Transport Layer Communication is over a physical channel Communication is over a network Explicit addressing not always required (e.g. point-to-point) Explicit addressing required Establishing a (single) connection is simple Large and variable number of connections, establishment is more difficult A channel cannot reorder frames The network may store packets and deliver them later Copyright Rudra Dutta, NCSU, Spring, 2005 4 Outline Overview I. • TCP Message Format and Options II. Connection Establishment & Termination III. Data Transfer (Flow and Error Control) IV. Timer Management V. Congestion Control Copyright Rudra Dutta, NCSU, Spring, 2005 5 Transmission Control Protocol Overview TCP Overview Transmission Control Protocol (RFCs 793, 1122, 1323) Segment: unit of transfer, usually contained in a single IP datagram Reliability achieved using Acknowledgments (therefore, seq. no.s) Timeouts, retransmissions Checksums on header and data Sliding window mechanism for efficient transmission flow control congestion control Important: several flavors (text covers early one) Copyright Rudra Dutta, NCSU, Spring, 2005 7 Properties of TCP Reliable Service Point-to-point Stream orientation: TCP thinks of data as a stream of bytes Unstructured stream: TCP does not honor structured data Virtual circuit / connection orientation: state maintained at both ends Buffered transfer: the software divides data stream into segments independent of application program transfers Full duplex connection: concurrent data transfer in both directions (half duplex possible, but then, no piggy-backing) Copyright Rudra Dutta, NCSU, Spring, 2005 8 TCP Header Format Copyright Rudra Dutta, NCSU, Spring, 2005 9 TCP Connections and Endpoints TCP uses destination port to identify ultimate destination TCP uses connection as fundamental abstraction connection: a pair of endpoints endpoint: (host_IP_address, port #) (socket) TCP ports a port # does not correspond to a single object UDP: single queue per port a TCP port number can be shared by multiple connections on the same machine (remote endpoints differ) Copyright Rudra Dutta, NCSU, Spring, 2005 10 TCP Header Fields Sequence number every byte in data stream is numbered sequence number = number of the first data byte in the segment in the sender's byte stream wraps around after 232 -1 ACK number next sequence number the sender of the acknowledgment expects to receive i.e., sequence number + 1 of last successfully received consecutive data byte valid only when ACK flag = 1 Header length length of header (in bytes) / 4 With maximum value of 15 (24-1), header cannot exceed 60 bytes Copyright Rudra Dutta, NCSU, Spring, 2005 11 TCP Header Fields Window size number of bytes (starting with the one specified in the ACK field) that receiver is willing to accept 16 bits long; max value is 65535 used for flow control Checksum similar to UDP computation (including pseudo-header) but use is mandatory Urgent pointer sequence number of last byte of urgent data (later) added to sequence number of segment Options most common option is maximum segment size (MSS) Copyright Rudra Dutta, NCSU, Spring, 2005 12 Description of Flags in Control Field Flag Description URG The value of the urgent pointer is valid ACK The value of the acknowledgment number is valid PSH Push the data (pass data to receiver as quickly as possible, later) The connection must be reset RST SYN Synchronize the sequence numbers during connection establishment FIN The sender has no more data to transmit Copyright Rudra Dutta, NCSU, Spring, 2005 13 TCP Options Copyright Rudra Dutta, NCSU, Spring, 2005 14 End of Options Used for padding at end of the option field Only one end-of-option used --> after this option receiver looks for payload data Copyright Rudra Dutta, NCSU, Spring, 2005 15 No Operation Option If multiple bytes needed for alignment, no-op option is used Copyright Rudra Dutta, NCSU, Spring, 2005 16 Maximum Segment Size Option Declared during connection establishment phase (in SYN segments) cannot be specified during data transfer Defines the maximum segment size of data the sender is willing to accept MSS must be interface MTU - 40 bytes default: 536 Copyright Rudra Dutta, NCSU, Spring, 2005 17 Long Fat Pipes (LFN) Bandwidth-delay product (capacity of "pipe") Bandwidth (b/s) X Round-trip-time (RTT, in s) Note: consider transmission and propagation delays Example: trans-continental T3 RTT = 6000 Km * 5 s / Km = 30 ms B-D Product = 45 Mbps X 30 ms = 170 KB TCP cannot handle “Long Fat Pipes” efficiently small window size (16 bits maximum of 64 KB) small sequence number space (go-back-N ARQ protocol) Copyright Rudra Dutta, NCSU, Spring, 2005 18 Window Scale Option New window size = window size in header X 2scale-factor equivalent to “shift window size in header left by a number of bits equal to the window scale factor” only carried in segments with SYN flag = 1 (permanent for the connection) Limit on value (look it up) Copyright Rudra Dutta, NCSU, Spring, 2005 19 Timestamp Option Sender puts timestamp option in segment, reflected by receiver in the acknowledgment Can compute RTT for each received ACK allows sender to compute RTT more accurately clock synchronization between two hosts is not required Without timestamps, RTT calculated once every window OK for 8-segment windows but larger windows require better RTT calculations Copyright Rudra Dutta, NCSU, Spring, 2005 20 Transmission Control Protocol Connection Establishment and Termination Connection Establishment Issues Naive approach: Connection Request message Connection Accepted message Sequence numbers always start at 0 Problem: delayed duplicates Copyright Rudra Dutta, NCSU, Spring, 2005 22 Connection Establishment Issues (cont'd) Copyright Rudra Dutta, NCSU, Spring, 2005 23 Connection Establishment Issues (cont'd) Copyright Rudra Dutta, NCSU, Spring, 2005 24 Connection Establishment Solution Limit packet lifetime T = Max Segment Lifetime (compare with TTL) Use long sequence numbers wrap around time > 2T = time for a packet and its ACKs to “die” Choose a different initial sequence number (ISN) with each connection request Ignore additional requests for connection after establishment After host crash: wait for time 2T Allow time for old packets to die off Copyright Rudra Dutta, NCSU, Spring, 2005 25 Three-Way Handshake Copyright Rudra Dutta, NCSU, Spring, 2005 26 Three-Way Handshake (cont'd) Copyright Rudra Dutta, NCSU, Spring, 2005 27 Three-Way Handshake (cont'd) Copyright Rudra Dutta, NCSU, Spring, 2005 28 Simultaneous Open A connects with B, B connects with A at same time (pass each other in the network) Only one connection will be established! Using only 2 ports (one on A, one on B) Copyright Rudra Dutta, NCSU, Spring, 2005 29 Connection Termination Issues Asymmetric (unilateral) release abrupt, data may be lost Symmetric (bilateral) release each direction released independently of the other each side can close its transmission “half closed” state is possible To avoid data loss in a symmetric release… no side should disconnect until it is convinced that the other side is also prepared to disconnect same as “two-army” problem: no solution! Copyright Rudra Dutta, NCSU, Spring, 2005 30 TCP Connection Termination 1. A sender closes its part of the connection by sending a FIN segment 2. After ACKing the FIN, the receiver can still send data on its part of the connection (half-close) 3. Finally, the receiver closes its part of the connection by sending a FIN segment (graceful close) Copyright Rudra Dutta, NCSU, Spring, 2005 31 TCP Connection Termination (cont'd) Copyright Rudra Dutta, NCSU, Spring, 2005 32 Simultaneous Close A sends FIN to B, B sends FIN to A at same time (pass each other in the network) FIN_WAIT_1 FIN_WAIT_1 CLOSING CLOSING TIME_WAIT TIME_WAIT Copyright Rudra Dutta, NCSU, Spring, 2005 A B 33 Connection Reset A connection can also be aborted with a RST segment (hard reset) normally reserved for error conditions, not normal termination Copyright Rudra Dutta, NCSU, Spring, 2005 34 Half Closed Connection One end of connection (e.g. clientserver) terminates (sends FIN and receives ACK of FIN) Other end (serverclient) remains open (sending data) Other end (server) later terminates (sends FIN and receives ACK of FIN), and connection is then completely closed Copyright Rudra Dutta, NCSU, Spring, 2005 35 TCP Connection States CLOSED LISTEN SYN_RCVD SYN_SENT ESTABLISHED FIN_WAIT_1 CLOSING FIN_WAIT_2 TIME_WAIT Copyright Rudra Dutta, NCSU, Spring, 2005 CLOSE_WAIT LAST_ACK 36 TCP: Normal Client Open and Close CLOSED LISTEN SYN_RCVD SYN_SENT ESTABLISHED FIN_WAIT_1 CLOSING FIN_WAIT_2 TIME_WAIT Copyright Rudra Dutta, NCSU, Spring, 2005 CLOSE_WAIT LAST_ACK 37 TCP: Simultaneous Open CLOSED LISTEN SYN_RCVD SYN_SENT ESTABLISHED FIN_WAIT_1 CLOSING FIN_WAIT_2 TIME_WAIT Copyright Rudra Dutta, NCSU, Spring, 2005 CLOSE_WAIT LAST_ACK 38 TCP: Simultaneous Close CLOSED LISTEN SYN_RCVD SYN_SENT ESTABLISHED FIN_WAIT_1 CLOSING FIN_WAIT_2 TIME_WAIT Copyright Rudra Dutta, NCSU, Spring, 2005 CLOSE_WAIT LAST_ACK 39 TCP: Normal Server Open and Close CLOSED LISTEN SYN_RCVD SYN_SENT ESTABLISHED FIN_WAIT_1 CLOSING FIN_WAIT_2 TIME_WAIT Copyright Rudra Dutta, NCSU, Spring, 2005 CLOSE_WAIT LAST_ACK 40 TCP: Client Resets Connection Before Establishment Complete CLOSED LISTEN SYN_RCVD SYN_SENT ESTABLISHED FIN_WAIT_1 CLOSING FIN_WAIT_2 TIME_WAIT Copyright Rudra Dutta, NCSU, Spring, 2005 CLOSE_WAIT LAST_ACK 41 TCP State Machine Source: W. Richard Stevens Copyright Rudra Dutta, NCSU, Spring, 2005 42 Transmission Control Protocol Data Transfer Data Transfer in TCP Data received from an application usually sent in segments of size MSS ACKs carry the sequence number of the next byte receiver expects to receive this is a cumulative count Unacknowledged data = data sent by sender, but not yet acknowledged by the receiver i.e., packets not yet received, or acknowledgment not yet received Each ACK carries a window advertisement window = how many additional bytes the receiver is prepared to accept before the sender must wait for an acknowledgment size of window specified in bytes, not segments Copyright Rudra Dutta, NCSU, Spring, 2005 44 Sliding Windows TCP sliding window mechanism allows multiple segments to be sent before an ACK is returned Left boundary of window = earliest unacknowledged byte an acknowledgment advances this left boundary Right boundary of window = latest unacknowledged byte a window advertisement advances this right boundary Copyright Rudra Dutta, NCSU, Spring, 2005 45 Sliding Window Copyright Rudra Dutta, NCSU, Spring, 2005 46 Dynamic Sliding Window 5 Copyright Rudra Dutta, NCSU, Spring, 2005 47 TCP Window Management Example Copyright Rudra Dutta, NCSU, Spring, 2005 48 Pushing Data + Urgent Data Buffering by TCP sender improves efficiency & throughput not appropriate for all applications (why not?) For interactive applications, can use push operation forces TCP to send data without waiting for the buffer to fill sets the PSH flag: forces delivery to application at receiving end There is no “interrupt” message to send out-of-band information (e.g., interrupt, abort signals) instead, set the URG flag urgent pointer points to the urgent data in the segment Copyright Rudra Dutta, NCSU, Spring, 2005 49 Performance Issues Small packets, and small window advertisements, create efficiency problems These can solved by delaying sending of data sender “voluntarily” consolidates multiple small packets into a single larger packet delaying sending of ACKs/window advertisements sender “strongly encouraged” to consolidate multiple small packets into a single larger packet Copyright Rudra Dutta, NCSU, Spring, 2005 50 “Silly Window” () Syndrome A serious problem in sliding window operation Causes 1. 2. sending application program creates data slowly receiving application program consumes data slowly In either case, data may be sent in small segments inefficient use of bandwidth increased processing by TCP TCP now requires avoidance heuristics Copyright Rudra Dutta, NCSU, Spring, 2005 51 SWS Caused by Receiver Copyright Rudra Dutta, NCSU, Spring, 2005 52 SWS Caused by Sender Example: use of Telnet, running interactive editor at a remote host potentially, two 41-byte and two 40-byte datagrams sent for each character typed Sender: send of 1 byte (1+40 bytes) Receiver: acknowledgment of 1 byte (40 bytes) Receiver: echo of 1 byte (1+40 bytes) Sender: acknowledgment of echo (40 bytes) Copyright Rudra Dutta, NCSU, Spring, 2005 53 Sender Side Heuristic Nagle's algorithm when application generates data 1 byte at a time, send the first byte and buffer all the rest until… an ACK is received, or a maximum sized segment is filled, or half the window has filled (sent but not yet ack’ed) The rule is applied even if Push has been requested Features a self-clocking algorithm (does not compute delays) little overhead: no timers, clock readings, etc. does not affect throughput in “normal” use (i.e., when there is lots of data to transmit) Copyright Rudra Dutta, NCSU, Spring, 2005 54 Yet Another Override Nagle’s algorithm is widely used in TCP implementations, but may have to be turned off in some cases ex.: mouse movements in an X-windows application running over the Internet Whenever delay due to “consolidation” is unacceptable Copyright Rudra Dutta, NCSU, Spring, 2005 55 Receiver Side Heuristics Clark's solution do not send window advertisements for 1 byte instead, advertise a window size of zero and wait until there is space for a maximum segment size of data, or the receive buffer is half-empty whichever occurs earlier Nagle's algorithm and Clark's solution are complementary and can be used together Problems? Solutions? Consider the possibility of loss See persist timer Copyright Rudra Dutta, NCSU, Spring, 2005 56 Receiver Side Heuristics Delay window advertisements by delaying ACKs Until avoidance algorithm allows window advertisement May result in delaying ACK until there is data to send, so can “piggyback” ACK with data May result in single ACK for multiple segments Reduces traffic but may force sender to timeout and retransmit a segment (incorrectly) ACKs may not be delayed for more than 500ms RTT estimation may suffer ACK at least every other segment Copyright Rudra Dutta, NCSU, Spring, 2005 57 TCP Error Control 1. Error detection checksum: to check for corrupted segments at destination ACK: to confirm receipt of segment by destination time-out: one retransmission timer for each segment sent 2. Error correction – no FEC source retransmits segments for which retransmission timer expired Copyright Rudra Dutta, NCSU, Spring, 2005 58 Responses to Errors Fundamentally GoBackN Different versions of TCP have different detail behavior Corrupted or lost segments will not be acknowledged by receiver will be retransmitted by sender Duplicate segments ignored by receiver Copyright Rudra Dutta, NCSU, Spring, 2005 59 Responses to Errors (cont’d) Out-of-order segments (receive segment n+1 before segment n) buffered by receiver to improve efficiency not ACKed send a “duplicate” ACK for the last acknowledged segment Lost ACKs may lead to retransmission of data most likely will not be noticed (another, later ACK also indicates segment was received) Combination of GoBackN, NACK, SRP Copyright Rudra Dutta, NCSU, Spring, 2005 60 Transmission Control Protocol Timers and Congestion Control Types of TCP Timers 1. Retransmission 2. Persistence 3. Keep-Alive 4. Time-Waited Copyright Rudra Dutta, NCSU, Spring, 2005 62 1. Retransmission Timer A. When a segment is sent, a retransmission timer is started B. If segment is ACKed before timer expires, the timer is stopped C. If timer expires before ACK arrives, the segment is retransmitted and the timer is started again Copyright Rudra Dutta, NCSU, Spring, 2005 63 Retransmission Timer(cont’d) How long should timeout interval be? too short: retransmitting when don’t need to! too long: waiting too long to retransmit when needed! Terminology RTTsample = time measured for round trip of one packet RTT = estimate of current round trip time RTO = retransmission timeout interval Issues obtain accurate estimate of round-trip time choose a good value for retransmission timeout interval Copyright Rudra Dutta, NCSU, Spring, 2005 64 Measurement of Round-Trip Time Basic RTT estimation explicitly measure the interval between the time a segment is transmitted and the time the ACK for the segment returns may use TCP timestamp option RTT = * RTT + (1-)* RTTsample Recommended value for = .9 Karn's improvement do not update RTT for any segments that have been retransmitted avoids ambiguity about which timer value to use Copyright Rudra Dutta, NCSU, Spring, 2005 65 Determining RTO Basic (original) method RTO = 2 * RTT (i.e. allow two roundtrip times before timing out) Problem When RTT has a high standard deviation, this method is too conservative (times out too quickly) Copyright Rudra Dutta, NCSU, Spring, 2005 66 Determining RTO (cont’d) Solution estimate both mean and standard deviation of RTT set RTO = to RTT + a multiple of the standard deviation use mean deviation (MDEV)as a cheap estimate of standard deviation Algorithm RTT = RTT + *(RTTsample – RTT) MDEV = * MDEV + (1-) * RTTsample - RTT RTO = RTT + 4 * MDEV ( = .75, = .125) Comments 99% of packets arrive within 4 standard deviations The approximations above are extremely efficient computationally Copyright Rudra Dutta, NCSU, Spring, 2005 67 TCP Coarse-Grain Timer TCP uses a 500-ms clock for time measurements values obtained may differ up to 500 ms from actual value` Copyright Rudra Dutta, NCSU, Spring, 2005 68 Retransmission Timer Backoff Not updating the RTT for lost / retransmitted segments may lead to trouble Retransmission may have occurred because RTO is too low E.g. maybe network suddenly became overloaded should therefore increase RTO Exponential backoff upon timeout, set RTO = 2 * RTO continue doubling of RTO for each retransmission when an ACK arrives for a segment that did not require retransmission, update RTT and RTO according to the “normal” computation Copyright Rudra Dutta, NCSU, Spring, 2005 69 2. Persist Timer It is possible for the advertised window to go to 0 i.e., sender is transmitting faster than receiver can receive The receiver can request more data by sending an ACK advertising the size of the available receive buffer If this ACK is lost deadlock! Copyright Rudra Dutta, NCSU, Spring, 2005 70 2. Persist Timer (cont’d) After receiving any ACK with window_size = 0, start a persist timer Upon expiration of the persist timer, send a probe packet Probe packet = 1 byte of data The response (ACK) to the probe gives the window size recovers from lost window advertisements by “stimulating” new advertisements Copyright Rudra Dutta, NCSU, Spring, 2005 71 Persist Timer (cont'd) Copyright Rudra Dutta, NCSU, Spring, 2005 72 3. Keepalive Timer It is possible for TCP connections to be idle for a long time e.g., client opens a connection to a server, but never makes a request consumes resources (memory) on the server Keepalive timer maintained by some implementations not part of TCP standard somewhat controversial Copyright Rudra Dutta, NCSU, Spring, 2005 73 3. Keepalive Timer (cont’d) The timer is reset every time a segment is received timer interval = 2 hours When timer expires, check if other side is still there repeat 10 times at 75 second intervals if there is no response, the connection is terminated Copyright Rudra Dutta, NCSU, Spring, 2005 74 4. Time-Wait Timer After agreeing with the other side to close a connection, TCP enters the TIME_WAIT state starts a timer that runs for twice the maximum packet lifetime the connection is held in limbo during this period the connection is closed after the timer expires This ensures that all packets (and their ACKs) created during the connection have “died off” (reached their destination or been discarded) Copyright Rudra Dutta, NCSU, Spring, 2005 75 Flow vs. Congestion Control Copyright Rudra Dutta, NCSU, Spring, 2005 76 Congestion Control: “Open Loop” Approach 1. Source describes the traffic it will be sending through the network problem: choosing right set of parameters to describe a source 2. The network reserves resources (bandwidth, buffer space) during connection establishment resources will be held by the connection for the duration 3. Source “shapes” its traffic to match its description Copyright Rudra Dutta, NCSU, Spring, 2005 77 Open Loop (cont’d) Results network overload / congestion will be avoided connection performance is guaranteed Significant amount of work in this area ATM standards Copyright Rudra Dutta, NCSU, Spring, 2005 78 Congestion Control: “Closed Loop” Approach 1. Source establishes connection, no traffic description required 2. Feedback about congestion is continually provided 3. Source dynamically adapts its traffic characteristics to reduce congestion or use available bandwidth source must decide how to best react to current network state not policed or enforced by the network! Copyright Rudra Dutta, NCSU, Spring, 2005 79 “Closed Loop” Approach (cont’d) Results congestion is detected and then reduced no guarantees of connection performance How does the source learn that its service rate has changed? Choices: 1. explicit feedback: the network conveys this info to the source 2. example: ICMP source quench implicit feedback: source infers a change in its service rate by measuring its current end-to-end performance example: TCP acknowledgments Copyright Rudra Dutta, NCSU, Spring, 2005 80 TCP Congestion Control 1. Uses a closed-loop approach 2. Feedback is implicit (ACKs) 3. Adapts rate based on dynamic window smaller window = lower rate, larger window = higher rate 4. Control mechanism is end-to-end no connection state maintained by the network! Copyright Rudra Dutta, NCSU, Spring, 2005 81 TCP Implicit Feedback TCP relies on feedback from the network to detect congestion timeout caused by a lost packet (used by TCP-Tahoe, TCP-Reno) duplicate ACK (used by TCP-Reno) Assumption: packet loss is due to congested routers, not transmission errors Incorrect assumption for wireless networks! Copyright Rudra Dutta, NCSU, Spring, 2005 82 TCP Dynamic Windows Window #1: advertised by receiver purpose: avoid overrunning a slow receiver (i.e., for flow control) Window #2: maintained by sender purpose: avoid network overload (i.e. for congestion control) Called the congestion window, or cwnd the sending TCP dynamically manipulates cwnd “The” window size = MIN(receiver_advertisement, cwnd) Copyright Rudra Dutta, NCSU, Spring, 2005 83 TCP Congestion Control Overview 1. Probe for available bandwidth by increasing cwnd “slow” start = exponential increase phase congestion avoidance = linear increase phase 2. Slow start threshold (= ssthresh) controls transition from exponential to linear phase 3. Upon packet loss (timeout, duplicate acknowledgment)… retransmit the packet reduce the window size TCP Reno adds (1) fast retransmit, and (2) fast recovery Copyright Rudra Dutta, NCSU, Spring, 2005 84 Slow Start At connection establishment cwnd 1, ssthresh 1/2 initial receiver window advertisement (both cwnd and ssthresh are in units of MSS in the following) Upon arrival of each ACK: cwnd cwnd + 1 Not slow; exponential increase! window increases by cwnd every RTT can quickly congest the network and cause packet loss Copyright Rudra Dutta, NCSU, Spring, 2005 85 Slow Start (cont'd) Copyright Rudra Dutta, NCSU, Spring, 2005 86 Congestion Avoidance Basic idea: avoid increasing the window size too quickly slow down when approaching the “steady state” Use slow start as long as cwnd ssthresh, enter congestion avoidance phase when cwnd > ssthresh Upon arrival of each ACK: cwnd cwnd + 1/cwnd Linear increase window increases by 1 every RTT Copyright Rudra Dutta, NCSU, Spring, 2005 87 Response to Congestion (TCP Tahoe/Reno) When congestion occurs Packet deliver may be slowed, or fail Acknowledgment delivery may be slowed, or fail In either case, timeout / retransmission may occur transmission_window = MIN(receiver_advertisement,cwnd) ssthresh MAX(2, ½* transmission_window) When timeout occurs… cwnd 1 (enter slow start phase, from the top) Copyright Rudra Dutta, NCSU, Spring, 2005 88 Evolution of TCP's Congestion Window Copyright Rudra Dutta, NCSU, Spring, 2005 89 “Slow Start”, “Exponential Decay” Consider only steady state conditions Increase of transmission window Congestion avoidance in the steady state Not slow start Window increases by 1 each RTT Additive increase Decrease of transmission window Only congestion, in steady state Successive timeouts Window halved in each RTT Multiplicative decrease Copyright Rudra Dutta, NCSU, Spring, 2005 90 How Good is This? On the whole a simple strategy Logic not obvious Consider the fairness issues Two TCP connections over a bottleneck link Ideally, should share bandwidth But start times and windows could be very different Copyright Rudra Dutta, NCSU, Spring, 2005 91 How Good is This? Together, the connections cannot get more than bottleneck bandwidth And should get that much We would like them to be fair and share equally Consider a situation when this is not true Successive adjustments tend to stabilize Copyright Rudra Dutta, NCSU, Spring, 2005 92 Fast Retransmission (TCP Reno) When out-of-order segment arrives receiver issues ACK with the same cumulative sequence number as the previous ACK called a duplicate ACK (acts like a NAK) Indicates data is still flowing between sender and receiver Receipt of 3 duplicate ACKs in a row for segment X is a strong indication that segment X has been lost Sender actions retransmit the (presumably) lost segment without waiting for its timer to expire, and no delay ssthresh ½ * transmission_window But cwnd is not set to 1 called fast retransmission Copyright Rudra Dutta, NCSU, Spring, 2005 93 Fast Recovery (TCP Reno) Receipt of duplicate ACKs implies data is still flowing: do not enter slow start yet After a fast retransmission cwnd ssthresh + 3 fast recovery, enter linear phase Each time another duplicate ACK arrives cwnd cwnd + 1 When ACK for a retransmitted segment arrives cwnd ssthresh Back to normal Copyright Rudra Dutta, NCSU, Spring, 2005 94 Congestion Example Source/receiver have buffers of 8192 bytes MSS = 1024 Copyright Rudra Dutta, NCSU, Spring, 2005 95 Congestion Example: Slow Start Copyright Rudra Dutta, NCSU, Spring, 2005 96 Congestion Example: Lost Segment Copyright Rudra Dutta, NCSU, Spring, 2005 97 Congestion Example: Fast Retransmission Copyright Rudra Dutta, NCSU, Spring, 2005 98 TCP and Transactions (Request / Reply) Many TCP transactions consist simply of a request to a server and a reply to the client Problems these simple transactions may require the exchange of 10 segments (why?) a simple transaction also takes at least two RTT times plus the processing time at the server Copyright Rudra Dutta, NCSU, Spring, 2005 99 TCP Extension for Transactions (cont'd) To distinguish between consecutive transactions, a connection count (CC) TCP option is added to the header The server maintains a per-host cache of the most recent CC MSS window size RTO The client combines the following into one segment SYN Request FIN also includes the current CC value Copyright Rudra Dutta, NCSU, Spring, 2005 100 TCP Extension for Transactions (cont'd) If the received CC > the cached CC, the server combines the following into a single segment SYN ACK of the client’s SYN reply FIN Else, the normal 3-way handshake is performed Client ACKs the server's (SYN + FIN + data) in one segment Time needed is the minimum 1 RTT + processing time at server Copyright Rudra Dutta, NCSU, Spring, 2005 101 TCP Extension for Transactions (cont'd) Copyright Rudra Dutta, NCSU, Spring, 2005 102