Spring 2006 EE 5304/EETS 7304 Internet Protocols
Lecture 14 TCP-Part 1
Tom Oh, Dept of Electrical Engineering
[email protected]
TO 4-25-06 p. 1

Administrative Issues
• (For distance learning students) If you are graduating this semester, you need to take the Final on May 9, 2006.
• For in-class students, we will have the Final (Test #3) on May 9, 2006, 6:30 PM.
• The Final will cover lectures 11-15.
• The Final will consist of multiple choice, T/F, and short answers.
• You are allowed to bring one 3 ½ X 5 card.

Outline (Comer, Ch. 25)
• TCP
• TCP header
• TCP retransmissions
• TCP duplicate detection
• TCP connection set-up and close

TCP (Transmission Control Protocol)
• TCP is the predominant transport layer protocol, adding end-to-end reliability above IP
• Designed for reliable sequential byte stream delivery with no duplicates and no loss
• Views application data as a continuous byte stream and breaks it into segments of 64-Kbyte max. length
  • Keeps track of each byte with a sequence number
  • Segments are prefixed with a TCP header and encapsulated into IP packets

TCP (cont)
[Figure: the sending application's data stream is cut into TCP segments (TCP header + data); each segment is carried as the TCP data of an IP packet (IP header + TCP header + data) and delivered to the receiving application]

TCP (cont)
• Provides connection-oriented service between applications on different hosts
• An application is identified to TCP by a port address
  • An application is completely identified by its 16-bit port address and 32-bit IP address
  • A TCP connection is between two endpoints, source <host address, port> and destination <host address, port>

TCP (cont)
[Figure: two hosts, each with applications attached to TCP ports above the transport layer (ports 80 and 25 shown)]
[Figure, continued: applications on TCP ports 26 and 18; the transport layer provides a reliable connection-oriented service with no duplicate, lost, misordered, or errored bytes]

TCP (cont)
• TCP assumes IP, a type C (unreliable) network, so it has all of the most complicated functions of a transport protocol
• Error control detects missing, errored, nonsequential, and duplicate packets
  • Uses sequence numbers, piggybacked ACKs, and adaptive retransmissions
• Flow control using credits
• Connection control: 3-way handshake
• Also, TCP assumes responsibility for congestion avoidance because IP has no congestion control

TCP Header
• Source port (16 bits): identifies the application at the sending host; allows replies to sender
• Destination port (16 bits): identifies application at destination host

TCP Header
• Checksum (16 bits): error detection over pseudoheader + TCP segment

TCP Header (cont)
• Pseudoheader is constructed from the IP packet header, including IP source/destination addresses, protocol field (=6 for TCP), and length of TCP segment
• Ensures that IP addresses are correct
• Like UDP, this violates the layering principle of the OSI model

TCP Header
• Sequence number (32 bits): number of first data byte, except if SYN=1; data bytes are numbered sequentially so the receiver can reconstruct the sender’s byte stream

TCP Header (cont)
[Figure: the sending application's bytes n, n+1, n+2 travel in separate segments and arrive in the order n, n+2, n+1; the sequence number in each TCP header (= number of the segment's first byte) tells where the segment belongs in the reconstructed byte stream]

TCP Header
• Acknowledgement (32 bits): piggybacked ACK tells the sender the next byte that is expected; ACKs are cumulative and refer to the end of contiguous received data; additional received data, if not contiguous, triggers a duplicate ACK

TCP Header (cont)
[Figure: the receiver’s buffer holds data through byte 399; the sender transmits segment A (SEQ = 400), segment B (SEQ = 600), and segment C (SEQ = 800); segment B is received first]
[Figure, continued: the receiver answers the out-of-order segment B with ACK 400]

TCP Header (cont)
[Figure: same three segments; segment C (SEQ = 800) is received second, the receiver’s contiguous data still ends at byte 399, and it sends a duplicate ACK 400]

TCP Header (cont)
[Figure: same three segments; segment A (SEQ = 400) is received third, the receiver’s contiguous data now ends at byte 999, and it sends ACK 1000]

TCP Header (cont)
• Header length (4 bits): in units of 4 bytes; header is 20 bytes (value = 5) + options (if any)
• Reserved (6 bits): all zeros

TCP Header (cont)
• Flags (6 bits):
  • URG: tells if Urgent pointer is used
  • ACK: tells if Acknowledgement field is used
  • PUSH: forces immediate transmission at sender
  • RST: tells receiver to abort and reset connection
  • SYN: segments for 3-way handshake to set up connection
  • FIN: segments for 3-way handshake to terminate connection

TCP Header (cont)
• Urgent pointer (16 bits): used if URG=1
• URG flag: tells if Urgent pointer is used

TCP Header (cont)
• Urgent pointer (2 bytes): points to the number of the first byte after the urgent data in the segment
• If URG flag = 1, data up to the urgent pointer is urgent data to be processed immediately; the rest of the data is regular (not urgent)
• Allows "out of band" data (to be processed immediately, out of sequence)
[Figure: TCP header followed by urgent data and then regular data; the urgent pointer marks the boundary]

TCP Header (cont)
• Push function:
  • Normally, TCP accumulates data from the sender before transmitting a segment
  • If the sender issues a “push”, TCP will send the ready data, even if the segment will be short (e.g., 1 byte of data)

TCP Header (cont)
• Window (16 bits): piggybacked credit advertised by receiver; for flow control of sender
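
The field sizes listed on the header slides (16-bit ports, 32-bit sequence and acknowledgement numbers, 4-bit header length, 6 flag bits, 16-bit window, checksum, and urgent pointer) add up to the fixed 20-byte header. A minimal sketch of decoding it, assuming the standard field order and flag bit positions of the TCP specification; the names `TCP_FIXED_HEADER`, `FLAG_NAMES`, and `parse_tcp_header` are hypothetical and chosen only for this illustration:

```python
import struct

# Fixed 20-byte TCP header, network byte order:
# source port, destination port, sequence number, acknowledgement number,
# header-length/reserved/flags word, window, checksum, urgent pointer.
TCP_FIXED_HEADER = struct.Struct("!HHIIHHHH")

# Flag bits from least significant upward (URG is the highest of the six).
FLAG_NAMES = ("FIN", "SYN", "RST", "PUSH", "ACK", "URG")


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed part of a TCP header (hypothetical helper)."""
    if len(segment) < TCP_FIXED_HEADER.size:
        raise ValueError("segment shorter than the 20-byte fixed header")
    (src_port, dst_port, seq, ack, len_flags,
     window, checksum, urgent) = TCP_FIXED_HEADER.unpack_from(segment)
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": (len_flags >> 12) * 4,   # 4-bit field counts 4-byte words
        "flags": {name for bit, name in enumerate(FLAG_NAMES)
                  if len_flags & (1 << bit)},
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }


# Example: a connection request (SYN) to port 80 with no options.
syn = TCP_FIXED_HEADER.pack(1025, 80, 400, 0, (5 << 12) | 0x02, 8192, 0, 0)
header = parse_tcp_header(syn)
print(header["header_len"], sorted(header["flags"]))  # 20 ['SYN']
```

The sketch does not verify the checksum or parse options; a header length above 5 would mean options follow the fixed part.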
TCP Retransmissions
• Sender waits for piggybacked acknowledgements
  • ACK is next expected byte (cumulative: acknowledges all previous bytes)
  • ACK does not acknowledge any additional non-contiguous data received
• Sender will resend if retransmission timer expires
  • TCP tries to adjust the time-out to just a little longer than the estimated roundtrip time (RTT)
  • But the time-out is very difficult to set when RTT varies as widely as it does in the Internet

TCP Adaptive Retransmission Algorithm
• Sender keeps track of returned ACKs as samples of RTT
• Can continually update the estimate of average roundtrip delay as a weighted average of the new measurement and the old estimate, e.g., in the classic form:
  • RTT estimate = α × (old estimate) + (1 − α) × (new sample), with RTO = β × (RTT estimate)

TCP Adaptive Retransmission Algorithm (cont)
• It was noticed that β should depend on the variance of the roundtrip samples
  • With a fixed β, the estimate can’t keep up with widely varying samples, resulting in unnecessary retransmissions
• Current algorithm adapts RTO based on the mean and variance of RTT

TCP Adaptive Retransmission Algorithm (cont)
[Figure: two panels showing packets, returning ACKs, mean RTT, standard deviation, and RTO; one panel for RTT with small variance, one for RTT with large variance]

TCP Adaptive Retransmission Algorithm (cont)
• Problem: acknowledgement ambiguity
  • Suppose a segment is transmitted twice, and then ACKed
  • Does the ACK refer to the first segment or the duplicate?
  • Sender cannot know which case is true
[Figure: a packet and its duplicate, each followed by an ACK, illustrating the two cases]

TCP Adaptive Retransmission Algorithm (cont)
• If the sender assumes the ACK is for the first transmission when it is really for the duplicate, the measured RTT is too large → RTO becomes too long
• If the sender assumes the ACK is for the duplicate when it is really for the first transmission, the measured RTT is too small → RTO becomes too short, causing unnecessary retransmissions

TCP Adaptive Retransmission Algorithm (cont)
• Karn's algorithm: timer backoff strategy
  • RTT estimate is adjusted only for unambiguous ACKs
  • If a segment is sent twice due to time-out, ignore the measured delay to get its ACK and instead increase the next RTO
  • The rate of increase is implementation-dependent; usually the RTO increases by a factor of 2
  • On the next unambiguous ACK, recompute the RTT estimate and reset RTO

TCP Duplicate Detection
• Receiver can get duplicate segments caused by early time-outs, lost ACKs, or late ACKs
• There should be no confusion because duplicates of a TCP segment are identified by the same sequence number
• Large range of sequence numbers needed to avoid ambiguity
  • TCP uses 32 bits (4 billion) so sequence numbers will not wrap around in a short time
  • Receiver will not be confused by duplicate segments with the same number

TCP Duplicate Detection (cont)
• For duplicate segments, receiver assumes the first ACK was lost and will ACK the duplicate
  • Sender will not be confused by duplicate ACKs
• A possible confusion: a duplicate TCP segment arrives after the connection is closed and a new connection is opened
[Figure: timeline in which CLS (FIN=1) segments close the connection, RFC (SYN=1) segments open a new connection, and then an old duplicate TCP segment arrives]

TCP Duplicate Detection (cont)
• A TCP segment from an old connection could arrive during a new connection and be mistaken for a valid TCP segment
• TCP avoids this confusion by:
  • New connection starts with a random initial sequence number
  • Duplicate segments arriving during the new connection will probably have a sequence number outside of the new range
  • Any duplicate segments received during this time are discarded

TCP Duplicate Detection (cont)
[Figure: the sequence number space from byte number 0 to byte number 2^32; an old segment from another connection will more likely fall outside of the expected range when the range is very big (as in TCP)]
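
The retransmission timer described in the earlier slides (an RTO adapted from the mean and variance of RTT samples, with Karn's rule for ambiguous ACKs and a doubling backoff) can be sketched as follows. This is a minimal sketch, not the lecture's own code: the gains 1/8 and 1/4 and the multiplier 4 are assumptions taken from RFC 6298, since the slides give no constants; the class and method names are hypothetical; times are in seconds; the RFC's minimum RTO is left out for brevity.

```python
class RtoEstimator:
    """Mean/variance RTO estimator with Karn's rule (hypothetical helper)."""

    GAIN_MEAN = 1 / 8   # weight of a new sample in the smoothed RTT (RFC 6298)
    GAIN_VAR = 1 / 4    # weight of a new sample in the RTT variation (RFC 6298)
    K = 4               # RTO = mean + K * variation (RFC 6298)

    def __init__(self, initial_rto: float = 1.0):
        self.srtt = None        # smoothed (mean) RTT, unknown until first sample
        self.rttvar = None      # smoothed RTT variation
        self.rto = initial_rto

    def on_ack(self, sample: float, retransmitted: bool) -> None:
        """Process an ACK; `sample` is the measured delay for that segment."""
        if retransmitted:
            # Karn's rule: the ACK is ambiguous, so ignore the measured delay
            # and keep the current (possibly backed-off) RTO.
            return
        if self.srtt is None:
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            deviation = abs(self.srtt - sample)
            self.rttvar = (1 - self.GAIN_VAR) * self.rttvar + self.GAIN_VAR * deviation
            self.srtt = (1 - self.GAIN_MEAN) * self.srtt + self.GAIN_MEAN * sample
        # Unambiguous ACK: recompute the estimate and reset the RTO from it.
        self.rto = self.srtt + self.K * self.rttvar

    def on_timeout(self) -> None:
        """Timer backoff: increase the next RTO (here by a factor of 2)."""
        self.rto *= 2


est = RtoEstimator()
est.on_ack(0.5, retransmitted=False)   # first clean sample
est.on_timeout()                        # segment lost: back off
est.on_ack(9.9, retransmitted=True)    # ambiguous ACK: ignored
print(est.srtt, est.rto)
```

Because the ambiguous sample is skipped, a single retransmission cannot distort the RTT estimate in either direction; only the backed-off RTO changes until the next clean ACK.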
[Figure labels, continued: byte numbers used for this connection; a new TCP connection chooses its initial byte number at random]

TCP Duplicate Detection (cont)
• Also, TCP keeps a record of the old connection for a timed Wait state after the connection is closed
  • Time = 2 x Maximum Segment Lifetime (MSL = longest time a TCP segment might take to arrive)
  • Any duplicate segments received during this time are discarded

TCP Connection Set-up
• TCP 3-way handshake:
[Figure: message exchange between hosts A and B
  1. A → B: SYN=1, SEQ=x (connection request; first data byte will be x+1)
  2. B → A: SYN=1, SEQ=y, ACK=x+1 (connection acknowledgement; first data byte will be y+1)
  3. A → B: SEQ=x+1, ACK=y+1 (connection confirm; A sends data starting at byte x+1)]

TCP Connection Set-up (cont)
• As seen before, the 3-way handshake works even if both ends initiate a connection at the same time
• Use of the retransmission timer may cause duplicate SYN segments, but there is no confusion
[Figure: three exchanges between host A and host B
  Normal: SYN i; SYN j, ACK i; SEQ i, ACK j
  Old SYN: an old SYN i reaches B; B answers SYN j, ACK i; A replies RST, ACK j, so the connection is rejected by A
  Delayed SYN/ACK: A sends SYN i; an old SYN k, ACK m arrives and A replies RST, ACK k, so that connection is rejected by A; then SYN j, ACK i and SEQ i, ACK j follow and the new connection is accepted]

TCP Connection Close
• Uses a 3-way-handshake-like procedure, as for connection setup
• Connection can be closed in one direction with a segment with FIN=1
  • No more data is accepted in this direction
• Other end will immediately ACK to prevent getting duplicate FIN segments
  • Delays its own FIN response until the application is ready to close the connection in the reverse direction

Spring 2006 EE 5304/EETS 7304 Internet Protocols
Lecture 14 TCP-Part 2
Tom Oh, Dept of Electrical Engineering
[email protected]

Outline
• TCP flow control
• TCP congestion avoidance
• Slow start
• Fast retransmit and recovery
Flow Control vs Congestion Control
• Flow control: destination can slow down source through feedback control
  • Destination may not be ready to receive data
  • Host-to-host control (network not involved)
• Congestion control: network should not get overloaded with traffic
  • May be handled by hosts (e.g., TCP), the network (e.g., resource reservations), or both hosts and network cooperating together (e.g., congestion notification)

Flow Control
• 2 approaches to flow control:
  • Window-based control (typically sliding window): destination constrains how many packets (volume) can be in transit by slowing down ACKs or withholding credits
    • Destination simply advertises the amount of its unused buffer space
    • Inefficient for high-speed networks
  • Rate-based control: destination constrains the sender’s transmission rate (not volume)
    • Suited for streaming type applications that need a minimum bandwidth

TCP Flow Control
• TCP flow control operates in units of bytes (not segments)
• Destination piggybacks ACK (4 bytes) and window advertisement (2 bytes) in data segments going to source
  • Advertised window = number of bytes it is ready to receive beyond last ACK’ed byte (i.e., a credit)
  • Example: <ACK n+1, window advertisement = m> gives the sender permission to send up to byte n+m
  • Window advertisement = 0 means stop sending

TCP Flow Control (cont)
• Possible deadlock if destination closes window, then opens window but this credit is lost
  • Destination is expecting data while sender thinks window is closed
• Sender starts a persist timer when window is closed
  • If timer expires, sender will send a window probe (TCP segment with 1-byte data) to see if window has been increased

TCP Flow Control (cont)
[Figure: timeline of the persist timer exchange between the sender and the destination (Dest.)]
[Figure, continued: the destination sends ACK=x, credit=0 and later ACK=x, credit=m, which is lost; both hosts are waiting; when the persist timer expires, the sender sends a probe with one byte of data; the probe should trigger a duplicate of the last credit or a new credit (ACK=x, credit=m)]
• Process continues until credit is received or connection is closed; persist timer doubles each time up to 60 sec

Congestion Control
• Without congestion control, the Internet would reach congestion collapse
  • Since IP is best effort, a sender’s best strategy is to send as much data as possible to hog the network and increase its chances of successful delivery
  • Everyone following this strategy will increase load on the network, pushing it into congestion
  • Increasing congestion will cause more retransmissions → higher load will increase congestion even more → congestion collapse: very long delays; network full of duplicate packets; few packets delivered

Congestion Control (cont)
[Figure: throughput versus offered load, with curves labeled ideal, controlled, and uncontrolled; the uncontrolled curve falls into congestion collapse]

Congestion Control (cont)
• Congestion control can be:
  • Window-based
    • Traditional sliding window is naturally responsive to congestion
    • Congestion increases → RTT increases → ACKs slow down → sender slows down
  • Rate-based
    • Better suited for streaming type applications
    • Easier to think in terms of fair shares of bandwidth
• TCP congestion control is window-based

Congestion Control (cont)
• Congestion control can be:
  • Preventive: traffic is blocked from entering network to prevent congestion from occurring
    • Need some type of admission control procedure or explicit congestion notification
  • Reactive: traffic is restricted after congestion occurs
    • Can be implemented in hosts without complexity of admission control or congestion notification
    • Congestion prevention is preferred when possible
• TCP uses reactive congestion control because IP layer does nothing
Congestion Control (cont)
• Closely related, congestion control can be:
  • Closed loop
    • Continuous feedback during transmission allows sender to adapt its rate to current congestion state
  • Open loop
    • Traffic is either admitted or blocked; once admitted, transmission is not controlled by feedback but source must conform to its specified rate
    • Good for streaming type applications, if admission control is possible
• TCP uses closed loop control (keeps routers simple)

Congestion Control (cont)
• Closed loop control uses feedback that is either:
  • Explicit
    • Congested routers send explicit congestion notification
    • Sender can adapt its rate to current congestion state
  • Implicit
    • Sender must adapt its rate by inferring the congestion state, typically from packet losses and RTT
    • No information from routers
    • Performance will not be as good as explicit feedback
• TCP uses implicit feedback (keeps routers simple)

TCP Congestion Avoidance (cont)
• TCP sender reacts to congestion in network by keeping an adaptive “congestion window”
  • Congestion window (cwnd) = amount of data that is appropriate for level of network congestion
  • Current sending window = min(window advertisement, congestion window)
  • Sender is constrained by either network congestion or the destination
• Congestion avoidance algorithm: adapts congestion window by AIMD (additive increase, multiplicative decrease)

TCP Congestion Avoidance (cont)
• Multiplicative decrease: idea is to back off senders quickly (exponentially) when congestion is detected
  • TCP assumes a lost segment (detected by retransmission timeout) is caused by congestion, and not because of error in RTO
  • If segment is lost (and retransmitted), decrease congestion window by half
  • If loss continues, congestion window keeps decreasing by half (down to one segment)

TCP Congestion Avoidance (cont)
[Figure: idealized cwnd versus time; cwnd increases linearly, and each retransmission timeout drops cwnd to half]
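
The sending-window rule and the AIMD adjustment above can be tried out with a few lines of code. A minimal sketch, assuming windows are counted in whole segments and that one update is applied per roundtrip; the function names `sending_window`, `aimd_update`, and `aimd_trace` are hypothetical:

```python
def sending_window(advertised: int, cwnd: int) -> int:
    """Current sending window = min(window advertisement, congestion window)."""
    return min(advertised, cwnd)


def aimd_update(cwnd: int, loss: bool) -> int:
    """Apply one AIMD step to a congestion window counted in segments."""
    if loss:
        # Multiplicative decrease: halve the window, but not below one segment.
        return max(1, cwnd // 2)
    # Additive increase: one more segment per roundtrip.
    return cwnd + 1


def aimd_trace(cwnd: int, losses: list) -> list:
    """Return the window after each roundtrip; True in `losses` marks a loss."""
    history = []
    for loss in losses:
        cwnd = aimd_update(cwnd, loss)
        history.append(cwnd)
    return history


# Two quiet roundtrips, two consecutive losses, then recovery.
print(aimd_trace(8, [False, False, True, True, False]))  # [9, 10, 5, 2, 3]
print(sending_window(advertised=4, cwnd=10))             # 4
```

The trace shows the asymmetry the slides describe: two losses cut the window from 10 to 2, while climbing back takes one roundtrip per segment.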
TCP Congestion Avoidance (cont)
• Why back off window exponentially?
  • Some believe queues build exponentially during congestion → sources should back off as quickly
• Additive increase: when congestion abates (an ACK for new data), increase congestion window linearly (one more segment per RTT)
  • Why not increase multiplicatively?
  • Leads to instability and oscillations (easy to cause congestion, harder to recover)

TCP Slow Start
• Idea: if the network is in equilibrium (running stably with a full window in transit on each connection) when a new connection starts, or when a connection is recovering from a long period of congestion, sending a large initial window of segments might upset the equilibrium and cause oscillations or congestion
• Slow start: idea is to start congestion window at one segment and gradually increase rate
  • Increase congestion window by one segment for each ACK that is returned
  • Attempts to probe network for acceptable sending rate

TCP Slow Start (cont)
• Slow in the sense of starting with a small window, but the rate of increase may not be slow
  • Window could increase exponentially: send 1 → get 1 ACK, increase window to 2 → get 2 ACKs, increase window to 4, ...
  • This is actually a fast rate of increase to allow sender to reach equilibrium point quickly (although gently)
• Eventually, a segment will be lost
  • Set “slow start threshold” SST = 1/2 current congestion window (the equilibrium point); then go into congestion avoidance

Slow Start and Congestion Avoidance
• These are separate algorithms but implemented together because both are triggered by time-out and change the congestion window
• New connection begins with congestion window = 1 segment, SST = 65,535 bytes
• Go into slow start to search for acceptable window
• Congestion is indicated by packet loss evidenced by timeout
  • Set SST = 1/2 current congestion window

Slow Start and Congestion Avoidance
• If time-out occurred (this assumes that adaptive timer is accurate, so time-out means a lost segment), set congestion window = 1 segment and go into slow start
• Slow start can continue until window reaches SST (half of window when congestion occurred)
  • Then go into congestion avoidance phase: congestion window can increase beyond SST but at a more cautious rate (as it approaches the equilibrium point when congestion occurred)

Slow Start and Congestion Avoidance
• In congestion avoidance phase, congestion window increases linearly as long as ACKs are returned
• Whenever congestion window ≤ SST, it’s in slow start; if congestion window > SST, then it’s in congestion avoidance
[Figure: congestion window over time, alternating between slow start and congestion avoidance phases]

Fast Retransmit and Recovery Algorithm
• Destination will send a duplicate ACK whenever it gets an out-of-order segment
• Sender does not know if duplicate ACKs mean a segment was lost or segments were received out of order
• Fast retransmit algorithm:
  • Assumes that out-of-order segments will result in only 1 or 2 duplicate ACKs, and 3 or more duplicate ACKs mean a segment was lost

TCP Header (cont)
[Figure: receiver’s buffer with a first ACK followed by three data segments that arrive beyond the gap; these out-of-order segments cause 3 duplicate ACKs → TCP assumes that the missing segment is lost]

Fast Retransmit and Recovery Algorithm
• That lost segment is retransmitted immediately (even if retransmit timer hasn’t expired)
• Fast recovery: do congestion avoidance but not slow start because duplicate ACKs indicate that some segments (after lost segment) were delivered, so congestion is not too bad
  • Set SST = 1/2 congestion window
  • Reduce congestion window to half + 3 segments (to allow for 3 segments already at dest.)
  • Expand congestion window linearly until next lost segment

Fast Retransmit and Recovery Algorithm
[Figure: congestion window over time, with retransmissions around time = 10, 14, and 21 sec]
• SST is set to 1/2 congestion window but window is allowed to increase with each duplicate ACK
• When missing segment is ACKed, congestion window closes down to SST
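
The rules from the slow start, congestion avoidance, and fast retransmit/recovery slides fit together as one small state machine. The following is a minimal sketch under stated assumptions, not a real TCP stack: windows are counted in segments rather than bytes, the initial SST of 64 segments is an arbitrary stand-in for the 65,535-byte value, congestion avoidance adds 1/cwnd per ACK to approximate one segment per RTT (the per-ACK form from RFC 5681, not given on the slides), and slow start applies while cwnd < SST. The class `RenoSender` and its method names are hypothetical.

```python
class RenoSender:
    """Toy congestion-window tracker in units of segments (hypothetical helper)."""

    def __init__(self, initial_sst: float = 64.0):
        self.cwnd = 1.0            # a new connection starts at one segment
        self.sst = initial_sst     # slow start threshold
        self.dup_acks = 0
        self.in_recovery = False
        self.retransmits = 0

    def on_new_ack(self) -> None:
        """An ACK for new data arrived."""
        if self.in_recovery:
            # Missing segment was ACKed: window closes down to SST.
            self.cwnd = self.sst
            self.in_recovery = False
        elif self.cwnd < self.sst:
            self.cwnd += 1.0               # slow start: +1 segment per ACK
        else:
            self.cwnd += 1.0 / self.cwnd   # congestion avoidance: ~+1 per RTT
        self.dup_acks = 0

    def on_dup_ack(self) -> None:
        """A duplicate ACK arrived."""
        self.dup_acks += 1
        if self.in_recovery:
            self.cwnd += 1.0               # window grows with each duplicate ACK
        elif self.dup_acks == 3:
            self.retransmits += 1          # fast retransmit, no timer wait
            self.sst = self.cwnd / 2
            self.cwnd = self.sst + 3       # half + 3 segments already at dest.
            self.in_recovery = True

    def on_timeout(self) -> None:
        """The retransmission timer expired."""
        self.retransmits += 1
        self.sst = self.cwnd / 2
        self.cwnd = 1.0                    # back to one segment and slow start
        self.dup_acks = 0
        self.in_recovery = False


sender = RenoSender()
for _ in range(7):
    sender.on_new_ack()        # slow start grows cwnd from 1 to 8
for _ in range(3):
    sender.on_dup_ack()        # third duplicate triggers fast retransmit
print(sender.sst, sender.cwnd)  # 4.0 7.0
sender.on_new_ack()            # recovery ends
print(sender.cwnd)             # 4.0
```

Comparing the two loss paths shows the design choice on the slides: three duplicate ACKs leave the window at SST and in congestion avoidance, while a timeout drops it to one segment and restarts slow start.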