Transport Protocols
Relates to Lab 5.
UDP and TCP

Midterm

Roadmap
• UDP
  – Unreliable, connectionless datagram service
• TCP
  – Reliable, in-order, connection-oriented, byte stream service
• Principles
  – Multiplexing/demultiplexing
  – How to build a reliable service on top of an unreliable service

Orientation
• We move one layer up and look at the transport layer.
[Figure: protocol stack. Application layer: user processes. Transport layer: TCP, UDP. Network layer: ICMP, IP, IGMP. Link layer: ARP, hardware interface, RARP. Media below.]

Orientation
• Transport layer protocols are end-to-end protocols
• They are only implemented at the hosts
[Figure: two hosts, each with application, transport, network, and data link layers; the intermediate nodes between them implement only the network and/or data link layers.]

Transport Protocols in the Internet
• The most commonly used transport protocols are UDP and TCP.
• UDP - User Datagram Protocol
  – datagram oriented
  – unreliable, connectionless
  – simple
  – unicast and multicast
  – useful only for a few applications, e.g., multimedia applications
  – used a lot for services: network management (SNMP), routing (RIP), naming (DNS), etc.
• TCP - Transmission Control Protocol
  – byte stream oriented
  – reliable, connection-oriented
  – complex
  – only unicast
  – used for most Internet applications: web (http), email (smtp), file transfer (ftp), terminal (telnet), etc.

UDP - User Datagram Protocol
• UDP supports unreliable transmissions of datagrams
  – Each output operation by a process produces exactly one UDP datagram
• The only thing that UDP adds is multiplexing and demultiplexing
• Protocol number: 17
[Figure: applications on two hosts communicate through UDP, which runs on top of IP at the hosts and across the IP nodes between them.]

UDP Format
IP header (20 bytes) | UDP header (8 bytes) | UDP data

  Bits 0-15               Bits 16-31
  Source Port Number      Destination Port Number
  UDP message length      Checksum
  DATA

• Port numbers identify sending and receiving applications (processes).
• Maximum port number is 2^16 - 1 = 65,535
• Message Length is at least 8 bytes (i.e., the Data field can be empty) and at most 65,535
• Checksum includes UDP header and data.

Port Numbers
• UDP (and TCP) use port numbers to identify applications
• A globally unique address at the transport layer (for both UDP and TCP) is a tuple <IP address, port number>
• Port numbers are 16 bits long, so each host has UDP port numbers in the range 0 to 65,535.
[Figure: user processes sit on top of TCP and UDP, which sit on top of IP. IP demultiplexes based on the Protocol field in the IP header; TCP and UDP demultiplex based on port number.]

Transmission Control Protocol (TCP)

Overview
TCP = Transmission Control Protocol
• Connection-oriented protocol
• Provides a reliable unicast end-to-end byte stream over an unreliable internetwork.
[Figure: two TCP entities exchange a byte stream in each direction across an IP internetwork.]

Connection-Oriented
• Before any data transfer, TCP establishes a connection:
  – Analogy: making a phone call
  – One TCP entity is waiting for a connection ("server")
  – The other TCP entity ("client") contacts the server
• Each connection is full duplex
[Figure: the server waits for a connection request; the client requests a connection, the server accepts it, data transfer follows, and then the connection is disconnected.]

Reliable
• Byte stream is broken up into chunks which are called segments
  – Receiver sends acknowledgements (ACKs) for segments
  – TCP maintains a timer. If an ACK is not received in time, the segment is retransmitted
• Detecting errors and packet losses:
  – TCP has checksums for header and data. Segments with invalid checksums are discarded
  – Each byte that is transmitted has a sequence number

Byte Stream Service
• To the lower layers, TCP handles data in blocks, the segments.
• To the higher layers, TCP handles data as a sequence of bytes and does not mark any boundaries within the byte stream
• So: Higher layers do not know about the beginning and end of segments!
[Figure: the sending application performs 1. write 100 bytes, 2. write 20 bytes into the TCP queue of bytes to be transmitted; segments carry the bytes to the TCP queue of bytes that have been received; the receiving application performs 1. read 40 bytes, 2. read 40 bytes, 3. read 40 bytes.]

TCP Format
• TCP segments have a 20 byte header with >= 0 bytes of data.
IP header (20 bytes) | TCP header (20 bytes) | TCP data

  Bits 0-15                               Bits 16-31
  Source Port Number                      Destination Port Number
  Sequence number (32 bits)
  Acknowledgement number (32 bits)
  header length | 0 | Flags               window size
  TCP checksum                            urgent pointer
  Options (if any)
  DATA

TCP header fields
• Port Number:
  – A port number identifies the endpoint of a connection.
  – A pair <IP address, port number> identifies one endpoint of a connection.
  – Two pairs <client IP address, client port number> and <server IP address, server port number> identify a TCP connection.
[Figure: two hosts, each with applications on top of TCP on top of IP; the applications on one host use ports 23, 80, 104 and those on the other use ports 7, 80, 16.]

TCP header fields
• Sequence Number (SeqNo):
  – Sequence number is 32 bits long.
  – So the range of SeqNo is 0 <= SeqNo <= 2^32 - 1, which is about 4.3 Gbyte
  – The sequence number in a segment identifies the first byte in the segment
  – Initial Sequence Number (ISN) of a connection is set during connection establishment

TCP header fields
• Acknowledgement Number (AckNo):
  – Acknowledgements are piggybacked, i.e., a segment from A -> B can contain an acknowledgement for data sent in the B -> A direction
  – A host uses the AckNo field to send acknowledgements.
    (If a host sends an AckNo in a segment, it sets the "ACK flag".)
  – The AckNo contains the next SeqNo that a host is expecting
    Example: The acknowledgement for a segment with sequence numbers 0-1460 is AckNo=1461
  – ACK is cumulative

TCP header fields
• Header Length (4 bits):
  – Length of header in 32-bit words
  – Note that the TCP header has variable length (with a minimum of 20 bytes)

TCP header fields
• Flag bits:
  – URG: Urgent pointer is valid
    If the bit is set, the following bytes contain an urgent message in the range:
    SeqNo <= urgent message <= SeqNo + urgent pointer
  – ACK: Acknowledgement Number is valid
  – PSH: PUSH Flag
    Notification from the sender to the receiver that the receiver should pass all data that it has to the application.
    Normally set by the sender when the sender's buffer is empty

TCP header fields
• Flag bits:
  – RST: Reset the connection
    The flag causes the receiver to reset the connection
    The receiver of a RST terminates the connection and informs the higher layer application about the reset
  – SYN: Synchronize sequence numbers
    Sent in the first packet when initiating a connection
  – FIN: Sender is finished with sending
    Used for closing a connection
    Both sides of a connection must send a FIN

TCP header fields
• Window Size:
  – Each side of the connection advertises the window size
  – Window size is the maximum number of bytes that a receiver can accept.
  – Maximum window size is 2^16 - 1 = 65,535 bytes
• TCP Checksum:
  – TCP checksum covers both the TCP header and the TCP data (and also some parts of the IP header)
• Urgent Pointer:
  – Only valid if the URG flag is set

TCP header fields
• Options:

  Option                  Kind      Length    Contents
  End of Options          kind=0    (1 byte in total)
  NOP (no operation)      kind=1    (1 byte in total)
  Maximum Segment Size    kind=2    len=4     maximum segment size (2 bytes)
  Window Scale Factor     kind=3    len=3     shift count (1 byte)
  Timestamp               kind=8    len=10    timestamp value (4 bytes), timestamp echo reply (4 bytes)

  The kind and len fields are 1 byte each.

TCP header fields
• Options:
  – NOP is used to pad the TCP header to multiples of 4 bytes
  – Maximum Segment Size
  – Window Scale Options
    » Increases the TCP window from 16 to 32 bits, i.e., the window size is interpreted differently
    » This option can only be used in the SYN segment (first segment) during connection establishment time
  – Timestamp Option
    » Can be used for roundtrip measurements

Connection Management in TCP
• Opening a TCP Connection
• Closing a TCP Connection
• State Diagram

TCP Connection Establishment
• TCP uses a three-way handshake to open a connection:
  1. aida.poly.edu -> mng.poly.edu: SYN (SeqNo = x)
  2. mng.poly.edu -> aida.poly.edu: SYN (SeqNo = y, AckNo = x+1)
  3. aida.poly.edu -> mng.poly.edu: (SeqNo = x+1, AckNo = y+1)

A Closer Look with tcpdump
aida issues a "telnet mng"

  1 aida.poly.edu.1121 > mng.poly.edu.telnet: S 1031880193:1031880193(0) win 16384 <mss 1460,nop,wscale 0,nop,nop,timestamp>
  2 mng.poly.edu.telnet > aida.poly.edu.1121: S 172488586:172488586(0) ack 1031880194 win 8760 <mss 1460>
  3 aida.poly.edu.1121 > mng.poly.edu.telnet: . ack 172488587 win 17520
  4 aida.poly.edu.1121 > mng.poly.edu.telnet: P 1031880194:1031880218(24) ack 172488587 win 17520
  5 mng.poly.edu.telnet > aida.poly.edu.1121: P 172488587:172488590(3) ack 1031880218 win 8736
  6 aida.poly.edu.1121 > mng.poly.edu.telnet: P 1031880218:1031880221(3) ack 172488590 win 17520

Three-Way Handshake
[Figure: time-sequence diagram of packets 1 to 3 of the tcpdump output between aida.poly.edu and mng.poly.edu:
  aida -> mng: S 1031880193:1031880193(0) win 16384 <mss 1460, ...>
  mng -> aida: S 172488586:172488586(0) ack 1031880194 win 8760 <mss 1460>
  aida -> mng: . ack 172488587 win 17520]

TCP Connection Termination
• Each end of the data flow must be shut down independently ("half-close")
• If one end is done, it sends a FIN segment. The other end sends an ACK.
• Four messages are needed to completely shut down a connection:
  1. A -> B: FIN
  2. B -> A: ACK (B can still send to A)
  3. B -> A: FIN
  4. A -> B: ACK

Connection termination with tcpdump
aida issues a "telnet mng"

  1 mng.poly.edu.telnet > aida.poly.edu.1121: F 172488734:172488734(0) ack 1031880221 win 8733
  2 aida.poly.edu.1121 > mng.poly.edu.telnet: . ack 172488735 win 17484
  3 aida.poly.edu.1121 > mng.poly.edu.telnet: F 1031880221:1031880221(0) ack 172488735 win 17520
  4 mng.poly.edu.telnet > aida.poly.edu.1121: . ack 1031880222 win 8733

TCP Connection Termination
[Figure: time-sequence diagram of the four tcpdump packets above between aida.poly.edu and mng.poly.edu.]

TCP state diagram
[Figure: TCP state diagram]

TCP States in "Normal" Connection Lifetime
  Client (active side)                                          Server (passive side)
  SYN_SENT (active open)                                        LISTEN (passive open)
    -> SYN (SeqNo = x)
                                                                SYN_RCVD
                                   SYN (SeqNo = y, AckNo = x+1) <-
  ESTABLISHED
    -> (AckNo = y+1)
                                                                ESTABLISHED
  FIN_WAIT_1 (active close)
    -> FIN (SeqNo = m)
                                                                CLOSE_WAIT (passive close)
                                                 (AckNo = m+1) <-
  FIN_WAIT_2
                                                                LAST_ACK
                                               FIN (SeqNo = n) <-
  TIME_WAIT
    -> (AckNo = n+1)
                                                                CLOSED

2MSL Wait State
2MSL Wait State = TIME_WAIT
• When TCP does an active close and sends the final ACK, the connection must stay in the TIME_WAIT state for twice the maximum segment lifetime.
  2MSL = 2 * Maximum Segment Lifetime
[Figure: A and B exchange FIN, ACK, FIN; the final ACK from A to B is lost (marked X).]
• Why?
• TCP is given a chance to resend the final ACK. (The server will time out after sending the FIN segment and resend the FIN.)
• Depending on the implementation, the MSL is set to 2 minutes, 1 minute, or 30 seconds.

Resetting Connections
• Resetting connections is done by setting the RST flag
• When is the RST flag set?
  – A connection request arrives and no server process is waiting on the destination port
  – Abort (terminate) a connection. This causes the receiver to throw away buffered data. The receiver does not acknowledge the RST segment

TCP: Delayed ACKs and Nagle's algorithm

Interactive and bulk data transfer
TCP applications can be put into the following categories:
• bulk data transfer - ftp, mail, http
• interactive data transfer - telnet, rlogin
TCP has heuristics to deal with these application types.
For interactive data transfer:
• Try to reduce the number of packets
For bulk data transfer:
• High throughput

Telnet session on a local network
Telnet session from Argon (Argon.cs.virginia.edu) to Neon (Neon.cs.virginia.edu)
• This is the output of typing 3 (three) characters:

  Time 44.062449: Argon -> Neon: Push, SeqNo 0:1(1), AckNo 1
  Time 44.063317: Neon -> Argon: Push, SeqNo 1:2(1), AckNo 1
  Time 44.182705: Argon -> Neon: No Data, AckNo 2

  Time 48.946471: Argon -> Neon: Push, SeqNo 1:2(1), AckNo 2
  Time 48.947326: Neon -> Argon: Push, SeqNo 2:3(1), AckNo 2
  Time 48.982786: Argon -> Neon: No Data, AckNo 3

  Time 55.116581: Argon -> Neon: Push, SeqNo 2:3(1), AckNo 3
  Time 55.117497: Neon -> Argon: Push, SeqNo 3:4(1), AckNo 3
  Time 55.183694: Argon -> Neon: No Data, AckNo 4

Interactive applications: Telnet
• Remote terminal applications (e.g., Telnet) send characters to a server. The server interprets the character and sends the resulting output back to the client.
• For each character typed, you see three packets:
  1. Client -> Server: Send typed character
  2. Server -> Client: Echo of character (or user output) and acknowledgement for first packet
  3. Client -> Server: Acknowledgement for second packet

Why 3 packets per character?
• We would expect four packets per character:
  1. character
  2. ACK of character
  3. echo of character
  4. ACK of echoed character
• However, tcpdump shows this pattern:
  1. character
  2. ACK and echo of character
  3. ACK of echoed character
• What has happened? TCP has delayed the transmission of an ACK

Delayed Acknowledgement
• TCP delays transmission of ACKs for up to 200 ms
• The hope is to have data ready in that time frame. Then, the ACK can be piggybacked on a data segment.
• Delayed ACKs explain why the ACK and the "echo of character" are sent in the same segment.
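The 200 ms bound can be checked against the Argon/Neon trace shown earlier. The Python sketch below is illustrative only: the timestamps are copied from that trace, while the names TRACE, MAX_ACK_DELAY, ack_delays, and choose_ack are hypothetical, and the decision rule is a simplification of what a real TCP implementation does.

```python
# Pairs of (time Neon's echo was captured, time Argon's pure ACK was captured),
# in seconds, copied from the Argon/Neon telnet trace.
TRACE = [
    (44.063317, 44.182705),
    (48.947326, 48.982786),
    (55.117497, 55.183694),
]

# Upper bound on the ACK delay stated on the slide (200 ms).
MAX_ACK_DELAY = 0.200


def ack_delays(trace):
    """Return how long each pure ACK was held back after the echo arrived."""
    return [round(ack_time - echo_time, 6) for echo_time, ack_time in trace]


def choose_ack(reply_ready_after, max_delay=MAX_ACK_DELAY):
    """Simplified delayed-ACK decision (an assumption, not a real stack's logic).

    reply_ready_after: seconds until the host has its own data to send,
    or None if it has nothing to send.
    """
    if reply_ready_after is not None and reply_ready_after <= max_delay:
        return "piggyback ACK on data segment"
    return "send pure ACK when the delay timer expires"


if __name__ == "__main__":
    for delay in ack_delays(TRACE):
        print(f"pure ACK sent after {delay * 1000:.1f} ms")
    # Neon echoes within about 1 ms, so its ACK rides on the echo segment.
    print(choose_ack(0.000868))
    # Argon has nothing to send after the echo, so it emits a pure ACK.
    print(choose_ack(None))
```

In this trace all three pure ACKs from Argon are held back for less than 200 ms, which is consistent with the delayed-ACK behavior described above, whereas Neon's echo is ready almost immediately and therefore carries the ACK.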

Telnet session to a distant host
Telnet session between argon.cs.virginia.edu and tenet.cs.berkeley.edu
• This is the output of typing nine characters:

  Time 16.401963: Argon -> Tenet: Push, SeqNo 1:2(1), AckNo 2
  Time 16.481929: Tenet -> Argon: Push, SeqNo 2:3(1), AckNo 2

  Time 16.482154: Argon -> Tenet: Push, SeqNo 2:3(1), AckNo 3
  Time 16.559447: Tenet -> Argon: Push, SeqNo 3:4(1), AckNo 3

  Time 16.559684: Argon -> Tenet: Push, SeqNo 3:4(1), AckNo 4
  Time 16.640508: Tenet -> Argon: Push, SeqNo 4:5(1), AckNo 4

  Time 16.640761: Argon -> Tenet: Push, SeqNo 4:8(4), AckNo 5
  Time 16.728402: Tenet -> Argon: Push, SeqNo 5:9(4), AckNo 8

Delayed Acks do not kick in if there are data to send
• Observation: Transmission of segments follows a different pattern, i.e., there are only two packets per character typed:
  1. char1
  2. ACK of char1 + echo of char1
  3. ACK + char2
  4. ACK + echo of char2
• The delayed acknowledgment does not kick in
• The reason is that there is always data at Argon ready to be sent when the ACK arrives.

Nagle's Algorithm
• Observation:
  – Argon never has multiple unacknowledged segments outstanding
  – There are fewer transmissions than there are characters.
• Sending one byte per packet is inefficient.
• Solution: Nagle's Algorithm
  Small segments cannot be sent until outstanding data is acked.
• The algorithm can be disabled, because it can be a problem for interactive applications such as the X Window System.

TCP: Flow Control, Congestion Control

What is Flow/Congestion Control?
• Flow Control: Algorithms that prevent the sender from overrunning the receiver buffer
• Congestion Control: Algorithms that prevent the sender from overloading the network
Sliding window implements both control mechanisms.

TCP Flow Control

TCP Flow Control
• TCP implements sliding window flow control
• Sending acknowledgements is separated from setting the window size at the sender.
• Acknowledgements do not automatically increase the window size
• Acknowledgements are cumulative

Sliding Window Flow Control
• Sliding Window Protocol is performed at the byte level:

  Bytes 1-2:    sent and acknowledged
  Bytes 3-5:    sent but not acknowledged    (advertised window: bytes 3-8)
  Bytes 6-8:    can be sent                  (usable window)
  Bytes 9-11:   cannot be sent

• Here: Sender can transmit sequence numbers 6, 7, 8.

Sliding Window: "Window Opens"
• An acknowledgement is received that enlarges the window to the right (AckNo = 5, Win = 6):
[Figure: bytes 1 to 11 before and after the segment with AckNo = 5, Win = 6 is received; afterwards the window covers bytes 5 to 10.]
• A receiver opens a window when the TCP buffer empties (meaning that data is delivered to the application).

Window Management in TCP
• The receiver returns two parameters to the sender:
  – AckNo (32 bits)
  – window size (win) (16 bits)
• The interpretation is:
  – I am ready to receive new data with SeqNo = AckNo, AckNo+1, ..., AckNo+Win-1
• Receiver can acknowledge data without opening the window
• Receiver can change the window size without acknowledging data

Sliding Window: Example
[Figure: message sequence with a 4K receiver buffer.
  Sender sends 2K of data (SeqNo=0); receiver replies AckNo=2048, Win=2048.
  Sender sends 2K of data (SeqNo=2048); receiver replies AckNo=4096, Win=0 and the sender is blocked.
  Later the receiver sends AckNo=4096, Win=1024.]

TCP Congestion Control

TCP Congestion Control
• Keep a sender from congesting the network.
• The sender has two internal parameters:
  – Congestion Window (cwnd)
  – Slow-start threshold value (ssthresh)
• Sliding window size is set to the minimum of (cwnd, receiver advertised win)
• Congestion control works in two modes:
  – slow start (cwnd < ssthresh)
    Probe the available bandwidth
  – congestion avoidance (cwnd >= ssthresh)
    Try not to overload the network.

Slow Start
• Initial value: Set cwnd = 1
• Note: Unit is a segment size.
  TCP is actually based on bytes and increments cwnd by 1 MSS (maximum segment size)
• Modern TCP implementations may set the initial cwnd to 2
• Each time an ACK is received by the sender, the congestion window is increased by 1 segment:
  cwnd = cwnd + 1
• If an ACK acknowledges two segments, cwnd is still increased by only 1 segment.
• Even if an ACK acknowledges a segment that is smaller than MSS bytes long, cwnd is increased by 1.
• Question: how can you accelerate your TCP download?

Slow Start Example
• The congestion window size grows very rapidly
  – For every ACK, we increase cwnd by 1 irrespective of the number of segments ACK'ed
• TCP slows down the increase of cwnd when cwnd >= ssthresh
[Figure: time-sequence diagram.
  cwnd = 1: sender sends segment 1; ACK for segment 1 arrives.
  cwnd = 2: sender sends segments 2 and 3; ACKs for segments 2 and 3 arrive.
  cwnd = 4: sender sends segments 4, 5, and 6; ACKs for segments 4, 5, and 6 arrive.
  cwnd = 7.]

Congestion Avoidance
• Congestion avoidance phase is started if cwnd has reached the slow-start threshold value
• If cwnd >= ssthresh then each time an ACK is received, increment cwnd as follows:
  cwnd = cwnd + 1/cwnd
• So cwnd is increased by one only if all cwnd segments have been acknowledged.

Example of Slow Start/Congestion Avoidance
Assume that ssthresh = 8
[Figure: plot of cwnd (in segments) against roundtrip times, with ssthresh marked at 8. cwnd takes the values 1, 2, 4, 8 during slow start and then 9, 10 during congestion avoidance.]

Responses to Congestion
• TCP uses packet loss as a congestion signal
• A TCP sender can detect lost packets via:
  – Receipt of a duplicate ACK
  – Timeout of a retransmission timer

Response to Timeout
• TCP interprets a timeout as a severe congestion signal.
  When a timeout occurs, the sender performs:
  – ssthresh is set to half of the size the congestion window had when the timeout occurred:
    ssthresh = cwnd / 2
  – cwnd is then reset to one:
    cwnd = 1
  – and slow start is entered

Reaction to Duplicate ACKs
• Fast retransmit
  – Three duplicate ACKs indicate a packet loss
  – Retransmit without timeout
• Fast recovery
  – Avoid slow start
  – Retransmit "lost packet"
  – ssthresh = cwnd/2
  – cwnd = ssthresh + 3
  – Increment cwnd by one for each additional duplicate ACK
• When an ACK arrives that acknowledges "new data", set:
  cwnd = ssthresh
  and enter congestion avoidance

Duplicate ACK example
[Figure: message sequence with 1K segments.
  Sender sends SeqNo=0; receiver replies AckNo=1024.
  Sender sends SeqNo=1024, which is not acknowledged.
  Sender sends SeqNo=2048; receiver replies AckNo=1024 (1. duplicate).
  Sender sends SeqNo=3072; receiver replies AckNo=1024 (2. duplicate).
  Sender sends SeqNo=4096; receiver replies AckNo=1024 (3. duplicate).
  Sender retransmits SeqNo=1024 and then sends SeqNo=5120.]

Flavors of TCP Congestion Control
• TCP Tahoe (1988, 4.3BSD Tahoe)
  – Slow Start
  – Congestion Avoidance
  – Fast Retransmit
• TCP Reno (1990, 4.3BSD Reno)
  – Fast Recovery
  – Modern TCP implementation
• New Reno (1996)
• SACK (1996)

TCP Tahoe
[Figure: TCP Tahoe congestion window over time]

TCP Reno (Jacobson 1990)
[Figure: TCP Reno congestion window over time, with phases labeled SS, CA, and fast retransmission/fast recovery]

TCP III – Retransmission and Timeout

Retransmissions in TCP
• A TCP sender retransmits a segment when it assumes that the segment has been lost:
  1. No ACK has been received and a timeout occurs
  2. Multiple ACKs have been received for the same segment

Retransmission Timer
• A TCP sender maintains one retransmission timer for each connection
• When the timer reaches the retransmission timeout (RTO) value, the sender retransmits the first segment that has not been acknowledged
• The timer is started when
  1. a packet with payload is transmitted and the timer is not running,
  2. an ACK arrives that acknowledges new data,
  3. a segment is retransmitted
• The timer is stopped when
  – all segments are acknowledged

How to set the timer
• Retransmission Timer:
  – The setting of the retransmission timer is crucial for good performance of TCP
  – A timeout value that is too small results in unnecessary retransmissions
  – A timeout value that is too large results in a long waiting time before a retransmission can be issued
  – A problem is that the delays in the network are not fixed
  – Therefore, the retransmission timers must be adaptive

Setting the value of RTO:
• The RTO value is set based on round-trip time (RTT) measurements that each TCP performs
• Each TCP connection measures the time difference between the transmission of a segment and the receipt of the corresponding ACK
• There is only one measurement ongoing at any time (i.e., measurements do not overlap)
• Figure on the right shows three RTT measurements
[Figure: time-sequence diagram with segments 1 to 5 and the ACKs for segment 1, for segments 2 + 3, for segment 4, and for segment 5; three intervals are marked RTT #1, RTT #2, and RTT #3.]

Setting the RTO value
• RTO is calculated based on the RTT measurements
  – Uses an exponential moving average to estimate RTT (srtt) and variance of RTT (rttvar) from the RTT samples
  – The influence of past samples decreases exponentially
• The RTT measurements are smoothed by the following estimators srtt and rttvar:
  srtt_{n+1} = a * RTT + (1 - a) * srtt_n
  rttvar_{n+1} = b * |RTT - srtt_n| + (1 - b) * rttvar_n
  RTO_{n+1} = srtt_{n+1} + 4 * rttvar_{n+1}
  – The gains are set to a = 1/8 and b = 1/4, as in RFC 2988

Setting the RTO value (cont'd)
• Initial value for RTO:
  – Sender should set the initial value of RTO to RTO_0 = 3 seconds
• RTO calculation after the first RTT measurement has arrived:
  srtt_1 = RTT
  rttvar_1 = RTT / 2
  RTO_1 = srtt_1 + 4 * rttvar_1
• When a timeout occurs, the RTO value is doubled, up to a maximum of 64 seconds:
  RTO_{n+1} = min(2 * RTO_n, 64) seconds
  This is called an exponential backoff

Karn's Algorithm
[Figure: a segment is sent, a timeout occurs, the segment is retransmitted, and an ACK arrives; two candidate intervals are marked "RTT ?". RTT measurement is ambiguous in this case.]
• If an ACK for a retransmitted segment is received, the sender cannot tell if the ACK belongs to the original or the retransmission.
Karn's Algorithm:
• Don't update RTT on any segments that have been retransmitted

Summary
• UDP: connectionless, unreliable, datagram service
• TCP: reliable, connection-oriented, byte stream service
  – TCP header
  – Connection management
  – Delayed ACKs and Nagle's algorithm
  – TCP flow control
  – TCP congestion control
  – TCP retransmission and timeout
• References
  – TCP/IP Illustrated, vol. 1, chapters 11, 17-24
  – RFC 793 (Transmission Control Protocol)
  – RFC 768 (User Datagram Protocol)
  – RFC 2581 (TCP Congestion Control)
  – RFC 2988 (Computing TCP's Retransmission Timer)
  – RFC 3390 (Increasing TCP's Initial Window)
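
The RTO estimator, the exponential backoff, and Karn's algorithm from the retransmission slides can be tied together in a few lines. The following Python sketch is a minimal illustration under stated assumptions, not a real TCP implementation: the class name RtoEstimator and its method names are hypothetical, the gains default to the RFC 2988 values (1/8 for srtt, 1/4 for rttvar) and can be overridden, and refinements such as clock granularity and a lower bound on the RTO are left out.

```python
class RtoEstimator:
    """Sketch of RTO computation with exponential backoff and Karn's algorithm."""

    def __init__(self, srtt_gain=1 / 8, rttvar_gain=1 / 4,
                 initial_rto=3.0, max_rto=64.0):
        self.srtt_gain = srtt_gain      # gain "a" for the smoothed RTT
        self.rttvar_gain = rttvar_gain  # gain "b" for the RTT variation
        self.max_rto = max_rto          # cap for the exponential backoff
        self.srtt = None                # no RTT measurement yet
        self.rttvar = None
        self.rto = initial_rto          # RTO_0 = 3 seconds

    def on_rtt_sample(self, rtt, retransmitted=False):
        """Feed one RTT measurement (seconds) and return the new RTO."""
        if retransmitted:
            # Karn's algorithm: an ACK for a retransmitted segment is
            # ambiguous, so the sample is ignored.
            return self.rto
        if self.srtt is None:
            # First measurement: srtt_1 = RTT, rttvar_1 = RTT / 2.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # rttvar is updated first so that it uses the previous srtt.
            b = self.rttvar_gain
            a = self.srtt_gain
            self.rttvar = b * abs(rtt - self.srtt) + (1 - b) * self.rttvar
            self.srtt = a * rtt + (1 - a) * self.srtt
        self.rto = self.srtt + 4 * self.rttvar
        return self.rto

    def on_timeout(self):
        """Exponential backoff: double the RTO, capped at max_rto."""
        self.rto = min(2 * self.rto, self.max_rto)
        return self.rto


if __name__ == "__main__":
    est = RtoEstimator()
    print(est.on_rtt_sample(1.0))                      # 3.0
    print(est.on_rtt_sample(2.0))                      # 3.625
    print(est.on_timeout())                            # 7.25
    print(est.on_rtt_sample(9.0, retransmitted=True))  # 7.25 (sample ignored)
```

With a first sample of 1 second, srtt = 1 and rttvar = 0.5, so the RTO is 3 seconds; a second sample of 2 seconds gives rttvar = 0.625 and srtt = 1.125, hence an RTO of 3.625 seconds. Making the gains constructor parameters keeps the sketch usable with other gain settings.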