Transport Layer
ECE544: Communication Networks-II, Spring 2013
Tam Vu
WINLAB, Dept. of Computer Science, Rutgers University
Includes teaching materials from L. Peterson, Sumathi Gopal and Sumit Rangwala, D. Raychaudhuri, Mike Freedman

IP Protocol Stack: Key Abstractions
    Layer         Abstraction
    Application   Applications
    Transport     Reliable streams, messages
    Network       Best-effort global packet delivery
    Link          Best-effort local packet delivery
• Problem: the network layer (IP) provides only best-effort communication services

Application Requirements vs. IP Layer Limitations
• Guarantee message delivery
    - Network may drop messages.
• Deliver messages in the same order they are sent
    - Messages may be reordered in the network and may incur long delays.
• Deliver at most one copy of each message
    - Messages may be duplicated in the network.
• Support arbitrarily large messages
    - Network may limit message size.
• Support synchronization between sender and receiver
• Allow the receiver to apply flow control to the sender
• Support multiple application processes on each host
    - Network only supports communication between hosts.
• Many more

IP Protocol Stack: Key Abstractions (continued)
• Transport layer:
    - Provides applications with good abstractions
    - Works without support or feedback from the network
    - Is the lowest layer in the network stack that is an end-to-end protocol

Transport Protocols
• Logical communication between processes
    - Sender divides a message into segments
    - Receiver reassembles segments into the message
• Transport services
    - (De)multiplexing packets
    - Detecting corrupted data
    - Optionally: reliable delivery, flow control, …

Two Basic Transport Features
• Demultiplexing: port numbers
    [Figure: a client host sends a service request for 128.2.194.242:80 (i.e., the Web server) to server host 128.2.194.242. The server's OS hands it to the Web server (port 80), not the echo server (port 7).]
• Error detection: checksums
    [Figure: a checksum over the IP payload is used to detect corruption.]

Most Popular Transport Protocols
• User Datagram Protocol (UDP)
    - Supports multiple application processes on each host
    - Optionally checks messages for correctness with a checksum
• Transmission Control Protocol (TCP)
    - Ensures reliable delivery of packets between source and destination processes
    - Ensures in-order delivery of packets to the destination process
    - Other services
• Real-time Transport Protocol (RTP)
    - Serves real-time multimedia applications
    - Moves decision making to the applications
    - Runs over UDP

User Datagram Protocol (UDP)
• Service: support for multiple processes on each host to communicate
• Issue: IP only provides communication between hosts (IP addresses)
• Solution
    - Add a port number and associate a process with a port number
    - 4-tuple unique connection identifier: [SrcPort, SrcIPAddr, DestPort, DestIPAddr]
• Lightweight communication between processes
    - Send and receive messages
    - Avoid overhead of ordered, reliable delivery
    - No connection setup delay, no in-kernel connection state
• Used by popular apps
    - Query/response for DNS
    - Real-time data in VoIP
• UDP datagram layout (each row is 32 bits wide):
    SrcPort (bits 0-15)   | DstPort (bits 16-31)
    Length (bits 0-15)    | Checksum (bits 16-31)
    Payload

User Datagram Protocol (UDP): Error Detection
• Service: ensure message correctness
• Issue: packet corruption in transit
• Solution
    - Use a checksum
    - The checksum covers the UDP header, the payload, and a pseudo header
    - Pseudo header: protocol number, source IP address, destination IP address, and UDP length

Transmitting a Stream of Bytes?
• Stream-of-bytes service
    - Sends and receives a stream of bytes
• Reliable, in-order delivery
    - Corruption: checksums
    - Detect loss/reordering: sequence numbers
    - Reliable delivery: acknowledgments and retransmissions
• Connection oriented
    - Explicit set-up and tear-down of the TCP connection
• Flow control
    - Prevent overflow of the receiver's buffer space
• Congestion control
    - Adapt to network congestion for the greater good

Transmission Control Protocol (TCP)
• First proposed by Vinton Cerf and Robert Kahn, 1974
• TCP/IP enabled computers of all sizes, from different vendors and running different OSs, to communicate with each other
• Used by 80% of all traffic on the Internet
• Reliable, in-order delivery, connection-oriented, byte-stream service

Starting and Ending a Connection: TCP Handshakes

Establishing a TCP Connection
• Each host tells its Initial Sequence Number (ISN) to the other host
• Three-way handshake to establish a connection
    - Host A sends a SYN (open) to host B
    - Host B returns a SYN acknowledgment (SYN ACK)
    - Host A sends an ACK to acknowledge the SYN ACK

TCP Header
    Source port | Destination port
    Sequence number
    Acknowledgment
    HdrLen | 0 | Flags | Advertised window
    Checksum | Urgent pointer
    Options (variable)
    Data
• Flags: SYN, FIN, RST, PSH, URG, ACK

Step 1: A's Initial SYN Packet
    A's port | B's port
    A's Initial Sequence Number
    Acknowledgment
    HdrLen 20 | 0 | Flags (SYN set) | Advertised window
    Checksum | Urgent pointer
    Options (variable)
• A tells B it wants to open a connection…

Step 2: B's SYN-ACK Packet
    B's port | A's port
    B's Initial Sequence Number
    Acknowledgment = A's ISN plus 1
    HdrLen 20 | 0 | Flags (SYN and ACK set) | Advertised window
    Checksum | Urgent pointer
    Options (variable)
• B tells A it accepts, and is ready to hear the next byte…
• … upon receiving this packet, A can start sending data

Step 3: A's ACK of the SYN-ACK
    A's port | B's port
    Sequence number
    Acknowledgment = B's ISN plus 1
    HdrLen 20 | 0 | Flags (ACK set) | Advertised window
    Checksum | Urgent pointer
    Options (variable)
• A tells B it is okay to start sending
• … upon receiving this packet, B can start sending data

SYN Loss and Web Downloads
• Upon sending a SYN, the sender sets a timer
    - If the SYN is lost, the timer expires before the SYN-ACK is received
    - Sender retransmits the SYN
• How should the TCP sender set the timer?
    - No idea how far away the receiver is
    - Some TCPs use a default of 3 or 6 seconds
• Implications for web download
    - User gets impatient and hits reload
    - … User aborts the connection, initiates a new socket
    - Essentially, forces a fast send of a new SYN!

Tearing Down the Connection
• Closing (each end of) the connection
    - Finish (FIN) to close and receive remaining bytes
    - And the other host sends a FIN ACK to acknowledge
    - Reset (RST) to close and not receive remaining bytes

Sending/Receiving the FIN Packet
• Sending a FIN: close()
    - Process is done sending data via the socket
    - Process invokes "close()"
    - Once TCP has sent all the outstanding bytes…
    - … then TCP sends a FIN
• Receiving a FIN: EOF
    - Process is reading data from the socket
    - Eventually, the read call returns an EOF

Data Transmission

TCP: Byte-stream
• Service: byte-stream
    - Application reads or writes a stream of bytes to the transport
• Issue: IP is packet-oriented
• Solution: TCP maintains a local buffer
    - Chop the stream into packets and transmit (sender)
    - Coalesce data from packets to form a stream (receiver)

TCP "Stream of Bytes" Service
[Figure: a continuous stream of bytes flowing from Host A to Host B.]

…Emulated Using TCP "Segments"
[Figure: Host A sends the stream to Host B as separate TCP data segments.]
• Segment sent when:
    1. Segment full (Max Segment Size),
    2. Not full, but times out, or
    3. "Pushed" by application

TCP Segment
[Figure: IP packet = IP Hdr + IP Data; the IP Data holds TCP Hdr + TCP Data (segment).]
• IP packet
    - No bigger than the Maximum Transmission Unit (MTU)
    - E.g., up to 1500 bytes on an Ethernet link
• TCP packet
    - IP packet with a TCP header and data inside
    - TCP header is typically 20 bytes long
• TCP segment
    - No more than Maximum Segment Size (MSS) bytes
    - E.g., up to 1460 consecutive bytes from the stream

Sequence Number
[Figure: Host A sends TCP data to Host B. The stream starts at the ISN (initial sequence number); a segment's sequence number is the number of its 1st byte.]

Reliable Delivery on a Lossy Channel With Bit Errors

Challenges of Reliable Data Transfer
• Over a perfectly reliable channel: done
• Over a channel with bit errors
    - Receiver detects errors and requests retransmission
• Over a lossy channel with bit errors
    - Some data missing, others corrupted
    - Receiver cannot easily detect loss
• Over a channel that may reorder packets
    - Receiver cannot easily distinguish loss vs. out-of-order

An Analogy
• Alice and Bob are talking
    - What if Alice couldn't understand Bob?
    - Bob asks Alice to repeat what she said
• What if Bob hasn't heard Alice for a while?
    - Is Alice just being quiet? Has she lost reception?
    - How long should Bob just keep on talking?
    - Maybe Alice should periodically say "uh huh"
    - … or Bob should ask "Can you hear me now?"

Take-Aways from the Example
• Acknowledgments from receiver
    - Positive: "okay" or "uh huh" or "ACK"
    - Negative: "please repeat that" or "NACK"
• Retransmission by the sender
    - After not receiving an "ACK"
    - After receiving a "NACK"
• Timeout by the sender ("stop and wait")
    - Don't wait forever without some acknowledgment

TCP Support for Reliable Delivery
• Detect bit errors: checksum
    - Used to detect corrupted data at the receiver
    - … leading the receiver to drop the packet
• Detect missing data: sequence number
    - Used to detect a gap in the stream of bytes
    - ... and for putting the data back in order
• Recover from lost data: retransmission
    - Sender retransmits lost or corrupted data
    - Two main ways to detect lost packets

TCP Acknowledgments
[Figure: Host A sends TCP data to Host B, starting at the ISN (initial sequence number), with each sequence number equal to the segment's 1st byte. Host B replies with ACK sequence number = next expected byte.]

Automatic Repeat reQuest (ARQ)
• ACK and timeouts
    - Receiver sends an ACK when it receives a packet
    - Sender waits for the ACK and times out
• Simplest ARQ protocol
    - Stop and wait
    - Send a packet, stop and wait until the ACK arrives
[Figure: sender and receiver timelines, with the sender's timeout interval marked.]

Quick TCP Math
• Initial Seq No = 501. Sender sends 4500 bytes, successfully acknowledged. Next sequence number to send is:
    (A) 4501  (B) 5000  (C) 5001  (D) 5002
• Next 1000-byte TCP segment received. Receiver acknowledges with ACK number:
    (A) 5001  (B) 6000  (C) 6001

Flow Control: TCP Sliding Window

Sliding Window: Motivation
• Stop-and-wait is inefficient
    - Only one TCP segment is "in flight" at a time
• Consider: 1.5 Mbps link with 50 ms round-trip time (RTT)
    - Assume segment size of 1 KB (8 Kbits)
    - 8 Kbits/segment at 50 msec/segment gives 160 Kbps
    - That's 11% of the capacity of the 1.5 Mbps link

Sliding Window
• Allow a larger amount of data "in flight"
    - Allow sender to get ahead of the receiver
    - … though not too far ahead
[Figure: the sending process writes into the sender's TCP buffer, marked by last byte ACKed, last byte sent, and last byte written. The receiving process reads from the receiver's TCP buffer, marked by last byte read, next byte expected, and last byte received.]

Receiver Buffering
• Receive window size
    - Amount that can be sent without acknowledgment
    - Receiver must be able to store this amount of data
• Receiver tells the sender the window
    - Tells the sender the amount of free space left
[Figure: the byte stream split into data ACK'd, outstanding un-ACK'd data, data OK to send (together with the outstanding data, this is the window size), and data not OK to send yet.]

TCP: Flow Control
• Flow control
    - "Prevent sender from overrunning the capacity (buffer) of the receiver"
• Solution: use an adaptive receiver window size
    - Goal is to keep (C) - (A) < MaxRcvBuffer
    - Every packet carries an ACK and an AdvertisedWindow
• Sending side (sending application writes into TCP): (J) LastByteAcked, (K) LastByteSent, (I) LastByteWritten
    LastByteSent (K) - LastByteAcked (J) <= AdvertisedWindow
    EffWin = AdvertisedWindow - (LastByteSent - LastByteAcked)
    LastByteWritten - LastByteAcked <= MaxSendBuffer
• Receiving side (receiving application reads from TCP): (A) LastByteRead, (B) NextByteExpected, (C) LastByteRcvd
    AdvertisedWindow = MaxRcvBuffer - ((NextByteExpected - 1) - LastByteRead)

Optimizing Retransmissions

Reasons for Retransmission
[Figure: three sender/receiver timelines with the timeout interval marked. (1) Packet lost. (2) ACK lost: the retransmission is a DUPLICATE PACKET. (3) Early timeout: the retransmissions are DUPLICATE PACKETS.]

How Long Should Sender Wait?
• Sender sets a timeout to wait for an ACK
    - Too short: wasted retransmissions
    - Too long: excessive delays when a packet is lost
• TCP sets the timeout as a function of the RTT
    - Expect the ACK to arrive after a "round-trip time"
    - … plus a fudge factor to account for queuing
• But, how does the sender know the RTT?
    - Running average of the delay to receive an ACK

TCP Timeout
• Issue: RTT in a wide area network varies substantially
• Solution: adaptive timeout
• Original algorithm:
    EstimatedRTT = α × EstimatedRTT + (1 - α) × SampleRTT,  0 ≤ α ≤ 1
    Timeout = β × EstimatedRTT  (β = 2)
• Problem
    - Does not distinguish whether the ACK is for the original transmission or a retransmission
    - A constant β is not good
    - Assumes constant variance

TCP Timeout
• Karn/Partridge Algorithm
    - Whenever TCP retransmits a segment, it stops taking samples of the RTT
    - Only measure SampleRTT for segments that have been sent only once
    - Each time TCP retransmits, set the next timeout to be twice the last timeout
    - Relieves congestion
• Jacobson/Karels Algorithm: adaptive variance (uses mean deviation)
    Difference = SampleRTT - EstimatedRTT
    EstimatedRTT = EstimatedRTT + (δ × Difference)   (same as in the original algorithm, with δ = 1 - α)
    Deviation = Deviation + δ × (|Difference| - Deviation)
    Timeout = μ × EstimatedRTT + φ × Deviation   (default: μ = 1 and φ = 4)
    0 ≤ δ ≤ 1

TCP Deadlock
• TCP deadlock
    - The receiver advertises a window size of 0, so the sender stops sending data
    - The window size update from the receiver is lost
• To solve it:
    - The sender starts the persist timer when AdvertisedWindow = 0
    - When the persist timer expires, the sender sends a small packet

Triggering Transmission
• When to transmit a segment:
    - Small segments are subject to large overhead
    - Reach the max segment size (MSS): the size of the largest segment TCP can send without causing the local IP to fragment
    - MSS = local MTU - (IP header + TCP header)
    - The sending process explicitly asks TCP to transmit ("push")

Congestion
[Figure: Source 1, Source 2, and Source 3 send through a shared network toward Dest 1 and Dest 2.]
• Even with flow control, packets might not reach the destination
• When the network cannot support the senders' rate
    - Queues at the network elements overflow

Congestion Control vs. Flow Control
• Congestion control
    - Mechanism to prevent the sender from overrunning the capacity of the network
    - When the network is the bottleneck
• Flow control
    - Mechanism to prevent the sender from overrunning the capacity of the receiver
    - When the receiver is the bottleneck

Congestion Control: Design Approach
• Maintain another window at the sender called CongestionWindow (cwnd)
• CongestionWindow is the max number of packets allowed in the network
    - Number of unACKed packets at the sender.
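Returning to the adaptive timeout rules covered earlier (Karn/Partridge and Jacobson/Karels), the following is a minimal Python sketch of how the slide formulas might fit together. The class name `RttEstimator`, the default `delta=0.125`, the initial deviation of 0, and the reset of the backoff after a clean sample are assumptions made for illustration. They are not taken from the slides or from any particular TCP implementation.

```python
class RttEstimator:
    """Adaptive retransmission timeout, following the slide formulas.

    Hypothetical helper for illustration; parameter defaults other than
    mu = 1 and phi = 4 are assumptions.
    """

    def __init__(self, first_sample, delta=0.125, mu=1.0, phi=4.0):
        self.delta = delta              # gain, 0 <= delta <= 1 (assumed value)
        self.mu = mu                    # slide default: mu = 1
        self.phi = phi                  # slide default: phi = 4
        self.estimated_rtt = first_sample
        self.deviation = 0.0            # assumed starting point
        self.backoff = 1                # Karn/Partridge timeout multiplier

    def on_ack(self, sample_rtt, was_retransmitted):
        """Update the estimate from one measured RTT sample."""
        if was_retransmitted:
            # Karn/Partridge: the ACK is ambiguous, so skip this sample.
            return
        difference = sample_rtt - self.estimated_rtt
        self.estimated_rtt += self.delta * difference
        self.deviation += self.delta * (abs(difference) - self.deviation)
        self.backoff = 1                # assumption: clean sample ends the backoff

    def on_retransmit(self):
        """Karn/Partridge: each retransmission doubles the next timeout."""
        self.backoff *= 2

    def timeout(self):
        base = self.mu * self.estimated_rtt + self.phi * self.deviation
        return self.backoff * base


est = RttEstimator(first_sample=100.0)      # milliseconds
est.on_ack(200.0, was_retransmitted=False)  # one clean sample
timeout_after_sample = est.timeout()
est.on_retransmit()                         # segment timed out and was resent
timeout_after_backoff = est.timeout()
```

Because the deviation term grows when samples swing widely, the timeout widens on a jittery path and tightens on a steady one, which is the point of replacing the constant β.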
Key: How to Calculate the Congestion Window (cwnd)
• Various approaches possible
• TCP estimates it based on observed packet losses
    - Assumes packet loss is an indication of congestion
• Since we don't know whether the network or the receiver is the bottleneck:
    MaxWindow = MIN(CongestionWindow, AdvertisedWindow)
    EffectiveWin = MaxWindow - (LastByteSent - LastByteAcked)

Congestion Avoidance: (AIMD)
• If no congestion in the network (increase conservatively)
    - Increase the congestion window additively every RTT
        Every RTT:            w = w + 1                        (w = cwnd in segments)
        Every ACK reception:  w = w + 1/w                      (w = cwnd in segments)
        Every ACK reception:  cwnd = cwnd + MSS × (MSS/cwnd)   (cwnd in bytes)
• If congestion in the network (decrease aggressively)
    - Decrease the congestion window multiplicatively, immediately
        cwnd = cwnd/2   (cwnd in bytes)
• How is congestion detected?
    - Estimated (more later)

Congestion Avoidance: (AIMD)
[Figure: CongestionWindow size versus time, showing the startup time and the sawtooth.]
• TCP's sawtooth pattern
• Issues with additive increase
    - Takes too long to ramp up a connection from the beginning
    - The entire advertised window may be reopened when a lost packet is retransmitted and a single cumulative ACK is received by the sender

TCP "Slow Start": To Start Quickly!
• Maintain another variable, the slow start threshold (ssthresh)
    - Last known stable rate
• If (cwnd > ssthresh)
    - State = congestion avoidance
• Else
    - State = slow start
• In slow start
    - Increase the congestion window exponentially every RTT
        Every ACK reception:  w = w + 1            (w = cwnd in segments)
        Every ACK reception:  cwnd = cwnd + MSS    (cwnd in bytes)
• Key: how is ssthresh calculated?

TCP: Congestion Detection and Retransmit
• Loss of a packet indicates congestion
• Timer timeouts (no ACK)
    - Set according to the Jacobson/Karels algorithm
    - On timer timeout: ssthresh = max(2*MSS, effwin/2); cwnd = MSS
    - Notice this will cause TCP to go into slow start
• Issue: takes a long time to detect a packet loss
    - Affects throughput
• Any other quicker way of detecting a packet loss?
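The slow start, additive increase, and timeout rules above can be put together in a rough per-round simulation. This is a sketch under stated assumptions, not a model of a real TCP stack: `MSS = 1460` is taken from the Ethernet example earlier, the function names are hypothetical, one ACK is assumed per full segment per round trip, the receiver's window is assumed never to be the limit, and `effwin` at timeout is approximated by the current `cwnd`.

```python
MSS = 1460  # bytes; assumed from the Ethernet example (1500 - 40)


def on_ack(cwnd, ssthresh):
    """Return the new cwnd (bytes) after one new ACK arrives."""
    if cwnd > ssthresh:
        # Congestion avoidance: about one MSS of growth per round trip.
        return cwnd + MSS * MSS / cwnd
    # Slow start: one MSS per ACK, which doubles cwnd every round trip.
    return cwnd + MSS


def on_timeout(effwin):
    """Return (cwnd, ssthresh) after a retransmission timer expires."""
    ssthresh = max(2 * MSS, effwin / 2)
    return MSS, ssthresh


def simulate(rounds, ssthresh, loss_rounds=()):
    """Trace cwnd (in segments) at the start of each round trip."""
    cwnd = MSS
    trace = []
    for r in range(rounds):
        trace.append(cwnd / MSS)
        if r in loss_rounds:
            # Assumption: a loss in this round is detected by timeout,
            # and the effective window equals the current cwnd.
            cwnd, ssthresh = on_timeout(cwnd)
            continue
        acks = int(cwnd // MSS)  # one ACK per full segment sent this round
        for _ in range(acks):
            cwnd = on_ack(cwnd, ssthresh)
    return trace


# Ten round trips with a timeout in round 5.
trace = simulate(10, ssthresh=64 * MSS, loss_rounds=(5,))
```

Plotting `trace` against the round number gives a coarse version of the sawtooth: exponential growth, a collapse to one segment at the timeout, then slow start again up to the new `ssthresh`.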
Fast Retransmit
• Observation: a series of duplicate ACKs might mean a packet loss
• Solution
    - Every time the receiver receives an out-of-order packet, it sends a duplicate ACK
    - The sender retransmits the missing packet after it receives some number of duplicate ACKs (e.g., 3 duplicate ACKs)
[Figure: the sender transmits PKT 1 through PKT 6 and PKT 3 is lost. The receiver returns ACK 1, ACK 2, and then ACK 2 again for each later packet. The sender retransmits PKT 3 and the receiver returns ACK 6.]
• Fast retransmit does not replace timeouts
• Issue: reduces latency (early retransmit) but still incurs a loss in throughput (slow start after packet loss)

Fast Recovery
• Transmit a packet for every ACK received until the retransmitted packet is ACK'd
    - ssthresh = max(2*MSS, cwnd/2); cwnd = ssthresh + 3 segments (3*MSS in bytes)
• On every ACK until the ACK of the retransmitted packet
    - cwnd = cwnd + 1 (segment)
• On reception of the ACK of the retransmitted packet
    - Start congestion avoidance instead of slow start
    - cwnd = ssthresh

Homework
• 5.13 (3rd ed. and 4th ed.)
• 5.16
• 5.28
• 5.34
• 5.39
Due 4/5
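As a supplement to the Fast Retransmit and Fast Recovery slides, here is a small Python sketch of the sender-side bookkeeping. It follows the figure's convention that ACK n means packets up to n have arrived in order, and it measures the window in segments. The function names and the list-of-ACKs interface are hypothetical, chosen only to make the duplicate-ACK rule easy to check by hand.

```python
DUP_ACK_THRESHOLD = 3  # the "e.g. 3 duplicate ACKs" from the slide


def fast_retransmit_points(acks):
    """Return the packet numbers a sender would fast-retransmit.

    acks: cumulative ACK numbers in arrival order, where ACK n means
    packets 1..n have been received in order (the figure's convention).
    """
    retransmits = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == DUP_ACK_THRESHOLD:
                # The packet right after the last in-order one is missing.
                retransmits.append(ack + 1)
        else:
            last_ack = ack
            dup_count = 0
    return retransmits


def enter_fast_recovery(cwnd_segments):
    """Return (cwnd, ssthresh) in segments when fast recovery starts."""
    ssthresh = max(2, cwnd_segments / 2)
    return ssthresh + 3, ssthresh


# The ACK sequence from the figure: PKT 3 is lost, PKT 4-6 arrive.
resend = fast_retransmit_points([1, 2, 2, 2, 2, 6])
cwnd, ssthresh = enter_fast_recovery(16)
```

The first ACK 2 is the ordinary acknowledgment and only the three repeats after it count as duplicates, so the retransmission fires on the fourth ACK 2 in the list.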