* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download PPT
Parallel port wikipedia , lookup
Low-voltage differential signaling wikipedia , lookup
Deep packet inspection wikipedia , lookup
Cracking of wireless networks wikipedia , lookup
Internet protocol suite wikipedia , lookup
Recursive InterNetwork Architecture (RINA) wikipedia , lookup
TCP congestion control wikipedia , lookup
Part 3: Transport Layer CSE 3461/5461 Reading: Chapter 3, Kurose and Ross 1 Part 3: Transport Layer Chapter goals: Chapter Overview: • Understand principles behind • Transport layer services transport layer services: • Multiplexing/demultiplexing – Multiplexing/demultiplexing • Connectionless transport: UDP – Reliable data transfer • Principles of reliable data transfer – Flow control • Connection-oriented transport: TCP – Congestion control • Instantiation and implementation in the Internet – Reliable transfer – Flow control – Connection management • Principles of congestion control • TCP congestion control 2 Transport Services and Protocols • Provide logical communication between application processes running on different hosts • Transport protocols run in end systems • Transport layer vs. network layer services: – Network layer: data transfer between end systems – Transport layer: data transfer between processes; relies on, enhances, network layer services application transport network data link physical network data link physical network data link physical network data link physical network data link physical network data link physical application transport network data link physical 3 Transport-Layer Protocols Internet transport services: • Reliable, in-order unicast delivery (TCP) – Congestion – Flow control – Connection setup • Unreliable (“best-effort”), unordered unicast or multicast delivery: UDP • Services not available: – Real-time – Bandwidth guarantees – Reliable multicast application transport network data link physical network data link physical network data link physical network data link physical network data link physical network data link physical application transport network data link physical 4 Multiplexing/Demultiplexing (1) Recall: Segment is a unit of data exchanged between transport layer entities – Aka TPDU: transport protocol data unit Application-layer data Segment header Segment P1 M Ht M H n Segment Application Transport Network Demultiplexing: Delivering received segments to correct app layer processes Receiver P3 M M Application Transport Network P4 M P2 Application Transport Network 5 Multiplexing/Demultiplexing (2) Multiplexing: Gathering data from multiple app processes, enveloping data with header (later used for demultiplexing) Multiplexing/demultiplexing: • Based on sender, receiver port numbers, IP addresses – Source, destination port numbers in each segment – Recall: Well-known port numbers for specific applications • Internal action between layers: in the network, messages from different applications will be in different packets. 32 bits Source port # Dest port # Other header fields Application data (message) TCP/UDP segment format 6 Multiplexing/Demultiplexing: Examples Host A Source Port: X Dest. Port: 23 Server B Source Port:23 Dest. Port: X Source IP: C Dest IP: B Source Port: Y Dest. Port: 80 Port Use: Simple Telnet App Web Client Host A Web Client Host C Source IP: A Dest IP: B Source Port: X Dest. Port: 80 Source IP: C Dest IP: B Source Port: X Dest. Port: 80 Web Server B Port Use: Web Server 7 UDP: User Datagram Protocol [RFC 768] • “No frills,” “bare bones” Internet transport protocol • “Best effort” service, UDP segments may be: – Lost – Delivered out of order to app • Connectionless: – No handshaking between UDP sender, receiver – Each UDP segment handled independently of others Why is there a UDP? • No connection establishment (which can add delay) • Simple: no connection state at sender, receiver • Small segment header • No congestion control: UDP can blast away as fast as desired 8 UDP: More • Often used for streaming multimedia apps – Loss tolerant – Rate sensitive • Other UDP uses Length, in bytes of UDP segment, (why?): including header – DNS – SNMP • Reliable transfer over UDP: add reliability at application layer – Application-specific error recovery! 32 Bits Source Port # Dest Port # Length Checksum Application data (message) UDP Segment Format 9 UDP Checksum Goal: Detect “errors” (e.g., flipped bits) in transmitted segment Sender: Receiver: • Treat segment contents as sequence of 16-bit integers • Checksum: addition (one’s complement sum) of segment contents • Sender puts checksum value into UDP checksum field • Compute checksum of received segment • Check if computed checksum equals checksum field value: – NO: Error detected – YES: No error detected. But maybe errors nonetheless? More later …. 10 Principles of Reliable Data Transfer (1) • Important in application, transport, link layers – Top-10 list of important networking topics! • Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (RDT) 11 Principles of Reliable Data Transfer (2) • Important in application, transport, link layers – Top-10 list of important networking topics! • Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (RDT) 12 Principles Of Reliable Data Transfer (3) • Important in application, transport, link layers – Top-10 list of important networking topics! • Characteristics of unreliable channel will determine complexity of reliable data transfer protocol (RDT) 13 Reliable Data Transfer: Getting Started (1) rdt_send(): Called from above, (e.g., by app.). Passed data to deliver to receiver upper layer send side udt_send(): Called by RDT, to transfer packet over unreliable channel to receiver deliver_data(): Called by RDT to deliver data to upper layer receive side rdt_rcv(): Called when packet arrives on rcv-side of channel 14 Reliable Data Transfer: Getting Started (2) We will: • Incrementally develop sender, receiver sides of Reliable Data Transfer protocol (RDT) • Consider only unidirectional data transfer – But control info will flow on both directions! • Use finite state machines (FSM) to specify sender, Event causing state transition receiver Actions taken on state transition State: when in this “state” next state uniquely determined by next event state 1 Event Actions state 2 15 RDT1.0: Reliable Transfer over Reliable Channel • Underlying channel perfectly reliable – No bit errors – No loss of packets • Separate FSMs for sender, receiver: – Sender sends data into underlying channel – Receiver reads data from underlying channel Wait for call from above rdt_send(data) packet = make_pkt(data) udt_send(packet) Sender Wait for call from below rdt_rcv(packet) extract (packet,data) deliver_data(data) Receiver 16 RDT2.0: Channel with Bit Errors (1) • Underlying channel may flip bits in packet – Checksum to detect bit errors • The question: how to recover from errors: – acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK – negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors – sender retransmits pkt on receipt of NAK How do humans recover from “errors” • new mechanisms in rdt2.0 (beyond rdt1.0): during conversation? – error detection – receiver feedback: control msgs (ACK,NAK) rcvr>sender 17 RDT2.0: Channel with Bit Errors (2) • Underlying channel may flip bits in packet – Checksum to detect bit errors • The question: how to recover from errors: – Acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK – Negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors – Sender retransmits pkt on receipt of NAK • New mechanisms in RDT2.0 (beyond RDT1.0): – Error detection – Feedback: control msgs (ACK,NAK) from receiver to sender 18 RDT2.0: FSM Specification rdt_send(data) sndpkt = make_pkt(data, checksum) udt_send(sndpkt) rdt_rcv(rcvpkt) && isNAK(rcvpkt) Wait for Wait for call ACK or udt_send(sndpkt) from above NAK rdt_rcv(rcvpkt) && isACK(rcvpkt) L Sender Receiver rdt_rcv(rcvpkt) && corrupt(rcvpkt) udt_send(NAK) Wait for call from below rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) extract(rcvpkt,data) deliver_data(data) udt_send(ACK) 19 RDT2.0: Operation with no Errors rdt_send(data) snkpkt = make_pkt(data, checksum) udt_send(sndpkt) rdt_rcv(rcvpkt) && isNAK(rcvpkt) Wait for Wait for call ACK or udt_send(sndpkt) from above NAK rdt_rcv(rcvpkt) && isACK(rcvpkt) L rdt_rcv(rcvpkt) && corrupt(rcvpkt) udt_send(NAK) Wait for call from below rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) extract(rcvpkt,data) deliver_data(data) udt_send(ACK) 20 RDT2.0: Error Scenario rdt_send(data) snkpkt = make_pkt(data, checksum) udt_send(sndpkt) rdt_rcv(rcvpkt) && isNAK(rcvpkt) Wait for Wait for call ACK or udt_send(sndpkt) from above NAK rdt_rcv(rcvpkt) && corrupt(rcvpkt) udt_send(NAK) rdt_rcv(rcvpkt) && isACK(rcvpkt) Λ Wait for call from below rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) extract(rcvpkt,data) deliver_data(data) udt_send(ACK) 21 RDT2.0 has a Fatal Flaw! What happens if ACK/NAK corrupted? • Sender doesn’t know what happened at receiver! • Can’t just retransmit: possible duplicate Handling duplicates: • Sender retransmits current pkt if ACK/NAK corrupted • Sender adds sequence number to each pkt • Receiver discards (doesn’t deliver up) duplicate pkt Stop and Wait Sender sends one packet, then waits for receiver response 22 RDT2.1: Sender, Handles Garbled ACK/NAKs rdt_send(data) sndpkt = make_pkt(0, data, checksum) udt_send(sndpkt)rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || Wait for Wait for call isNAK(rcvpkt) ) ACK or 0 from udt_send(sndpkt) NAK 0 above rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt) rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt) L rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) ) udt_send(sndpkt) L Wait for ACK or NAK 1 Wait for call 1 from above rdt_send(data) sndpkt = make_pkt(1, data, checksum) udt_send(sndpkt) 23 RDT2.1: Receiver, Handles Garbled ACK/NAKs rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt) rdt_rcv(rcvpkt) && (corrupt(rcvpkt) sndpkt = make_pkt(NAK,chksum) udt_send(sndpkt) extract(rcvpkt,data) deliver_data(data) sndpkt = make_pkt(ACK, chksum) udt_send(sndpkt) Wait for 0 from below rdt_rcv(rcvpkt) && not corrupt(rcvpkt) && has_seq1(rcvpkt) sndpkt = make_pkt(ACK,chksum) udt_send(sndpkt) Wait for 1 from below rdt_rcv(rcvpkt) && (corrupt(rcvpkt) sndpkt = make_pkt(NAK,chksum) udt_send(sndpkt) rdt_rcv(rcvpkt) && not corrupt(rcvpkt) && has_seq0(rcvpkt) sndpkt = make_pkt(ACK,chksum) udt_send(sndpkt) rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt) extract(rcvpkt,data) deliver_data(data) sndpkt = make_pkt(ACK, chksum) udt_send(sndpkt) 24 RDT2.1: Discussion Sender: • Seq # added to pkt • Two seq. numbers (0,1) will suffice. Why? • Must check if received ACK/NAK corrupted • Twice as many states – State must “remember” whether “expected” pkt should have seq # 0 or 1 Receiver: • Must check if received packet is duplicate – State indicates whether 0 or 1 is expected pkt seq # • Note: receiver can not know if its last ACK/NAK received OK at sender 25 RDT2.2: A NAK-Free Protocol • Same functionality as RDT2.1, using ACKs only • Instead of NAK, receiver sends ACK for last pkt received OK – Receiver must explicitly include seq # of pkt being ACKed • Duplicate ACK at sender results in same action as NAK: retransmit current pkt 26 RDT2.2: Sender, Receiver Fragments rdt_send(data) sndpkt = make_pkt(0, data, checksum) udt_send(sndpkt) rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || Wait for Wait for isACK(rcvpkt,1) ) ACK 0 call 0 from udt_send(sndpkt) above Sender FSM fragment rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || has_seq1(rcvpkt)) Wait for udt_send(sndpkt) 0 from below rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0) L Receiver FSM fragment rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt) extract(rcvpkt,data) deliver_data(data) sndpkt = make_pkt(ACK1, chksum) udt_send(sndpkt) 27 RDT3.0: Channels with Errors and Loss New assumption: Approach: sender waits Underlying channel can “reasonable” amount of also lose packets (data, time for ACK • Retransmits if no ACK ACKs) – Checksum, seq. #, ACKs, retransmissions will help … but not enough received in this time • If pkt (or ACK) just delayed (not lost): – Retransmission will be duplicate, but seq. #’s already handles this – Receiver must specify seq # of pkt being ACKed • Requires countdown timer 28 RDT3.0 Sender rdt_send(data) rdt_rcv(rcvpkt) && sndpkt = make_pkt(0, data, checksum) ( corrupt(rcvpkt) || udt_send(sndpkt) isACK(rcvpkt,1) ) start_timer L rdt_rcv(rcvpkt) L Wait for ACK0 Wait for call 0 from above rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1) rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0) stop_timer stop_timer timeout udt_send(sndpkt) start_timer rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,0) ) L timeout udt_send(sndpkt) start_timer Wait for ACK1 Wait for call 1 from above rdt_rcv(rcvpkt) rdt_send(data) L sndpkt = make_pkt(1, data, checksum) udt_send(sndpkt) start_timer 29 RDT3.0 in Action (1) Receiver Sender send pkt0 rcv ack0 send pkt1 ack0 rcv pkt0 send ack0 rcv ack0 send pkt1 pkt1 ack1 rcv ack1 send pkt0 send pkt0 pkt0 Receiver Sender rcv pkt1 send ack1 pkt0 ack0 rcv pkt0 send ack0 pkt1 X loss pkt0 ack0 rcv pkt0 send ack0 timeout resend pkt1 pkt1 ack1 rcv ack1 send pkt0 (a) no loss rcv pkt1 send ack1 pkt0 ack0 rcv pkt0 send ack0 (b) packet loss 30 RDT3.0 in Action (2) Receiver Sender Receiver Sender send pkt0 rcv ack0 send pkt1 pkt0 ack0 rcv pkt0 send ack0 X ack0 pkt1 ack1 rcv pkt1 (detect duplicate) send ack1 pkt0 ack0 (c) ACK loss rcv pkt0 send ack0 rcv pkt0 send ack0 pkt1 rcv pkt1 send ack1 rcv pkt1 send ack1 loss rcv ack1 send pkt0 rcv ack0 send pkt1 pkt0 pkt1 ack1 timeout resend pkt1 send pkt0 ack1 timeout resend pkt1 rcv ack1 send pkt0 rcv ack1 send pkt0 pkt1 pkt0 ack1 ack0 pkt0 ack0 rcv pkt1 (detect duplicate) send ack1 rcv pkt0 send ack0 rcv pkt0 (detect duplicate) send ack0 (d) premature timeout/ delayed ACK 31 Performance of RDT3.0 • RDT3.0 is correct, but performance stinks • E.g.: 1 Gbps link, 15 ms prop. delay, 8000 bit packet: 8000 bits L Dtrans = R = 109 bits/sec = 8 microsecs – Usender: utilization – fraction of time sender busy sending U sender = L/R RTT + L / R = .008 30.008 = 0.00027 – If RTT=30 msec, 1KB pkt every 30 msec: 33kB/sec thruput over 1 Gbps link • Network protocol limits use of physical resources! 32 RDT3.0: Stop-and-Wait Operation sender receiver first packet bit transmitted, t = 0 last packet bit transmitted, t = L / R first packet bit arrives last packet bit arrives, send ACK RTT ACK arrives, send next packet, t = RTT + L / R U sender = L/R RTT + L / R = .008 30.008 = 0.00027 33 Pipelined Protocols Pipelining: sender allows multiple, “in-flight”, yetto-be-acknowledged pkts – Range of sequence numbers must be increased – Buffering at sender and/or receiver • Two generic forms of pipelined protocols: Go-Back-N, Selective Repeat 34 Pipelining: Increased Utilization sender receiver first packet bit transmitted, t = 0 last bit transmitted, t = L / R first packet bit arrives last packet bit arrives, send ACK last bit of 2nd packet arrives, send ACK last bit of 3rd packet arrives, send ACK RTT ACK arrives, send next packet, t = RTT + L / R Packet pipelining increases utilization by a factor of 3! U sender = 3L / R RTT + L / R = .0024 30.008 = 0.00081 35 Pipelined Protocols: Overview Go-Back-N: • Sender can have up to N unacked packets in pipeline • Receiver only sends cumulative ACK – Doesn’t ACK packet if there’s a gap • Sender has timer for oldest unACKed packet – When timer expires, retransmit all unACKed packets Selective Repeat: • Sender can have up to N unacked packets in pipeline • Receiver sends individual ACK for each packet • Sender maintains timer for each unACKed packet – When timer expires, retransmit only that unacked packet 36 Go-Back-N: Sender • k-bit seq # in pkt header • “Window” of up to N, consecutive unACKed pkts allowed • ACK(n): ACKs all pkts up to, including seq # n - “cumulative ACK” – may receive duplicate ACKs (see receiver) • timer for oldest in-flight pkt • timeout(n): retransmit packet n and all higher seq # pkts in window 37 GBN: Sender Extended FSM rdt_send(data) L base=1 nextseqnum=1 if (nextseqnum < base+N) { sndpkt[nextseqnum] = make_pkt(nextseqnum,data,chksum) udt_send(sndpkt[nextseqnum]) if (base == nextseqnum) start_timer nextseqnum++ } else refuse_data(data) timeout Wait rdt_rcv(rcvpkt) && corrupt(rcvpkt) start_timer udt_send(sndpkt[base]) udt_send(sndpkt[base+1]) … udt_send(sndpkt[nextseqnum-1]) rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) base = getacknum(rcvpkt)+1 If (base == nextseqnum) stop_timer else start_timer 38 GBN: Receiver Extended FSM default udt_send(sndpkt) L Wait expectedseqnum=1 sndpkt = make_pkt(expectedseqnum, ACK,chksum) rdt_rcv(rcvpkt) && notcurrupt(rcvpkt) && hasseqnum(rcvpkt,expectedseqnum) extract(rcvpkt,data) deliver_data(data) sndpkt = make_pkt(expectedseqnum,ACK,chksum) udt_send(sndpkt) expectedseqnum++ ACK-only: always send ACK for correctly-received pkt with highest in-order seq # – May generate duplicate ACKs – Need only remember expectedseqnum • Out-of-order pkt: – Discard (don’t buffer): no receiver buffering! – Re-ACK pkt with highest in-order seq # 39 GBN in Action Sender window (N=4) 012345678 012345678 012345678 012345678 012345678 012345678 Sender send pkt0 send pkt1 send pkt2 send pkt3 (wait) rcv ack0, send pkt4 rcv ack1, send pkt5 ignore duplicate ACK 012345678 012345678 012345678 012345678 pkt 2 timeout send pkt2 send pkt3 send pkt4 send pkt5 Receiver X loss receive pkt0, send ack0 receive pkt1, send ack1 receive pkt3, discard, (re)send ack1 receive pkt4, discard, (re)send ack1 receive pkt5, discard, (re)send ack1 rcv pkt2, deliver, send ack2 rcv pkt3, deliver, send ack3 rcv pkt4, deliver, send ack4 rcv pkt5, deliver, send ack5 40 Selective Repeat • Receiver individually acknowledges all correctly received pkts – Buffers pkts, as needed, for eventual in-order delivery to upper layer • Sender only resends pkts for which ACK not received – Sender timer for each unACKed pkt • Sender window – N consecutive seq #’s – Limits seq #s of sent, unACKed pkts 41 Selective Repeat: Sender, Receiver Windows 42 Selective Repeat Sender Data from above: • If next available seq # in window, send pkt Timeout(n): • Resend pkt n, restart timer ACK(n) in [sendbase,sendbase+N]: • Mark pkt n as received • If n smallest unACKed pkt, advance window base to next unACKed seq # Receiver Pkt n in [rcvbase, rcvbase+N-1] • send ACK(n) • out-of-order: buffer • in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt Pkt n in [rcvbaseN,rcvbase-1] • ACK(n) Otherwise: • Ignore 43 Selective Repeat in Action Sender window (N=4) 012345678 012345678 012345678 012345678 012345678 012345678 Sender send pkt0 send pkt1 send pkt2 send pkt3 (wait) Receiver X loss rcv ack0, send pkt4 rcv ack1, send pkt5 record ack3 arrived 012345678 012345678 012345678 012345678 pkt 2 timeout send pkt2 record ack4 arrived record ack4 arrived receive pkt0, send ack0 receive pkt1, send ack1 receive pkt3, buffer, send ack3 receive pkt4, buffer, send ack4 receive pkt5, buffer, send ack5 rcv pkt2; deliver pkt2, pkt3, pkt4, pkt5; send ack2 Q: what happens when ack2 arrives? 44 Selective Repeat: Dilemma Example: • Seq #s: 0, 1, 2, 3 • Window size = 3 • Receiver sees no difference in two scenarios! • Duplicate data accepted as new in (b) Q: What relationship between seq # size and window size to avoid problem in (b)? Receiver window (after receipt) Sender window (after receipt) 0123012 pkt0 0123012 pkt1 0123012 0123012 pkt2 0123012 0123012 pkt3 0123012 0123012 pkt0 (a) no problem X Will accept packet with seq number 0 Receiver can’t see sender side. Receiver behavior identical in both cases! Something’s (very) wrong! 0123012 pkt0 0123012 pkt1 0123012 0123012 pkt2 0123012 timeout retransmit pkt0 0123012 (b) oops! X X X 0123012 pkt0 Will accept packet with seq number 0 45 TCP: Overview • Point-to-point: RFCs: 793, 1122, 1323, 2018, 2581 • Full duplex data: – One sender, one receiver – Bi-directional data flow in same connection – MSS: Maximum segment size • Reliable, in-order byte steam: – No “message boundaries” • Pipelined: • Connection-oriented: – Handshaking (exchange of control msgs) inits sender, receiver state before data exchange – TCP congestion and flow control set window size • Send & receive buffers • Flow controlled: socket door application writes data application reads data TCP send buffer TCP receive buffer socket door – Sender will not overwhelm receiver segment 46 TCP Segment Structure 32 bits URG: urgent data (generally not used) ACK: ACK # valid Source port # Dest port # Sequence number Counting by bytes of data (not segments!) Acknowledgement number PSH: push data now (Generally not used) RST, SYN, FIN: Connection establishment (Setup, teardown cmds) Internet Checksum (As in UDP) Head Not UA P R S F Len Used Checksum Rcvr window size Ptr urgent data Options (variable length) # Bytes receiver willing to accept Application data (variable length) 47 TCP Sequence #s and ACKs Seq. #’s: – Byte stream “number” of first byte in segment’s data ACKs: – Seq. # of next byte expected from other side – Cumulative ACK Q: How does receiver handle out-of-order segments? – A: TCP spec doesn’t say, up to implementer Host A Host B User types ‘C’ Host ACKs receipt of ‘C’, echoes back ‘C’ Host ACKs receipt of echoed ‘C’ time Simple Telnet Scenario 48 TCP: Reliable Data Transfer Event: Data received from application above Create, send segment Wait Wait For for Event event Simplified sender, assuming • One way data transfer • No flow, congestion control Event: timer timeout for segment with seq # y Retransmit segment Event: ACK received, with ACK # y ACK processing 49 TCP: Reliable Data Transfer 50 TCP ACK Generation [RFC 1122, RFC 2581] Event TCP Receiver Action In-order segment arrival, no gaps, everything else already ACKed Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK In-order segment arrival, no gaps, one delayed ACK pending Immediately send single cumulative ACK Out-of-order segment arrival higher-than-expect seq. # gap detected Send duplicate ACK, indicating seq. # of next expected byte Arrival of segment that partially or completely fills gap Immediate ACK if segment starts at lower end of gap 51 TCP: Retransmission Scenarios X loss time Host A Host B Lost ACK scenario Host B Seq=100 timeout Seq=92 timeout timeout Host A time Premature timeout, cumulative ACKs 52 TCP Flow Control Flow control Sender won’t overrun receiver’s buffers by transmitting too much, too fast RcvBuffer = Size or TCP receive buffer RcvWindow = Amount of spare room in buffer Receiver: Explicitly informs sender of (dynamically changing) amount of free buffer space – RcvWindow field in TCP segment Sender: Keeps the amount of transmitted, unACKed data less than most recently received RcvWindow Receiver buffering 53 TCP Round Trip Time and Timeout Q: How to set TCP timeout value? • Longer than RTT – Note: RTT will vary • Too short: premature timeout – Unnecessary retransmissions • Too long: slow reaction to segment loss Q: How to estimate RTT? • SampleRTT: measured time from segment transmission until ACK receipt – Ignore retransmissions, cumulatively ACKed segments • SampleRTT will vary, want estimated RTT “smoother” – Use several recent measurements, not just current SampleRTT 54 TCP Round Trip Time and Timeout Exponential weighted moving average Influence of given sample decreases exponentially fast Typical value of x: 0.1 Setting the timeout • EstimatedRTT plus “safety margin” • Large variation in EstimatedRTT needs larger safety margin 55 TCP Connection Management (1) Recall: TCP sender, receiver Three way handshake: establish “connection” before exchanging data segments • Initialize TCP variables: – Sequence #s – Buffers, flow control info (e.g. RcvWindow) • Client: Connection initiator Step 1: Client end system sends TCP SYN control segment to server – Specifies initial seq # (client_isn) Step 2: Server end system receives SYN, replies with SYNACK control segment – ACKs received SYN Socket clientSocket = – Allocates buffers new Socket("hostname","port#"); – Specifies server → receiver initial seq. # (server_isn) • Server: Contacted by client Step 3: Client allocates buffers and Socket connectionSocket = variables upon receiving SYNACK welcomeSocket.accept(); – SYN=0 – Seq = client_isn + 1 – Ack = server_isn +1 56 TCP Connection Management (2) Closing a connection: Client closes socket: clientSocket.close(); Client Server close Step 1: Client end system sends close Step 2: Server receives FIN, replies with ACK. Closes connection, sends FIN. Timed wait TCP FIN control segment to server Closed 57 TCP Connection Management (3) Step 3: Client receives FIN, replies with ACK. Client Server closing – Enters “timed wait” - will respond with ACK to received FINs closing Step 4: Server receives ACK. Note: With small modification, can handle simultaneous FINs. Timed wait Connection closed. Closed Closed 58 TCP Connection Management (4) TCP server lifecycle TCP client lifecycle 59 Principles of Congestion Control Congestion: • Informally: “too many sources sending too much data too fast for network to handle” • Different from flow control! • Manifestations: – Lost packets (buffer overflow at routers) – Long delays (queueing in router buffers) • Top-10 problem! 60 Causes/Costs of Congestion: Scenario 1 • Two senders, two receivers • One router, infinite buffers • No retransmission • Large delays when congested • Maximum achievable throughput 61 Causes/Costs of Congestion: Scenario 2 • One router, finite buffers • Sender retransmission of lost packet 62 Approaches Towards Congestion Control Two broad approaches towards congestion control: End-end congestion control: Network-assisted congestion control: • No explicit feedback from network • Congestion inferred from endsystem observed loss, delay • Approach taken by TCP • Routers provide feedback to end systems – Single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM) – Explicit rate sender should send at 63 TCP Congestion Control (1) • End-end control (no network assistance) • Transmission rate limited by congestion window size, cwnd, over segments: cwnd w segments, each with MSS bytes sent in one RTT: 64 TCP Congestion Control (2) • “Probing” for usable bandwidth: – Ideally: transmit as fast as possible (cwnd as large as possible) without loss – Increase cwnd until loss (congestion) – Loss: decrease cwnd, then begin probing (increasing) again • Two “phases” – Slow start – Congestion avoidance • Important variables: – cwnd – ssthresh: defines threshold between two slow start phase, congestion control phase 65 TCP Slow Start Host B RTT Host A • Exponential increase (per RTT) in window size (not so slow!) • Loss event: timeout (Tahoe TCP) and/or or three duplicate ACKs (Reno TCP) time 66 TCP Congestion Avoidance * TCP Reno skips slow start (fast recovery) after three duplicate ACKs 67 TCP Congestion Control FSM New ACK! New ACK! new ACK cwnd = cwnd + MSS (MSS/cwnd) dupACKcount = 0 . duplicate ACK dupACKcount++ Λ cwnd = 1 MSS ssthresh = 64 KB dupACKcount = 0 new ACK cwnd = cwnd+MSS dupACKcount = 0 transmit new segment(s), as allowed transmit new segment(s), as allowed Slow start timeout ssthresh = cwnd/2 cwnd = 1 MSS dupACKcount = 0 retransmit missing segment dupACKcount == 3 cwnd > ssthresh Congestion avoidance Λ timeout ssthresh = cwnd/2 cwnd = 1 MSS dupACKcount = 0 duplicate ACK dupACKcount++ retransmit missing segment New ACK! timeout ssthresh = cwnd/2 cwnd = 1 dupACKcount = 0 New ACK retransmit missing segment cwnd = ssthresh dupACKcount = 0 ssthresh= cwnd/2 cwnd = ssthresh + 3 ssthresh= cwnd/2 cwnd = ssthresh + 3 retransmit missing segment dupACKcount == 3 transmit new segment(s), as allowed Fast recovery duplicate ACK cwnd = cwnd + MSS transmit new segment(s), as allowed AIMD TCP congestion avoidance: • AIMD: Additive increase, multiplicative decrease – Increase window by 1 per RTT – Decrease window by factor of 2 on loss event TCP Fairness Fairness goal: if N TCP sessions share same bottleneck link, each should get 1/N of link capacity TCP connection 1 TCP connection 2 Bottleneck router capacity C 69 Why is TCP Fair? Two competing sessions: • Additive increase gives slope of 1, as throughout increases • Multiplicative decrease decreases throughput proportionally R Equal bandwidth share Loss: decrease window by factor of 2 Congestion avoidance: additive increase Loss: decrease window by factor of 2 Congestion avoidance: additive increase Connection 1 throughput R 70 Summary • Principles behind transport layer services: – – – – Multiplexing/demultiplexing Reliable data transfer Flow control Congestion control Next: • Leaving the network “edge” (application transport layer) • Entering the network “core” • Instantiation and implementation in the Internet – UDP – TCP 71