Download TamVu_TCP_lec_DR13 - Winlab

Document related concepts

Airborne Networking wikipedia , lookup

Computer network wikipedia , lookup

Network tap wikipedia , lookup

Multiprotocol Label Switching wikipedia , lookup

Net bias wikipedia , lookup

Asynchronous Transfer Mode wikipedia , lookup

Serial digital interface wikipedia , lookup

I²C wikipedia , lookup

Wake-on-LAN wikipedia , lookup

RapidIO wikipedia , lookup

Deep packet inspection wikipedia , lookup

Cracking of wireless networks wikipedia , lookup

Recursive InterNetwork Architecture (RINA) wikipedia , lookup

Internet protocol suite wikipedia , lookup

Real-Time Messaging Protocol wikipedia , lookup

IEEE 1355 wikipedia , lookup

UniPro protocol stack wikipedia , lookup

TCP congestion control wikipedia , lookup

Transcript
Transport Layer
ECE544: Communication Networks-II, Spring 2013
Tam Vu
WINLAB, Dept. of Computer Science
Rutgers University
Includes teaching materials from, L. Peterson, Sumathi Gopal and Sumit Rangwala, D. Raychaudhuri, Mike Freedman
IP Protocol Stack: Key Abstractions
Application
Transport
Network
Link
Applications
Reliable streams
Messages
Best-effort global packet delivery
Best-effort local packet delivery
 Problem:
Network Layer (IP) provides only best-effort communication services
2
Applications requirements
vs. IP layer limitations
 Guarantee message delivery
 Network may drop messages.
 Deliver messages in the same order they are sent
 Messages may be reordered in networks and incurs a long delay.
 Delivers at most one copy of each message
 Messages may duplicate in networks.
 Support arbitrarily large message
 Network may limit message size.
 Support synchronization between sender and receiver
 Allows the receiver to apply flow control to the sender
 Support multiple application processes on each host
 Network only support communication between hosts
 Many more
IP Protocol Stack: Key Abstractions
Application
Transport
Network
Link
Applications
Reliable streams
Messages
Best-effort global packet delivery
Best-effort local packet delivery
 Transport layer:
 Provide applications with good abstractions
 Without support or feedback from the network
 Is the lowest layer in the network stack that is an end-to-end protocol
4
Transport Protocols
 Logical communication between processes
 Sender divides a message into segments
 Receiver reassembles segments into message
 Transport services
 (De)multiplexing packets
 Detecting corrupted data
 Optionally: reliable delivery, flow control, …
5
Two Basic Transport Features
 Demultiplexing: port numbers
Server host 128.2.194.242
Client host
Client
Service request for
128.2.194.242:80
(i.e., the Web server)
Web server
(port 80)
OS
Echo server
(port 7)
 Error detection: checksums
IP
payload
detect corruption
6
Most Popular Transport Protocols
 User Datagram Protocol (UDP)
 Support multiple applications processes on each host
 Option to check messages for correctness with CRC check
 Transmission Control Protocol (TCP)
 Ensures reliable delivery of packets between source and
destination processes
 Ensures in-order delivery of packets to destination process
 Other services
 Real Time Protocol (RTP)
 Serves real-time multimedia applications
 Moves decision making to the applications
 Runs over UDP
User Datagram Protocol (UDP)
 Service: Support for multiple processes on each host to communicate
 Issue: IP only provides communication between hosts (IP addresses)
 Solution
 Add port number and associate a process with a port number
 4-Tuple Unique Connection Identifier: [SrcPort, SrcIPAddr, DestPort, DestIPAddr ]
 Lightweight communication between processes
 Send and receive messages
0
 Avoid overhead of ordered, reliable delivery
 No connection setup delay, in-kernel connection state
 Used by popular apps
 Query/response for DNS
 Real-time data in VoIP
16
SrcPort
DesPort
Length
Checksum
Payload
31
User Datagram Protocol (UDP):
Error Detection
 Service: Ensure message correctness
 Issue: Packet corruption in transit
 Solution
 Use Checksum.
 Includes UDP header, payload, pseudo header
 Pseudo header
 Protocol number, source IP address, destination IP address, and UDP
length
0
16
SrcPort
DesPort
Length
Checksum
Payload
31
Transmitting a stream of bytes ?
 Stream-of-bytes service
 Sends and receives a stream of
bytes
 Reliable, in-order delivery
 Corruption: checksums
 Detect loss/reordering:
sequence numbers
 Reliable delivery:
acknowledgments and
retransmissions
11
 Connection oriented
 Explicit set-up and tear-
down of TCP connection
 Flow control
 Prevent overflow of the
receiver’s buffer space
 Congestion control
 Adapt to network
congestion for the greater
good
Transmission Control Protocol (TCP)
 First proposed by Vinton Cerf and Robert Kahn, 1974
 TCP/IP enabled computers of all sizes, from different vendors,
different OSs, to communicate with each other.
 Used by 80% of all traffic on the Internet
 Reliable, in-order delivery, connection-oriented, bye-stream
service
Starting and Ending a Connection:
TCP Handshakes
Establishing a TCP Connection
A
B
Each host tells its
Initial Sequence
Number (ISN) to the
other host.
 Three-way handshake to establish connection
 Host A sends a SYN (open) to the host B
 Host B returns a SYN acknowledgment (SYN ACK)
 Host A sends an ACK to acknowledge the SYN ACK
14
TCP Header
Source port
Flags: SYN
FIN
RST
PSH
URG
ACK
Sequence number
Acknowledgment
HdrLe 0 Flags Advertised window
n
Checksum
Urgent pointer
Options (variable)
Data
15
Destination port
Step 1: A’s Initial SYN Packet
A’s port
Flags: SYN
FIN
RST
PSH
URG
ACK
B’s port
A’s Initial Sequence Number
Acknowledgment
20
0
Flags Advertised window
Checksum
Urgent pointer
Options (variable)
A tells B it wants to open a connection…
16
Step 2: B’s SYN-ACK Packet
B’s port
Flags: SYN
FIN
RST
PSH
URG
ACK
A’s port
B’s Initial Sequence Number
A’s ISN plus 1
20
0
Flags Advertised window
Checksum
Urgent pointer
Options (variable)
B tells A it accepts, and is ready to hear the next byte…
… upon receiving this packet, A can start sending data
17
Step 3: A’s ACK of the SYN-ACK
A’s port
Flags: SYN
FIN
RST
PSH
URG
ACK
B’s port
Sequence number
B’s ISN plus 1
20
0
Flags Advertised window
Checksum
Urgent pointer
Options (variable)
A tells B it is okay to start sending
… upon receiving this packet, B can start sending data
18
SYN Loss and Web Downloads
 Upon sending SYN, sender sets a timer
 If SYN lost, timer expires before SYN-ACK received
 Sender retransmits SYN
 How should the TCP sender set the timer?
 No idea how far away the receiver is
 Some TCPs use default of 3 or 6 seconds
 Implications for web download
 User gets impatient and hits reload
 … Users aborts connection, initiates new socket
 Essentially, forces a fast send of a new SYN!
19
Tearing Down the Connection
B
A
time
 Closing (each end of) the connection
 Finish (FIN) to close and receive remaining bytes
 And other host sends a FIN ACK to acknowledge
 Reset (RST) to close and not receive remaining bytes
20
Sending/Receiving the FIN Packet
 Sending a FIN: close()
 Process is done sending data
via socket
 Process invokes “close()”
 Once TCP has sent all the
 Receiving a FIN: EOF
 Process is reading data
from socket
 Eventually, read call
returns an EOF
outstanding bytes…
 … then TCP sends a FIN
21
Data transmission
TCP: Byte-stream
 Service: Byte-stream
 Application reads or writes a stream of bytes to the transport
 Issue: IP is packet-oriented
 Solution: TCP maintains a local buffer
 Chop the stream into packets and transmit (sender)
 Coalesce data from packets to form a stream (receiver)
TCP “Stream of Bytes” Service
Host A
Host B
24
…Emulated Using TCP “Segments”
Host A
Segment sent when:
1. Segment full (Max Segment Size),
2. Not full, but times out, or
3. “Pushed” by application
TCP Data
Host B
TCP Data
25
TCP Segment
 IP packet
IP Data
TCP Data (segment)
TCP Hdr
IP Hdr
 No bigger than Maximum Transmission Unit (MTU)
 E.g., up to 1500 bytes on an Ethernet link
 TCP packet
 IP packet with a TCP header and data inside
 TCP header is typically 20 bytes long
 TCP segment
 No more than Maximum Segment Size (MSS) bytes
 E.g., up to 1460 consecutive bytes from the stream
26
Sequence Number
Host A
ISN (initial sequence number)
Sequence
number =
1st byte
Host B
TCP Data
TCP Data
27
Reliable Delivery on a Lossy
Channel With Bit Errors
Challenges of Reliable Data Transfer
 Over a perfectly reliable channel: Done
 Over a channel with bit errors
 Receiver detects errors and requests retransmission
 Over a lossy channel with bit errors
 Some data missing, others corrupted
 Receiver cannot easily detect loss
 Over a channel that may reorder packets
 Receiver cannot easily distinguish loss vs. out-of-order
30
An Analogy
 Alice and Bob are talking
 What if Alice couldn’t understand Bob?
 Bob asks Alice to repeat what she said
 What if Bob hasn’t heard Alice for a while?
 Is Alice just being quiet? Has she lost reception?
 How long should Bob just keep on talking?
 Maybe Alice should periodically say “uh huh”
 … or Bob should ask “Can you hear me now?”
31
Take-Aways from the Example
 Acknowledgments from receiver
 Positive: “okay” or “uh huh” or “ACK”
 Negative: “please repeat that” or “NACK”
 Retransmission by the sender
 After not receiving an “ACK”
 After receiving a “NACK”
 Timeout by the sender (“stop and wait”)
 Don’t wait forever without some acknowledgment
32
TCP Support for Reliable Delivery

Detect bit errors: checksum



Detect missing data: sequence number



Used to detect corrupted data at the receiver
…leading the receiver to drop the packet
Used to detect a gap in the stream of bytes
... and for putting the data back in order
Recover from lost data: retransmission


Sender retransmits lost or corrupted data
Two main ways to detect lost packets
33
TCP Acknowledgments
Host A
ISN (initial sequence number)
Sequence
number =
1st byte
Host B
TCP Data
ACK sequence
number = next
expected byte
TCP Data
34
Automatic Repeat reQuest (ARQ)
 Simplest ARQ protocol
 Stop and wait
 Send a packet, stop and wait
until ACK arrives
35
Sender
Timeou
t
 ACK and timeouts
 Receiver sends ACK when
it receives packet
 Sender waits for ACK
and times out
Time
Receiver
Quick TCP Math
•
Initial Seq No = 501. Sender sends 4500 bytes successfully
acknowledged. Next sequence number to send is:
(A) 4501
(B) 5000 (C) 5001 (D) 5002
• Next 1000 byte TCP segment received.
Receiver acknowledges with ACK number:
(A) 5001 (B) 6000
36
(C) 6001
Flow Control:
TCP Sliding Window
Sliding Window: Motivation
 Stop-and-wait is inefficient
 Only one TCP segment is “in flight” at a time
 Consider: 1.5 Mbps link with 50 ms round-trip-time (RTT)
 Assume segment size of 1 KB (8 Kbits)
 8 Kbits/segment at 50 msec/segment  160 Kbps
 That’s 11% of the capacity of 1.5 Mbps link
39
Sliding Window
 Allow a larger amount of data “in flight”
 Allow sender to get ahead of the receiver
 … though not too far ahead
Sending process
TCP
Last byte written
Last byte ACKed
Last byte sent
Receiving process
TCP
Last byte read
Next byte expected
Last byte received
40
Receiver Buffering
 Receive window size
 Amount that can be sent without acknowledgment
 Receiver must be able to store this amount of data
 Receiver tells the sender the window
 Tells the sender the amount of free space left
Window Size
Data ACK’dOutstanding Data OK Data not OK
Un-ack’d data to send to send yet
41
TCP: Flow Control
 Flow Control
 “Prevent sender from overrunning the capacity (buffer) of the receiver”
 Solution: Use adaptive receiver window size
 Goal is to keep (C) – (A) < MaxRcvBuffer
 Every packet carries ACK and AdvertisedWindow
Receiving Appl
Sending Appl
TCP
LastByteAcked (J)
(I) LastByteWritten
(K) LastByteSent
LastByteSent (K) – LastByteAcked (J) <= AdvertisedWindow
EffWin = AdvertisedWin (LastByteSent-LastByteAcked)
LastByteWritten – LastByteAcked <= MaxSendBuffer
LastByteRead
(A)
(B)
NextByteExpected
TCP
(C)
LastByteRcvd
AdvertisedWindow = MaxRcvBuffer((NextByteExp-1)-LastByteRead)
Optimizing Retransmissions
43
44
Timeou
t
Timeou
t
Timeou
t
Packet lost
Timeou
t
Timeou
t
Timeou
t
Reasons for Retransmission
ACK lost
DUPLICATE
PACKET
Early timeout
DUPLICATE
PACKETS
How Long Should Sender Wait?
 Sender sets a timeout to wait for an ACK
 Too short: wasted retransmissions
 Too long: excessive delays when packet lost
 TCP sets timeout as a function of the RTT
 Expect ACK to arrive after an “round-trip time”
 … plus a fudge factor to account for queuing
 But, how does the sender know the RTT?
 Running average of delay to receive an ACK
45
TCP Timeout
 Issue: RTT in a wide area network varies substantially
 Solution: Adaptive Timeout
 Original Algorithm:
 EstimatedRTT = a x EstimatedRTT + (1-a) x SampleRTT
0 a 1
 Timeout = β x EstimatedRTT (β = 2)
 Problem
 Does not distinguish whether the ACK is for original transmission or
retransmission
 Constant β is not good.
 Assumes constant variance
TCP Timeout
 Karn/Partridge Algorithm
 Whenever TCP retransmits a segment, it stops taking samples of the RTT
 Only measure SampleRTT for segments that have been sent only once
 Each time TCP retransmits, set the next timeout to be twice the last timeout
 Relieves congestion
 Jacobson/Karels Algorithm: Adaptive variance (uses mean
variance)
Difference = SampleRTT - EstimatedRTT
EstimatedRTT = EstimatedRTT + (d x Difference) → (same as in original)
Deviation = Deviation + d(|Difference|- Deviation)
Timeout = m x EstimatedRTT + f x Deviation
(default: set m = 1 and f= 4 )
0  d 1
TCP Deadlock
 TCP Deadlock
 receiver advertises a window size of 0, the sender stops sending
data
 the window size update from the receiver is lost
 To solve it:
 the sender starts the persist timer when AdvertisedWindow = 0
 When the persist timer expires, the sender sends a small packet
Triggering Transmission
 When to transmit a segment:
 small segments subject to large overhead
 Reach max segment size (MSS): the size of the largest
segment TCP can send without causing the local IP to
fragment
 MSS = local MTU – IP & TCP header
 The sending process explicitly ask the TCP to transmit,
“push”
Congestion
Source
1
Even with flow control packets might
not reach the destination
Dest
1
Source
2
Source
3
Dest
2
 When the network cannot support the sender’s rate
 Queues at the network elements overflow
Congestion Control vs. Flow Control
 Congestion Control
 Mechanism to prevent sender from overrunning the capacity of
the network
 When network is the bottleneck
 Flow Control
 Mechanism to prevent sender from overrunning the capacity of
the receiver
 When receiver is the bottleneck
Congestion Control: Design Approach
 Maintain another window at the sender called CongestionWindow
(cwnd)
 CongestionWindow is the max number of packets allowed in the network
 Number of unACKed packets at the sender.
 Key: How to calculate congestion window (cwnd)
 Various approaches possible
 TCP estimates it based on observed packet losses
 Assumes packet loss as indication of congestion
 Since we don’t know whether the network or the receiver is the
bottleneck
 MaxWindow = MIN(CongestionWindow, AdvertisedWindow)
 EffectiveWin = MaxWindow – (LastByteSent – LastByteAcked)
Congestion Avoidance: (AIMD)
 If no congestion in the network (increase conservatively)
 Increase the congestion window additively every RTT
Every RTT
w=w+1
w = cwnd in segments
Every ACK reception
w = w + 1/w
w = cwnd in segments
Every ACK reception
cwnd = cwnd +
MSS*(MSS/cwnd)
cwnd in bytes
 If congestion in the network (decrease aggressively)
 Decrease the congestion window multiplicatively, immediately
cwnd = cwnd/2
cwnd in bytes
 How is congestion detected?
 Estimated (more later)
CongestionWindow Size
Congestion Avoidance: (AIMD)
Startup time
Time
 TCP’s saw tooth pattern
 Issues with additive increase
 takes too long to ramp up a connection from the beginning
 The entire advertised window may be reopened when a lost
packet retransmitted and a single cumulative ACK is received by
the sender
TCP “Slow Start”: To start quickly!
 Maintain another variable slow start
threshold (ssthresh)
 Last known stable rate
 If (cwnd > ssthresh)
 State = congestion avoidance
 Else
 State = slow start
 In Slow start
 Increase the congestion window exponentially
every RTT
Every ACK reception
w=w+1
w = cwnd in segments
Every ACK reception
cwnd = cwnd +
MSS
cwnd in bytes
 Key: How is ssthresh calculated?
TCP: Congestion Detection and
Retransmit
 Loss of packet indicates congestion
 Timer Timeouts (No ACK)
 Set according to Jacobson/Karels algorithm
 On timer timeout
 ssthresh = max(2*MSS, effwin/2); cwnd = MSS
 Notice this will cause TCP to go into slow start
 Issue: takes a long time to detect a packet loss
 Affects throughput
 Any other quicker way of detecting a packet loss?
Fast Retransmit
 Observation: A series of duplicate
ACKs might mean a packet loss
 Solution
 Every time receiver receives a
packet (out-of-order), sends a
duplicate ACK
 Sender retransmit the missing
packet after it receives some
number of duplicate ACKs (e.g. 3
duplicate ACKs)
PKT 1
PKT 2
PKT 3
PKT 4
PKT 5
PKT 6
ACK 1
ACK 2
ACK 2
ACK 2
ACK 2
 Fast Retransmit does not replace
timeouts
 Issue: Reduces latency (early
retransmit) but still incurs loss in
throughput (slow start after packet
loss )
PKT 3
Retran
ACK 6
Fast Recovery
 Transmit a packet for every
ACK received till the
retransmitted packet is ACK’d
 ssthresh= (2*MSS, cwdn/2);
cwnd = sshthred + 3
 On every ACK will the ACK of
retransmitted packet
 cwnd = cwnd + 1
 On reception of ACK of
retransmitted packet
 Start congestion avoidance instead
of slow start
 cwnd = ssthresh
Homework
 5.13 (3rd ed and 4th ed)
 5.16
 5.28
 5.34
 5.39
Due 4/5