Lecture 04: Transport Layer
• Transport layer protocols in the Internet:
– UDP: connectionless transport
– TCP: connection-oriented transport
– TCP congestion control
Provides end-to-end connectivity, but not necessarily good performance.
Internet transport-layer protocols
• reliable, in-order delivery (TCP)
– congestion control
– flow control
– connection setup
• unreliable, unordered delivery (UDP)
– no-frills extension of “best-effort” IP
• services not available:
– delay guarantees
– bandwidth guarantees
[Figure: the five-layer stack (application, transport, network, data link, physical) runs end-to-end on the two hosts; routers along the path implement only the network, data link, and physical layers.]
Two Basic Transport Features
• Demultiplexing: port numbers
[Figure: a client host sends a service request to server host 128.2.194.242:80 (i.e., the Web server); the server OS uses the destination port to hand it to the Web server (port 80) rather than, say, the Echo server (port 7).]
• Error detection: checksums
[Figure: a checksum carried with the IP payload lets the receiver detect corruption.]
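The checksum idea can be made concrete with a short sketch of the 16-bit one’s-complement Internet checksum that UDP and TCP use (simplified for illustration: the real protocols also cover a pseudo-header):

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum (RFC 1071 style)."""
    if len(data) % 2:                 # pad odd-length input with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return ~total & 0xFFFF

payload = b"data"
cs = internet_checksum(payload)
# the receiver's check: summing the data together with its checksum yields zero
assert internet_checksum(payload + cs.to_bytes(2, "big")) == 0
```

The receiver recomputes the sum over everything, checksum included; any non-zero result means the payload was corrupted in transit.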
User Datagram Protocol (UDP)
• Datagram messaging service
– Demultiplexing: port numbers
– Detecting corruption: checksum
• Lightweight communication between processes
– Send and receive messages
– Avoid overhead of ordered, reliable delivery
[Figure: the 8-byte UDP header — source port, destination port, length, checksum — followed by the data.]
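The 8-byte header can be packed with Python’s `struct` module; a sketch (the checksum field is left as zero, which IPv4 UDP permits to mean “no checksum computed”):

```python
import struct

def build_udp_header(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Pack the 8-byte UDP header: source port, destination port,
    length (header + payload), and checksum (left as 0 here)."""
    length = 8 + len(payload)
    return struct.pack("!HHHH", src_port, dst_port, length, 0)

hdr = build_udp_header(53000, 53, b"query")   # e.g., a DNS-style query to port 53
assert len(hdr) == 8                          # UDP header is always 8 bytes
assert struct.unpack("!HHHH", hdr)[:3] == (53000, 53, 13)
```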
Advantages of UDP
• Fine-grain control
– UDP sends as soon as the application writes
• No connection set-up delay
– UDP sends without establishing a connection
• No connection state
– No buffers, parameters, sequence #s, etc.
• Small header overhead
– UDP header is only eight bytes long
Popular Applications That Use UDP
• Multimedia streaming
– Retransmitting packets is not always worthwhile
– E.g., phone calls, video conferencing, gaming, IPTV
• Simple query-response protocols
– Overhead of connection establishment is overkill
– E.g., Domain Name System (DNS), DHCP, etc.
[Figure: a DNS exchange — “Address for www.cnn.com?” / “12.3.4.15”]
Transmission Control Protocol (TCP)
• Stream-of-bytes service
– Sends and receives a stream of bytes
• Reliable, in-order delivery
– Corruption: checksums
– Detect loss/reordering: sequence numbers
– Reliable delivery: acknowledgments and retransmissions
• Connection oriented
– Explicit set-up and tear-down of TCP connection
• Flow control
– Prevent overflow of the receiver’s buffer space
• Congestion control
– Adapt to network congestion for the greater good
Breaking a Stream of Bytes into TCP Segments

TCP “Stream of Bytes” Service
[Figure: Host A writes a stream of bytes that Host B reads.]

…Emulated Using TCP “Segments”
[Figure: the stream between Host A and Host B is carried as discrete TCP data segments.]
Segment sent when:
1. Segment full (Max Segment Size),
2. Not full, but times out, or
3. “Pushed” by application.
TCP Segment
• IP packet
– No bigger than Maximum Transmission Unit (MTU)
– E.g., up to 1500 bytes on an Ethernet link
• TCP packet
– IP packet with a TCP header and data inside
– TCP header is typically 20 bytes long
• TCP segment
– No more than Maximum Segment Size (MSS) bytes
– E.g., up to 1460 consecutive bytes from the stream
[Figure: IP header, then TCP header, then TCP data (the segment), all carried as the IP packet’s data.]
Sequence Number
[Figure: Host A chooses an ISN (initial sequence number); each TCP data segment’s sequence number is the stream position of its first byte.]
Reliable Delivery on a Lossy Channel With Bit Errors

Challenges of Reliable Data Transfer
• Over a perfectly reliable channel
– Easy: sender sends, and receiver receives
• Over a channel with bit errors
– Receiver detects errors and requests retransmission
• Over a lossy channel with bit errors
– Some data are missing, and others corrupted
– Receiver cannot always detect loss
• Over a channel that may reorder packets
– Receiver cannot distinguish loss from out-of-order delivery
An Analogy
• Alice and Bob are talking
– What if Bob couldn’t understand Alice?
– Bob asks Alice to repeat what she said
• What if Bob hasn’t heard Alice for a while?
– Is Alice just being quiet? Has she lost reception?
– How long should Bob just keep on talking?
– Maybe Alice should periodically say “uh huh”
– … or Bob should ask “Can you hear me now?”
Take-Aways from the Example
• Acknowledgments from receiver
– Positive: “okay” or “uh huh” or “ACK”
– Negative: “please repeat that” or “NACK”
• Retransmission by the sender
– After not receiving an “ACK”
– After receiving a “NACK”
• Timeout by the sender (“stop and wait”)
– Don’t wait forever without some acknowledgment
TCP Support for Reliable Delivery
• Detect bit errors: checksum
– Used to detect corrupted data at the receiver
– … leading the receiver to drop the packet
• Detect missing data: sequence number
– Used to detect a gap in the stream of bytes
– … and for putting the data back in order
• Recover from lost data: retransmission
– Sender retransmits lost or corrupted data
– Two main ways to detect lost packets: timeouts and duplicate ACKs
TCP Acknowledgments
[Figure: Host A chooses an ISN (initial sequence number); each data segment’s sequence number is the position of its first byte, and Host B’s ACK carries the sequence number of the next expected byte.]
Automatic Repeat reQuest (ARQ)
• ACKs and timeouts
– Receiver sends ACK when it receives packet
– Sender waits for ACK and times out
• Simplest ARQ protocol: stop and wait
– Send a packet, stop and wait until ACK arrives
[Figure: the sender transmits a packet, starts a timeout, and waits for the receiver’s ACK before sending the next packet.]
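The stop-and-wait sender loop can be sketched as follows; the `channel` callback is a hypothetical stand-in for “transmit the packet, then wait for an ACK or a timeout”:

```python
import random

def stop_and_wait_send(packets, channel, max_attempts=10):
    """Send each packet in turn; retransmit on 'timeout' until ACKed."""
    for seq, pkt in enumerate(packets):
        for _attempt in range(max_attempts):
            ack = channel(seq, pkt)      # returns seq on ACK, None on timeout
            if ack == seq:
                break                    # ACK received: move to next packet
        else:
            raise TimeoutError(f"packet {seq} never acknowledged")
    return True

# a lossy channel that drops roughly 30% of transmissions
random.seed(42)
lossy = lambda seq, pkt: seq if random.random() > 0.3 else None
assert stop_and_wait_send([b"a", b"b", b"c"], lossy)
```

Note that exactly one packet is ever outstanding: the sender makes no progress on packet `seq+1` until packet `seq` is acknowledged, which is the inefficiency the sliding window fixes below.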
Flow Control: TCP Sliding Window

Motivation for Sliding Window
• Stop-and-wait is inefficient
– Only one TCP segment is “in flight” at a time
– Especially bad for paths with a high “delay-bandwidth product”
[Figure: the pipe between sender and receiver holds delay × bandwidth bits.]
Numerical Example
• 1.5 Mbps link with 45 msec round-trip time (RTT)
– Delay-bandwidth product is 67.5 Kbits (or 8 KBytes)
• Sender can send at most one packet per RTT
– Assuming a segment size of 1 KB (8 Kbits)
– 8 Kbits/segment at 45 msec/segment → 182 Kbps
– That’s just one-eighth of the 1.5 Mbps link capacity
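A quick check of the arithmetic (using 1 KB = 1024 bytes, as the slide does):

```python
link_bps = 1.5e6            # 1.5 Mbps link
rtt = 0.045                 # 45 ms round-trip time
segment_bits = 1024 * 8     # 1 KByte segment = 8 Kbits

delay_bw = link_bps * rtt                  # bits needed to fill the pipe
throughput = segment_bits / rtt            # stop-and-wait: one segment per RTT

assert delay_bw == 67500                   # 67.5 Kbits (about 8 KBytes)
assert round(throughput / 1000) == 182     # about 182 Kbps
assert round(link_bps / throughput) == 8   # roughly 1/8 of link capacity
```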
Pipelined protocols
• Pipelining: sender allows multiple “in-flight,” yet-to-be-acknowledged packets
– range of sequence numbers must be increased
– buffering at sender and/or receiver
• Pipelined protocols: concurrent logical channels, sliding window protocol
Sliding Window Protocol
• Consider an infinite array, Source, at the sender, and an infinite array, Sink, at the receiver.
[Figure: at sender P1, positions 0..a–1 are acknowledged and a..s–1 are unacknowledged; the send window covers a..s–1. At receiver P2, positions 0..r–1 have been delivered; the receive window covers r..r+RW–1, where r is the next expected data unit.]
• RW = receive window size
• SW = send window size (s – a ≤ SW)
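The sender-side bookkeeping can be sketched with the slide’s variable names (`a` = oldest unacknowledged unit, `s` = next sequence number to use); a simplified, packet-granularity sketch:

```python
class SlidingWindowSender:
    """Tracks the send window: units a..s-1 are in flight, with s - a <= SW."""
    def __init__(self, sw):
        self.sw = sw        # send window size (SW)
        self.a = 0          # oldest unacknowledged sequence number
        self.s = 0          # next sequence number to send

    def can_send(self):
        return self.s - self.a < self.sw

    def send(self):
        assert self.can_send()
        self.s += 1

    def on_cumulative_ack(self, next_expected):
        # cumulative ACK: everything below next_expected is acknowledged
        self.a = max(self.a, next_expected)

tx = SlidingWindowSender(sw=4)
while tx.can_send():
    tx.send()               # fills the window: sequence numbers 0..3
assert (tx.a, tx.s) == (0, 4)
tx.on_cumulative_ack(3)     # receiver expects 3 next: units 0, 1, 2 acknowledged
assert tx.can_send()        # window slid forward; there is room again
```

Unlike stop-and-wait, up to SW units are in flight at once, so the pipe can stay full when SW is at least the delay-bandwidth product.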
Sliding Windows in Action
• Data unit r has just been received by P2
– Receive window slides forward
• P2 sends cumulative ack with the sequence number it expects to receive next (r+3)
[Figure: the receive window now starts at r+3; the sender’s window is unchanged until the ack arrives.]
Sliding Windows in Action
• P1 has just received cumulative ack with r+3 as next expected sequence number
– Send window slides forward
[Figure: the send window now begins at r+3; everything below r+3 is acknowledged.]
Sliding Window protocol
• Functions provided
– error control (reliable delivery)
– in-order delivery
– flow and congestion control (by varying send window size)
• TCP uses only cumulative acks
• Other kinds of acks
– selective nack
– selective ack (TCP SACK)
– bit-vector representing entire state of receive window (in addition to first sequence number of window)
Sliding Window Protocol
• At the sender, a will be pointed to by SendBase, and s by NextSeqNum
[Figure: the same sender/receiver window diagram as before, with SendBase = a and NextSeqNum = s; RW = receive window size, SW = send window size (s – a ≤ SW).]
TCP Flow Control
• flow control: sender won’t overrun receiver’s buffers by transmitting too much, too fast
• receiver: explicitly informs sender of (dynamically changing) amount of free buffer space
– RcvWindow field in TCP segment
• sender: keeps amount of transmitted, unACKed data less than most recently received RcvWindow value
[Figure: the receive side of a TCP connection buffers incoming data until the application reads it; RcvWindow is the free space in that buffer.]
Optimizing Retransmissions

Reasons for Retransmission
[Figure: three timelines. (1) Packet lost: the timeout fires and the sender retransmits. (2) ACK lost: the timeout fires and the retransmission produces a DUPLICATE PACKET at the receiver. (3) Early timeout: the timer expires before the ACK arrives, producing DUPLICATE PACKETS.]
How Long Should Sender Wait?
• Sender sets a timeout to wait for an ACK
– Too short: wasted retransmissions
– Too long: excessive delays when packet lost
• TCP sets timeout as a function of the RTT
– Expect ACK to arrive after a “round-trip time”
– … plus a fudge factor to account for queuing
• But, how does the sender know the RTT?
– Running average of delay to receive an ACK
TCP Round Trip Time and Timeout
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
– ignore retransmissions
• SampleRTT will vary; want estimated RTT “smoother”
– average several recent measurements, not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1 – α)·EstimatedRTT + α·SampleRTT
• Exponential weighted moving average
– influence of past sample decreases exponentially fast
– typical value: α = 0.125
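The estimator, extended with the deviation term that real TCP adds when computing the timeout (a sketch in the style of RFC 6298, where TimeoutInterval = EstimatedRTT + 4·DevRTT):

```python
def make_rtt_estimator(alpha=0.125, beta=0.25):
    """EWMA estimator for RTT and its deviation (RFC 6298 style)."""
    state = {"est": None, "dev": None}
    def update(sample):
        if state["est"] is None:                  # first sample initializes state
            state["est"], state["dev"] = sample, sample / 2
        else:
            state["dev"] = (1 - beta) * state["dev"] + beta * abs(sample - state["est"])
            state["est"] = (1 - alpha) * state["est"] + alpha * sample
        return state["est"] + 4 * state["dev"]    # timeout interval
    return update

update = make_rtt_estimator()
update(0.100)                  # first SampleRTT: 100 ms
timeout = update(0.120)        # a later, slower sample
assert 0.100 < timeout < 0.400
```

The 4·DevRTT term is the “fudge factor” from the previous slide: the more the samples fluctuate, the larger the safety margin on the timeout.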
Example RTT estimation:
[Figure: SampleRTT and EstimatedRTT for gaia.cs.umass.edu to fantasia.eurecom.fr — RTT (milliseconds, roughly 100–350) plotted over about 106 seconds; EstimatedRTT tracks the fluctuating SampleRTT as a smoothed curve.]
TCP: retransmission scenarios
[Figure: two Host A / Host B timelines. Lost ACK scenario: the Seq=92 segment arrives but its ACK is lost; A’s timer expires and it retransmits, and SendBase advances to 100 when the next ACK arrives. Premature timeout scenario: A’s Seq=92 timer expires before the ACK returns, so A retransmits unnecessarily; the cumulative ACKs then move SendBase from 100 to 120.]
TCP retransmission scenarios (more)
[Figure: cumulative ACK scenario — Host A sends two segments; the first ACK is lost, but the second, cumulative ACK arrives before the timeout, so SendBase advances to 120 with no retransmission.]
Fast Retransmit
• Time-out period often relatively long:
– long delay before resending lost packet
• Detect lost segments via duplicate ACKs
– Sender often sends many segments back-to-back
– If segment is lost, there will likely be many duplicate ACKs
• If sender receives 3 ACKs for the same data, it supposes that the segment after the ACKed data was lost:
– fast retransmit: resend segment before timer expires
Figure 3.37 Resending a segment after triple duplicate ACK
Fast retransmit algorithm:

event: ACK received, with ACK field value of y
if (y > SendBase) {
    SendBase = y
    if (there remains a not-yet-acknowledged segment)
        start timer
}
else {
    // a duplicate ACK for an already-ACKed segment
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3) {
        // fast retransmit
        resend segment with sequence number y
        reset timer for y
    }
}
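The same logic as a runnable sketch (the returned strings stand in for the real sender actions, which are omitted here for illustration):

```python
def make_ack_handler():
    """Fast-retransmit ACK handling; reports the action taken for each ACK."""
    state = {"send_base": 0, "dup_acks": 0}
    def on_ack(y):
        if y > state["send_base"]:            # new cumulative ACK
            state["send_base"] = y
            state["dup_acks"] = 0
            return "advance"
        state["dup_acks"] += 1                # duplicate ACK for old data
        if state["dup_acks"] == 3:            # triple duplicate: fast retransmit
            return f"fast-retransmit {y}"
        return "dup"
    return on_ack

on_ack = make_ack_handler()
assert on_ack(100) == "advance"
assert on_ack(100) == "dup"                   # 1st duplicate
assert on_ack(100) == "dup"                   # 2nd duplicate
assert on_ack(100) == "fast-retransmit 100"   # 3rd duplicate triggers resend
```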
Effectiveness of Fast Retransmit
• When does Fast Retransmit work best?
– High likelihood of many packets in flight
– Long data transfers, large window size, …
• Implications for Web traffic
– Most Web transfers are short (e.g., 10 packets), so often there aren’t many packets in flight
– … making fast retransmit less likely to “kick in”
– … and forcing users to click “reload” more often
Starting and Ending a Connection: TCP Handshakes

Establishing a TCP Connection
[Figure: hosts A and B exchange SYN, SYN ACK, and ACK; each host tells its ISN to the other host.]
• Three-way handshake to establish connection
– Host A sends a SYN (open) to host B
– Host B returns a SYN acknowledgment (SYN ACK)
– Host A sends an ACK to acknowledge the SYN ACK
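From the application’s point of view, the entire handshake happens inside `connect()`; a minimal sketch using Python sockets on the loopback interface:

```python
import socket

def demo_handshake():
    """Set up a listener (host B), then connect to it (host A); the kernel
    performs the SYN / SYN-ACK / ACK exchange inside connect()."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(("127.0.0.1", port))  # three-way handshake happens here
    conn, addr = server.accept()         # connection is already established

    conn.close(); client.close(); server.close()
    return addr

print(demo_handshake())  # e.g. ('127.0.0.1', <ephemeral client port>)
```

Note that `connect()` can return before the server calls `accept()`: the kernel completes the handshake and parks the established connection on the listen queue.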
What if the SYN Packet Gets Lost?
• Suppose the SYN packet gets lost
– Packet is lost inside the network, or
– Server rejects the packet (e.g., listen queue is full)
• Eventually, no SYN-ACK arrives
– Sender sets a timer and waits for the SYN-ACK
– … and retransmits the SYN if needed
• How should the TCP sender set the timer?
– Sender has no idea how far away the receiver is
– Some TCPs use a default of 3 or 6 seconds
SYN Loss and Web Downloads
• User clicks on a hypertext link
– Browser creates a socket and does a “connect”
– The “connect” triggers the OS to transmit a SYN
• If the SYN is lost…
– The 3-6 seconds of delay is very long
– The impatient user may click “reload”
• User triggers an “abort” of the “connect”
– Browser “connects” on a new socket
– Essentially, forces a fast send of a new SYN!
Lecture 04: Transport Layer
• Transport layer protocols in the Internet:
– UDP: connectionless transport
– TCP: connection-oriented transport
– TCP congestion control
Principles of Congestion Control
Congestion:
• informally: “too many sources sending too much data too fast for network to handle”
• different from flow control!
• manifestations:
– lost packets (buffer overflow at routers)
– long delays (queueing in router buffers)
• a top-10 problem!
Receiver Window vs. Congestion Window
• Flow control
– Keep a fast sender from overwhelming a slow receiver
• Congestion control
– Keep a set of senders from overloading the network
• Different concepts, but similar mechanisms
– TCP flow control: receiver window
– TCP congestion control: congestion window
– Sender TCP window = min { congestion window, receiver window }
How it Looks to the End Host
• Delay: packet experiences high delay
• Loss: packet gets dropped along path
• How does TCP sender learn this?
– Delay: round-trip time estimate
– Loss: timeout and/or duplicate acknowledgments
Congestion Collapse
• Easily leads to congestion collapse
– Senders retransmit the lost packets
– Leading to even greater load
– … and even more packet loss
[Figure: goodput vs. load — past the knee, an increase in load results in a decrease in useful work done: “congestion collapse.”]
Approaches towards congestion control
End-to-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system’s observed loss and/or delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
– single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
– explicit sending rate for sender
TCP Congestion control
• end-to-end control (no network assistance)
• Tradeoff
– Pro: avoids needing explicit network feedback
– Con: continually under- and over-shoots “right” rate
TCP Congestion control
• Each TCP sender maintains a congestion window
– Max number of bytes to have in transit (not yet ACK’d)
• Adapting the congestion window
– Decrease upon losing a packet: backing off
– Increase upon success: optimistically exploring
– Always struggling to find the right transfer rate
TCP Congestion Control
How does sender determine CongWin?
• loss event = timeout or 3 duplicate acks
• TCP sender reduces CongWin after loss event
Three mechanisms:
• slow start
• AIMD
• reduce to 1 segment after timeout event
TCP Slow Start
• Probing for usable bandwidth
• When connection begins, CongWin = 1 MSS
– Example: MSS = 500 bytes & RTT = 200 msec
– initial rate = 20 kbps
• available bandwidth may be >> MSS/RTT
– desirable to quickly ramp up to a higher rate
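The slide’s numbers check out, and the doubling is easy to simulate:

```python
mss_bytes, rtt = 500, 0.200              # the slide's example values
initial_rate = mss_bytes * 8 / rtt       # one MSS per RTT to start
assert initial_rate == 20_000            # 20 kbps, matching the slide

# slow start doubles CongWin every RTT until a loss or the threshold
congwin_mss = 1
for _ in range(4):
    congwin_mss *= 2
assert congwin_mss == 16                 # 16 MSS in flight after 4 RTTs
```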
TCP Slow Start (more)
• When connection begins, increase rate exponentially until first loss event or “threshold”
– double CongWin every RTT
– done by incrementing CongWin by 1 MSS for every ACK received
• Summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A sends one segment, then two, then four per RTT as Host B’s ACKs arrive.]
Congestion avoidance state & responses to loss events
Q: If no loss, when should the exponential increase switch to linear?
A: When CongWin gets to current value of threshold
Implementation:
• For initial slow start, threshold is set to a very large value (e.g., 65 Kbytes)
• At loss event, threshold is set to 1/2 of CongWin just before loss event
[Figure: congestion window size (segments) vs. transmission round — after a loss event, TCP Tahoe drops to 1 segment and slow-starts back up, while TCP Reno halves the window to the new threshold; both grow linearly once above the threshold.]
Rationale for Reno’s Fast Recovery
• After 3 dup ACKs:
– CongWin is cut in half
– window then grows linearly
• But after timeout event:
– CongWin is set to 1 MSS instead;
– window then grows exponentially to a threshold, then grows linearly
• Rationale: 3 dup ACKs indicates network capable of delivering some segments; a timeout occurring before 3 dup ACKs is “more alarming”
Summary: TCP Congestion Control
• When CongWin is below Threshold, sender in slow-start phase, window grows exponentially.
• When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
• When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.
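These four rules can be condensed into one update function (a simplified, MSS-granularity sketch; it treats one “ack” event as a full RTT of ACKs and omits fast recovery’s window-inflation details):

```python
def reno_update(congwin, threshold, event):
    """Return (congwin, threshold) after one event, in MSS units."""
    if event == "ack":
        if congwin < threshold:
            congwin *= 2              # slow start: double per RTT
        else:
            congwin += 1              # congestion avoidance: +1 MSS per RTT
    elif event == "triple-dup-ack":
        threshold = congwin // 2      # Reno: halve, then grow linearly
        congwin = threshold
    elif event == "timeout":
        threshold = congwin // 2
        congwin = 1                   # back to slow start
    return congwin, threshold

w, t = 1, 8
for _ in range(3):
    w, t = reno_update(w, t, "ack")   # slow start: 2, 4, 8
assert (w, t) == (8, 8)
w, t = reno_update(w, t, "ack")       # at threshold: linear growth
assert (w, t) == (9, 8)
w, t = reno_update(w, t, "triple-dup-ack")
assert (w, t) == (4, 4)
w, t = reno_update(w, t, "timeout")
assert (w, t) == (1, 2)
```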
AIMD in steady state
• additive increase: increase CongWin by 1 MSS every RTT in the absence of any loss event: probing
• multiplicative decrease: cut CongWin in half after loss event (3 dup acks)
[Figure: long-lived TCP connection — the congestion window traces a sawtooth over time, climbing additively toward about 24 Kbytes and then being cut in half at each loss event.]
Why is TCP fair?
Two competing sessions:
[Figure: Connection 2’s window size plotted against Connection 1’s, both limited by link capacity R. Additive increase moves the operating point along a 45° line; each loss halves both windows. Repeated rounds of “loss: decrease window by factor of 2” and “congestion avoidance: additive increase” push the two sessions toward equal window size.]
TCP Fairness
Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K (AIMD only provides convergence to same window size, not necessarily same throughput rate)
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]
Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
– do not want rate throttled by congestion control
• Instead use UDP:
– pump audio/video at constant rate, tolerate packet loss
• TCP-friendly congestion control for apps that prefer UDP, e.g., Datagram Congestion Control Protocol (DCCP)
End of Lecture04