Transport Layer and Resource Allocation

Chapter 3: Transport Layer
OBJECTIVES:
• understand principles behind transport layer services:
  - multiplexing/demultiplexing
  - reliable data transfer
  - flow control
  - congestion control
• learn about transport layer protocols in the Internet:
  - UDP: connectionless transport
  - TCP: connection-oriented transport
  - TCP congestion control
1-1
Transport services and protocols
• provide logical communication between app processes running on different hosts
• transport protocols run in end systems
  - send side: breaks app messages into segments, passes them to the network layer
  - rcv side: reassembles segments into messages, passes them to the app layer
• more than one transport protocol available to apps
  - Internet: TCP and UDP
[Figure: two end hosts running the full stack (application, transport, network, data link, physical), connected through intermediate nodes that implement only the network, data link, and physical layers]
1-2
Transport vs. Network Layer
• network layer: logical communication between hosts
  - PDU: datagram
  - packets may be lost, duplicated, or reordered in the Internet: “best effort” service
• transport layer: logical communication between processes
  - relies on, and enhances, network layer services
  - PDU: segment
  - extends “host-to-host” communication to “process-to-process” communication

1-3
TCP/IP Transport Layer Protocols
• reliable, in-order delivery: TCP
  - congestion control
  - flow control
  - connection setup
• unreliable, unordered delivery: UDP
  - no-frills extension of “best-effort” IP
  - What does UDP provide in addition to IP?
    • process-to-process delivery
    • error checking
1-4
Multiplexing/Demultiplexing
[Figure: application processes (HTTP, FTP, Telnet) sharing the transport layer and network layer between two hosts]
• Use the same communication channel between hosts for several logical communication processes
• How does mux/demux work?
  - Sockets: doors between process and host
  - UDP socket: (dest. IP, dest. port)
  - TCP socket: (src. IP, src. port, dest. IP, dest. port)

1-5
Connectionless demux
• UDP socket identified by two-tuple:
  - (dest IP address, dest port number)
• When host receives UDP segment:
  - checks destination port number in segment
  - directs UDP segment to socket with that port number
• IP datagrams with different source IP addresses and/or source port numbers are directed to the same socket
1-6
Connection-oriented demux
• TCP socket identified by 4-tuple:
  - source IP address
  - source port number
  - dest IP address
  - dest port number
• receiving host uses all four values to direct segment to the appropriate socket
• Server host may support many simultaneous TCP sockets:
  - each socket identified by its own 4-tuple
• Web servers have different sockets for each connecting client
  - non-persistent HTTP will have a different socket for each request
1-7
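A minimal Python sketch of the two lookup rules above. The Demux class, its method names, the addresses, and the socket labels are hypothetical; real sockets are kernel objects, and this only models which header fields each protocol uses as the lookup key.

```python
from typing import Dict, Optional, Tuple

# Hypothetical key types: IP addresses are strings, ports are ints.
UdpKey = Tuple[str, int]              # (dest IP, dest port)
TcpKey = Tuple[str, int, str, int]    # (src IP, src port, dest IP, dest port)


class Demux:
    """Toy demultiplexer mapping header fields to a socket label."""

    def __init__(self) -> None:
        self.udp: Dict[UdpKey, str] = {}
        self.tcp: Dict[TcpKey, str] = {}

    def bind_udp(self, dst_ip: str, dst_port: int, sock: str) -> None:
        self.udp[(dst_ip, dst_port)] = sock

    def bind_tcp(self, src_ip: str, src_port: int,
                 dst_ip: str, dst_port: int, sock: str) -> None:
        self.tcp[(src_ip, src_port, dst_ip, dst_port)] = sock

    def deliver_udp(self, src_ip: str, src_port: int,
                    dst_ip: str, dst_port: int) -> Optional[str]:
        # Source fields are ignored: the 2-tuple alone selects the socket.
        return self.udp.get((dst_ip, dst_port))

    def deliver_tcp(self, src_ip: str, src_port: int,
                    dst_ip: str, dst_port: int) -> Optional[str]:
        # All four fields must match, so each client gets its own socket.
        return self.tcp.get((src_ip, src_port, dst_ip, dst_port))


if __name__ == "__main__":
    d = Demux()
    d.bind_udp("10.0.0.1", 53, "dns")
    print(d.deliver_udp("1.2.3.4", 5000, "10.0.0.1", 53))   # dns
    print(d.deliver_udp("5.6.7.8", 6000, "10.0.0.1", 53))   # dns (same socket)
    d.bind_tcp("1.2.3.4", 5000, "10.0.0.1", 80, "conn-A")
    d.bind_tcp("5.6.7.8", 6000, "10.0.0.1", 80, "conn-B")
    print(d.deliver_tcp("5.6.7.8", 6000, "10.0.0.1", 80))   # conn-B
```

Two UDP senders with different source addresses land on the same socket, while the two TCP clients of the same server port are kept apart by their source fields.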
UDP: User Datagram Protocol [RFC 768]
• “bare bones” Internet transport protocol
• “best effort” service; UDP segments may be:
  - lost
  - delivered out of order to app
• Why use UDP?
  - No connection establishment cost (critical for some applications, e.g., DNS)
  - No connection state
  - Small segment header (only 8 bytes)
  - Finer application control over data transmission
1-8
UDP Segment Structure
• often used for streaming multimedia apps
  - loss tolerant
  - rate sensitive
• other application protocols using UDP:
  - DNS
  - SNMP
• reliable transfer over UDP: add reliability at the application layer
  - application-specific error recovery!
UDP segment format (32 bits wide):
  source port #  |  dest port #
  length         |  checksum
  application data (message)
length: length in bytes of the UDP segment, including the header
1-9
UDP checksum
Goal: detect “errors” (e.g., flipped bits) in transmitted segment
Sender:
• treat segment contents as a sequence of 16-bit integers
• checksum: addition (1’s complement sum) of segment contents
• sender puts checksum value into UDP checksum field
Receiver:
• compute checksum of received segment
• check if computed checksum equals checksum field value:
  - NO: error detected
  - YES: no error detected.
1-10
Internet Checksum Example
• Note: when adding numbers, a carry-out from the most significant bit needs to be added to the result (wraparound)
• Example: add two 16-bit integers
                1110011001100110
                1101010101010101
  wraparound   11011101110111011
  sum           1011101110111100
  checksum      0100010001000011
• Weak error protection? Why is it useful?
1-11
Checksum Calculation
At the sender
• Set the value of the checksum field to 0.
• Add all 16-bit words of the segment using one’s complement arithmetic.
• Complement the final result to obtain the checksum.
At the receiver
• Add all 16-bit words (including the checksum field) and complement the result.
• All zeros => accept datagram, else reject.
1-12
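The sender and receiver steps can be sketched in Python as below, using the two words from the checksum example. This is a minimal illustration, assuming the data is given as bytes grouped into 16-bit big-endian words; the function names are hypothetical, and the real UDP checksum also covers a pseudo-header of IP fields (RFC 768), which this sketch omits.

```python
def ones_complement_sum16(data: bytes) -> int:
    """1's complement sum of 16-bit big-endian words, with wraparound."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a multiple of 16 bits
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # add carry-out back in
    return total


def internet_checksum(data: bytes) -> int:
    """Sender: complement of the sum (checksum field must be 0 in data)."""
    return ~ones_complement_sum16(data) & 0xFFFF


def verify(segment: bytes) -> bool:
    """Receiver: sum all words incl. the checksum; complement must be 0."""
    return (~ones_complement_sum16(segment) & 0xFFFF) == 0


if __name__ == "__main__":
    words = bytes([0b11100110, 0b01100110, 0b11010101, 0b01010101])
    csum = internet_checksum(words)
    print(format(csum, "016b"))                         # 0100010001000011
    print(verify(words + csum.to_bytes(2, "big")))      # True
```

Flipping a single data bit changes the sum, so verify then returns False. Two flips in the same bit column of different words (one 0 to 1, one 1 to 0) cancel out and go undetected, which is one reason the protection is called weak.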
UDP Checksum
• Fill checksum field with 0’s
• Divide into 16-bit words (adding padding if required)
• Add words using 1’s complement arithmetic
• Complement the result and put it in the checksum field
• Deliver UDP segment to IP
[Figure: UDP segment, 32 bits wide: source port #, dest port #, length, checksum, data (add padding to make data a multiple of 16 bits)]
1-13
Principles of Reliable Data Transfer
• important in application, transport, and link layers
• fundamentally important problem in networking
[Figure: provided service: a sending process and a receiving process in the application layer connected by a reliable channel. Service implementation: the RDT protocol (sending side) and RDT protocol (receiving side) in the transport layer, communicating over an unreliable channel provided by the network layer]
• characteristics of the unreliable channel will determine the complexity of the reliable data transfer (rdt) protocol
1-14
Reliable Data Transfer: FSMs
We’ll:
• incrementally develop sender and receiver sides of a reliable data transfer protocol (rdt)
• consider only unidirectional data transfer
  - but control info will flow in both directions!
• use finite state machines (FSM) to specify sender, receiver
[Figure: FSM notation. A transition arrow from state 1 to state 2 is labeled with the event causing the state transition, written above the actions taken on the state transition. state: when in this “state”, the next state is uniquely determined by the next event]
1-15
Rdt1.0: Data Transfer over a Perfect Channel
• underlying channel perfectly reliable
  - no bit errors
  - packets received in the order sent
  - no loss of packets
• separate FSMs for sender, receiver:
  - sender sends data into underlying channel
  - receiver reads data from underlying channel
sender:
  state “Wait for call from above”:
    rdt_send(data)
      packet = make_pkt(data)
      udt_send(packet)
receiver:
  state “Wait for call from below”:
    rdt_rcv(packet)
      extract(packet,data)
      deliver_data(data)
1-16
Rdt2.0: channel with bit errors
[stop-and-wait protocol]
• Assumptions
  - All packets are received
  - Packets may be corrupted (i.e., bits may be flipped)
  - Checksum to detect bit errors
• How to recover from errors? Use an Automatic Repeat reQuest (ARQ) mechanism
  - acknowledgements (ACKs): receiver explicitly tells sender that the packet was received correctly
  - negative acknowledgements (NAKs): receiver explicitly tells sender that the packet had errors
  - sender retransmits pkt on receipt of NAK
• What about error correcting codes?
  - ARQ needs error detection capability
1-17
rdt2.0: FSM specification
sender:
  state “Wait for call from above”:
    rdt_send(data)
      sndpkt = make_pkt(data, checksum)
      udt_send(sndpkt)
      go to “Wait for ACK or NAK”
  state “Wait for ACK or NAK”:
    rdt_rcv(rcvpkt) && isNAK(rcvpkt)
      udt_send(sndpkt)
    rdt_rcv(rcvpkt) && isACK(rcvpkt)
      Λ (no action)
      go to “Wait for call from above”
receiver:
  state “Wait for call from below”:
    rdt_rcv(rcvpkt) && corrupt(rcvpkt)
      udt_send(NAK)
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
      extract(rcvpkt,data)
      deliver_data(data)
      udt_send(ACK)
1-18
rdt2.0: Observations
[Figure: the rdt2.0 sender FSM, repeated from the previous slide]
1. A stop-and-wait protocol
2. What happens when an ACK or NAK has bit errors?
   Approach 1: resend the current data packet?
   This creates duplicate packets.
1-19
Handling Duplicate Packets
• sender adds sequence number to each packet
• sender retransmits current packet if ACK/NAK garbled
• receiver discards (doesn’t deliver up) duplicate packet
1-20
rdt2.1: sender, handles garbled ACK/NAKs
state “Wait for call 0 from above”:
  rdt_send(data)
    sndpkt = make_pkt(0, data, checksum)
    udt_send(sndpkt)
    go to “Wait for ACK or NAK 0”
state “Wait for ACK or NAK 0”:
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) )
    udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt)
    Λ (no action)
    go to “Wait for call 1 from above”
state “Wait for call 1 from above”:
  rdt_send(data)
    sndpkt = make_pkt(1, data, checksum)
    udt_send(sndpkt)
    go to “Wait for ACK or NAK 1”
state “Wait for ACK or NAK 1”:
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) )
    udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt)
    Λ (no action)
    go to “Wait for call 0 from above”
1-21
rdt2.1: receiver, handles garbled ACK/NAKs
state “Wait for 0 from below”:
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt)
    extract(rcvpkt,data)
    deliver_data(data)
    sndpkt = make_pkt(ACK, chksum)
    udt_send(sndpkt)
    go to “Wait for 1 from below”
  rdt_rcv(rcvpkt) && corrupt(rcvpkt)
    sndpkt = make_pkt(NAK, chksum)
    udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
    sndpkt = make_pkt(ACK, chksum)
    udt_send(sndpkt)
state “Wait for 1 from below”:
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
    extract(rcvpkt,data)
    deliver_data(data)
    sndpkt = make_pkt(ACK, chksum)
    udt_send(sndpkt)
    go to “Wait for 0 from below”
  rdt_rcv(rcvpkt) && corrupt(rcvpkt)
    sndpkt = make_pkt(NAK, chksum)
    udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt)
    sndpkt = make_pkt(ACK, chksum)
    udt_send(sndpkt)
1-22
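rdt2.1: examples
[Figure: two sender/receiver timelines showing rdt2.1 recovering when packets, ACKs, or NAKs are garbled (marked x). PKT(0) and PKT(1) are retransmitted as needed; a receiver that expects a pkt with seq. # 1 recognizes a retransmitted PKT(0) as a duplicate pkt and ACKs it again; a NAK leads to retransmission of PKT(1)]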
1-23
rdt2.1: summary
Sender:
• seq # added to pkt
• two seq. #’s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
  - state must “remember” whether “current” pkt has 0 or 1 seq. #
Receiver:
• must check if received packet is duplicate
  - state indicates whether 0 or 1 is expected pkt seq #
• note: receiver cannot know if its last ACK/NAK was received OK at sender
1-24
rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
  - receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt
1-25
rdt2.2: sender, receiver fragments
sender FSM fragment:
  state “Wait for call 0 from above”:
    rdt_send(data)
      sndpkt = make_pkt(0, data, checksum)
      udt_send(sndpkt)
      go to “Wait for ACK 0”
  state “Wait for ACK 0”:
    rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) )
      udt_send(sndpkt)
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0)
      Λ (no action)
receiver FSM fragment:
  state “Wait for 0 from below”:
    rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || has_seq1(rcvpkt))
      udt_send(sndpkt)
  transition into “Wait for 0 from below” (from “Wait for 1 from below”):
    rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
      extract(rcvpkt,data)
      deliver_data(data)
      sndpkt = make_pkt(ACK1, chksum)
      udt_send(sndpkt)
1-26
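A minimal Python sketch of the NAK-free receiver, assuming packets arrive as a small Packet record whose corrupt flag stands in for a failed checksum (both are hypothetical). The return value is the seq # the receiver puts in its ACK; a corrupt packet or a duplicate produces an ACK for the other seq #, which the sender treats exactly like a NAK.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    seq: int                 # 0 or 1
    data: str
    corrupt: bool = False    # stands in for a failed checksum


class Rdt22Receiver:
    """NAK-free receiver: every reply is an ACK carrying an explicit seq #."""

    def __init__(self) -> None:
        self.expected = 0                    # state "Wait for 0 from below"
        self.delivered: List[str] = []

    def rdt_rcv(self, pkt: Packet) -> int:
        """Return the seq # placed in the ACK sent back to the sender."""
        if not pkt.corrupt and pkt.seq == self.expected:
            self.delivered.append(pkt.data)  # extract + deliver_data
            self.expected ^= 1               # switch to the other state
            return pkt.seq                   # ACK the new packet
        # Corrupt or duplicate: re-ACK the last in-order pkt. The sender sees
        # a duplicate ACK and retransmits, just as it would on a NAK.
        return self.expected ^ 1
```

For example, after delivering a packet with seq # 0, a corrupted seq # 1 packet is answered with ACK 0 again, and the data is delivered only once the clean retransmission arrives.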
rdt3.0: The case of “Lossy” Channels
• Assumption: underlying channel can also lose packets (data or ACKs)
• Approach: sender waits a “reasonable” amount of time for the ACK (a time-out)
  - Time-out value?
• Possibility of duplicate packets/ACKs?
  - if pkt (or ACK) is just delayed (not lost): the retransmission will be a duplicate, but use of seq. #’s already handles this
  - receiver must specify seq # of pkt being ACKed

1-27
rdt3.0 sender
state “Wait for call 0 from above”:
  rdt_send(data)
    sndpkt = make_pkt(0, data, checksum)
    udt_send(sndpkt)
    start_timer
    go to “Wait for ACK0”
  rdt_rcv(rcvpkt)
    Λ (no action)
state “Wait for ACK0”:
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) )
    Λ (no action)
  timeout
    udt_send(sndpkt)
    start_timer
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0)
    stop_timer
    go to “Wait for call 1 from above”
state “Wait for call 1 from above”:
  rdt_send(data)
    sndpkt = make_pkt(1, data, checksum)
    udt_send(sndpkt)
    start_timer
    go to “Wait for ACK1”
  rdt_rcv(rcvpkt)
    Λ (no action)
state “Wait for ACK1”:
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,0) )
    Λ (no action)
  timeout
    udt_send(sndpkt)
    start_timer
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1)
    stop_timer
    go to “Wait for call 0 from above”
1-28
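The rdt3.0 behavior can be exercised with a small simulation. This is a sketch under simplifying assumptions: the function name and the fixed loss pattern are hypothetical, corruption is not modeled, and a timeout is assumed to fire exactly when a data packet or its ACK is lost, so premature timeouts caused by merely delayed ACKs are not covered.

```python
from typing import Iterable, List, Set, Tuple


def rdt30_transfer(messages: Iterable[str], lost_data: Set[int],
                   lost_acks: Set[int]) -> Tuple[List[str], int]:
    """Alternating-bit (rdt3.0) transfer over a channel that can drop packets.

    lost_data / lost_acks hold the indices of transmission attempts whose
    data packet / ACK the channel drops (a hypothetical, fixed loss pattern).
    Returns (messages delivered to the receiving app, data packets sent).
    """
    delivered: List[str] = []
    expected = 0      # receiver state: seq # it expects next
    attempt = 0       # index of the next udt_send of a data packet
    seq = 0           # sender state: seq # of the current packet
    for msg in messages:
        while True:                    # "Wait for ACK<seq>"
            this_try, attempt = attempt, attempt + 1
            if this_try in lost_data:
                continue               # no ACK comes back -> timeout -> resend
            if seq == expected:        # new packet: deliver, flip expected bit
                delivered.append(msg)
                expected ^= 1
            # (a duplicate is not delivered again, but is still ACKed)
            if this_try in lost_acks:
                continue               # ACK(seq) lost -> timeout -> resend
            break                      # isACK(rcvpkt, seq): stop_timer
        seq ^= 1                       # "Wait for call <other seq> from above"
    return delivered, attempt


if __name__ == "__main__":
    # Attempt 1 (data for "b") and the ACK for attempt 3 ("c") are lost.
    print(rdt30_transfer(["a", "b", "c"], lost_data={1}, lost_acks={3}))
```

In the example the lost ACK for "c" makes the sender resend it; the receiver, now expecting seq # 1, sees seq # 0 again, does not deliver it twice, and re-ACKs it, so 5 data packets carry 3 messages.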
rdt3.0 in action
1-29
rdt3.0 in action
1-30
stop-and-wait operation
[Figure: sender/receiver timeline. The first packet bit is transmitted at t = 0 and the last packet bit at t = L/R; the first packet bit arrives at the receiver, and when the last packet bit arrives the receiver sends an ACK; the ACK arrives and the sender sends the next packet at t = D + L/R]
Can we do better?
1-31
Pipelining: Motivation
• Stop-and-wait allows the sender to have only a single unACKed packet at any time
• example: 1 Mbps link (R), end-to-end round-trip propagation delay (D) of 92 ms, 1 KB packet (L):
  $$T_{transmit} = \frac{L}{R} = \frac{8\ \text{kb/pkt}}{10^3\ \text{kb/s}} = 8\ \text{ms}$$
  where L is the packet length in bits and R the transmission rate in bps
  $$U_{sender} = \frac{L/R}{D + L/R} = \frac{8\ \text{ms}}{100\ \text{ms}} = 0.08$$
• one 1 KB pkt every 100 ms -> 80 kbps throughput on a 1 Mbps link
• What does the bandwidth x delay product tell us?
1-32
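The utilization arithmetic above can be checked with a few lines of Python. The function names are hypothetical; n_pkts = 1 gives stop-and-wait, larger values model n back-to-back packets per round trip as on the pipelining slides, and the result is capped at 1 because the sender cannot be busy more than all of the time. Reading the bandwidth x delay question with a round trip of D + L/R = 100 ms is an assumption.

```python
def sender_utilization(L_bits: float, R_bps: float, D_s: float,
                       n_pkts: int = 1) -> float:
    """U_sender = n * (L/R) / (D + L/R), capped at 1.

    D_s is the round-trip propagation delay in seconds; n_pkts = 1 is
    stop-and-wait, n_pkts > 1 sends n packets back to back per round trip.
    """
    t_transmit = L_bits / R_bps
    return min(1.0, n_pkts * t_transmit / (D_s + t_transmit))


def bandwidth_delay_bits(R_bps: float, rtt_s: float) -> float:
    """Bits the link can hold 'in flight' during one round trip."""
    return R_bps * rtt_s


if __name__ == "__main__":
    L, R, D = 8_000, 1_000_000, 0.092        # 1 KB pkt, 1 Mbps, 92 ms
    print(round(sender_utilization(L, R, D), 4))       # stop-and-wait
    print(round(sender_utilization(L, R, D, 3), 4))    # 3 pkts in flight
    print(round(bandwidth_delay_bits(R, D + L / R) / L, 2))  # pkts to fill pipe
```

With these numbers the utilization is 0.08 for one packet and 0.24 for three, matching the slides. The bandwidth x delay product over 100 ms is 100,000 bits, about 12.5 of these packets, which suggests roughly how many packets must be in flight to keep the 1 Mbps link busy.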
Pipelining: Motivation
1-33
Pipelined protocols
• Pipelining: sender allows multiple “in-flight”, yet-to-be-acknowledged pkts
  - range of sequence numbers must be increased
  - buffering at sender and/or receiver
• Two generic forms of pipelined protocols
  - Go-Back-N
  - Selective Repeat
1-34
Pipelining: increased utilization
[Figure: sender/receiver timeline with three packets sent back to back. The first packet bit is transmitted at t = 0 and the last bit at t = L/R; the receiver sends an ACK as the last bit of the 1st, 2nd, and 3rd packets arrives; the first ACK arrives and the sender sends the next packet at t = D + L/R]
Increases utilization by a factor of 3!
$$U_{sender} = \frac{3L/R}{D + L/R} = \frac{24\ \text{ms}}{100\ \text{ms}} = 0.24$$
1-35
Go-Back-N
• Allow up to N unACKed pkts in the network
  - N is the window size
• Sender operation:
  - If window not full, transmit
  - ACKs are cumulative
  - On timeout, resend all packets previously sent but not yet ACKed
  - Uses a single timer, which represents the oldest transmitted but not yet ACKed pkt
• Why limit the unACKed pkts to N?
1-36
Go-Back-N
1-37
GBN: sender extended FSM
initialization:
  Λ
  base = 1
  nextseqnum = 1
state “Wait”:
  rdt_send(data)
    if (nextseqnum < base+N) {
      sndpkt[nextseqnum] = make_pkt(nextseqnum,data,chksum)
      udt_send(sndpkt[nextseqnum])
      if (base == nextseqnum)
        start_timer
      nextseqnum++
    }
    else
      refuse_data(data)
  timeout
    start_timer
    udt_send(sndpkt[base])
    udt_send(sndpkt[base+1])
    …
    udt_send(sndpkt[nextseqnum-1])
  rdt_rcv(rcvpkt) && corrupt(rcvpkt)
    Λ (no action)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
    base = getacknum(rcvpkt)+1
    if (base == nextseqnum)
      stop_timer
    else
      start_timer
1-38
GBN: receiver extended FSM
initialization:
  Λ
  expectedseqnum = 1
  sndpkt = make_pkt(0,ACK,chksum)
state “Wait”:
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && hasseqnum(rcvpkt,expectedseqnum)
    extract(rcvpkt,data)
    deliver_data(data)
    sndpkt = make_pkt(expectedseqnum,ACK,chksum)
    udt_send(sndpkt)
    expectedseqnum++
  default
    udt_send(sndpkt)
• ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  - may generate duplicate ACKs
  - need only remember expectedseqnum
• out-of-order pkt:
  - discard (don’t buffer): no receiver buffering!
  - re-ACK pkt with highest in-order seq #
1-39
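A round-based Python sketch of Go-Back-N, assuming one loop iteration per RTT, a timeout at the end of any round that leaves packets unACKed, and ACKs that are never lost. The function name and the (seq, nth transmission) loss pattern are hypothetical. The receiver keeps only expectedseqnum, as in the receiver FSM.

```python
from typing import Dict, List, Set, Tuple


def go_back_n(num_pkts: int, N: int,
              lost: Set[Tuple[int, int]]) -> Tuple[List[int], int]:
    """Round-based Go-Back-N sketch; seq #s start at 1 as in the FSMs.

    Each outer iteration stands for one RTT: the sender (re)sends every pkt
    in [base, base+N-1]; anything still unACKed at the end of the round is
    treated as a timeout, so the next round goes back to base.
    Returns (seq #s delivered in order, total pkts transmitted).
    """
    base = 1
    expectedseqnum = 1                 # receiver: the only state it needs
    delivered: List[int] = []
    tx_count: Dict[int, int] = {}
    total = 0
    while base <= num_pkts:
        for seq in range(base, min(base + N, num_pkts + 1)):
            tx_count[seq] = tx_count.get(seq, 0) + 1
            total += 1
            if (seq, tx_count[seq]) in lost:
                continue               # dropped by the channel
            if seq == expectedseqnum:  # in order: deliver
                delivered.append(seq)
                expectedseqnum += 1
            # out of order: discarded (no receiver buffering)
        ack = expectedseqnum - 1       # cumulative ACK, highest in-order seq #
        base = ack + 1                 # base = getacknum(rcvpkt)+1
    return delivered, total


if __name__ == "__main__":
    print(go_back_n(5, 3, {(2, 1)}))   # first copy of pkt 2 is lost
```

Here go_back_n(5, 3, {(2, 1)}) returns ([1, 2, 3, 4, 5], 7): packet 3 reached the receiver the first time but was discarded as out of order and had to be sent again, which is the cost GBN pays for not buffering.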
GBN in action
Is there a performance issue with GBN?
1-40
Selective Repeat
• receiver individually acknowledges all correctly received pkts
  - buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
  - sender timer for each unACKed pkt
• sender window
  - N consecutive seq #’s
  - again limits seq #s of sent, unACKed pkts
1-41
Selective repeat: sender, receiver windows
1-42
Selective repeat
sender
data from above:
• if next available seq # is in window, send pkt
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N-1]:
• mark pkt n as received
• if n is the smallest unACKed pkt, advance window base to next unACKed seq #

receiver
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
• ACK(n)
otherwise:
• ignore
1-43
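The receiver rules above can be sketched in Python as below. It is a simplification: sequence numbers are unbounded integers (no wraparound modulo the seq # space), and the class and method names are hypothetical.

```python
from typing import Dict, List, Optional


class SRReceiver:
    """Selective Repeat receiver; seq #s are unbounded ints (no wraparound)."""

    def __init__(self, N: int) -> None:
        self.N = N
        self.rcvbase = 0
        self.buffer: Dict[int, str] = {}     # out-of-order pkts awaiting delivery
        self.delivered: List[str] = []       # data passed to the upper layer

    def on_packet(self, n: int, data: str) -> Optional[int]:
        """Process pkt n; return the seq # to ACK, or None if it is ignored."""
        if self.rcvbase <= n <= self.rcvbase + self.N - 1:
            self.buffer.setdefault(n, data)
            # In-order arrival: deliver it plus any buffered successors,
            # advancing rcvbase to the next not-yet-received pkt.
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n                         # send ACK(n)
        if self.rcvbase - self.N <= n <= self.rcvbase - 1:
            return n                         # already delivered: ACK(n) again
        return None                          # otherwise: ignore
```

Re-ACKing packets in [rcvbase-N, rcvbase-1] matters: if the original ACK was lost, the sender's window could otherwise never move past that packet.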
Selective Repeat Example
[Figure: Selective Repeat example timeline between Sender and Receiver over sequence numbers 0 to 9 with a window of 4, showing PKT0 to PKT4, individual ACKs (ACK0 to ACK4), a Time-Out followed by retransmission of PKT1, and the sender and receiver windows advancing]
1-44
Another Example
1-45
Selective repeat: dilemma
Example:
• seq #’s: 0, 1, 2, 3
• window size = 3
• receiver sees no difference in the two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what is the relationship between seq # size and window size?
1-46
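The ambiguity can be checked mechanically. This hypothetical helper assumes the worst case from the example: the receiver has accepted and ACKed a full window starting at seq # 0, and all of those ACKs may be lost, so the sender may retransmit any of them. If the receiver's new window reuses any of the old seq #s, a retransmission is indistinguishable from new data.

```python
def windows_overlap(seq_space: int, window: int) -> bool:
    """True if seq #s are reused across consecutive receiver windows.

    Worst case from the example: the receiver accepted seq #s 0..window-1
    (all ACKs lost), so it now expects (window..2*window-1) mod seq_space.
    Any overlap means a retransmitted old packet can look like new data.
    """
    old = {i % seq_space for i in range(window)}
    new = {i % seq_space for i in range(window, 2 * window)}
    return bool(old & new)


if __name__ == "__main__":
    for w in range(1, 5):
        print(w, windows_overlap(4, w))
```

windows_overlap(4, 3) is True, reproducing the dilemma, while windows_overlap(4, 2) is False. Trying other sizes gives the standard answer to the question above: the window must be no larger than half of the seq # space.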
Transmission Control Protocol
1-47
TCP segment structure
Header layout (32 bits wide):
  source port #  |  dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | Receive window
  checksum  |  Urg data pointer
  Options (variable length)
  application data (variable length)
Field notes:
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection establishment (setup, teardown commands)
• checksum: Internet checksum (as in UDP)
• sequence and acknowledgement numbers: counting by bytes of data (not segments!)
• Receive window: # bytes rcvr willing to accept
1-48
Sequence and Acknowledgement Numbers
• TCP views data as an unstructured, but ordered, stream of bytes.
• Sequence numbers are over bytes, not segments
• Initial sequence number is chosen randomly
• TCP is full duplex: numbering of data is independent in each direction
• Acknowledgement number: sequence number of the next byte expected from the sender
• ACKs are cumulative
1-49
TCP seq. #’s and ACKs
Seq. #’s:
• byte stream “number” of first byte in segment’s data
ACKs:
• seq # of next byte expected from other side
• cumulative ACK
Q: how does the receiver handle out-of-order segments?
• A: the TCP spec doesn’t say; it is up to the implementor
[Figure: Host A / Host B timeline: a host sends 1000 bytes of data, the other host ACKs receipt of the data, and the host sends another 500 bytes]
1-50
TCP reliable data transfer
• TCP creates rdt service on top of IP’s unreliable service
• Pipelined segments
• Cumulative acks
• TCP uses a single retransmission timer
• Retransmissions are triggered by:
  - timeout events
  - duplicate acks
• Initially consider a simplified TCP sender:
  - ignore duplicate acks
  - ignore flow control, congestion control
1-51
TCP sender events:
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
Ack rcvd:
• If it acknowledges previously unacked segments:
  - update what is known to be acked
  - start timer if there are outstanding segments
1-52
TCP sender (simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum
loop (forever) {
  switch(event)
    event: data received from application above
      create TCP segment with sequence number NextSeqNum
      if (timer currently not running)
        start timer
      pass segment to IP
      NextSeqNum = NextSeqNum + length(data)
    event: timer timeout
      retransmit not-yet-acknowledged segment with smallest sequence number
      start timer
    event: ACK received, with ACK field value of y
      if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
          start timer
      }
} /* end of loop forever */
Comment:
• SendBase-1: last cumulatively ACKed byte
Example:
• SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so new data is ACKed
1-53
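The simplified sender translates fairly directly into Python. In this sketch the timer is only a flag, segments handed to IP are appended to a list, and stopping the timer when nothing is outstanding is an interpretation of the pseudocode; the class and attribute names are hypothetical.

```python
from typing import Dict, List, Tuple


class SimplifiedTcpSender:
    """Event handlers mirroring the simplified TCP sender (no flow or
    congestion control, duplicate ACKs ignored). The timer is a flag."""

    def __init__(self, initial_seq: int) -> None:
        self.next_seq_num = initial_seq
        self.send_base = initial_seq
        self.unacked: Dict[int, bytes] = {}        # seq # -> payload
        self.timer_running = False
        self.sent: List[Tuple[int, bytes]] = []    # segments passed to IP

    def on_app_data(self, data: bytes) -> None:
        self.unacked[self.next_seq_num] = data
        if not self.timer_running:
            self.timer_running = True              # start timer
        self.sent.append((self.next_seq_num, data))  # pass segment to IP
        self.next_seq_num += len(data)

    def on_timeout(self) -> None:
        if not self.unacked:
            return
        seq = min(self.unacked)                    # oldest unACKed segment
        self.sent.append((seq, self.unacked[seq]))  # retransmit it
        self.timer_running = True                  # start timer

    def on_ack(self, y: int) -> None:
        if y > self.send_base:                     # ACKs new data (cumulative)
            self.send_base = y
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            # Keep the timer running only if segments are still outstanding.
            self.timer_running = bool(self.unacked)
```

Starting with SendBase = 72 (so SendBase-1 = 71 as in the example) and sending a 1-byte and a 2-byte segment, an ACK with y = 73 advances SendBase to 73 and leaves only the segment starting at byte 73 outstanding.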
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
• flow control: sender won’t overflow receiver’s buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app’s drain rate
1-54
TCP Flow control: how it works
(Assumption: TCP receiver discards out-of-order segments)
• spare room in buffer
  = RcvWindow
  = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including the value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
  - guarantees the receive buffer doesn’t overflow
1-55
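The two rules can be written as small Python helpers. The function names and the sender-side counters last_byte_sent and last_byte_acked are hypothetical (they do not appear on the slides); byte counters are plain integers.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """RcvWindow = RcvBuffer - (LastByteRcvd - LastByteRead), in bytes."""
    buffered = last_byte_rcvd - last_byte_read   # received but not yet read by the app
    if not 0 <= buffered <= rcv_buffer:
        raise ValueError("buffer accounting is inconsistent")
    return rcv_buffer - buffered


def may_send(last_byte_sent: int, last_byte_acked: int,
             rcv_window_adv: int, nbytes: int) -> bool:
    """Sender check: unACKed data, including nbytes more, must fit in RcvWindow."""
    return (last_byte_sent - last_byte_acked) + nbytes <= rcv_window_adv
```

For example, a 4096-byte buffer holding 2000 unread bytes advertises 2096 bytes; a sender with 1000 bytes already unACKed may then send at most 1096 more.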
Principles of Congestion Control
• Congestion, informally: “too many sources sending too much data too fast for the network to handle”
• Different from flow control!
• Manifestations:
  - packet loss (buffer overflow at routers)
  - increased end-to-end delays (queuing in router buffers)
• Congestion results in unfairness and poor utilization of network resources:
  - resources used by dropped packets (before they were lost)
  - retransmissions
  - poor resource allocation at high load
1-56
Congestion Control: Approaches
• Goal: throttle senders as needed to ensure the load on the network is “reasonable”
• End-end congestion control:
  - no explicit feedback from network
  - congestion inferred from end-system observed loss, delay
  - approach taken by TCP
• Network-assisted congestion control:
  - routers provide feedback to end systems
    • single bit indicating congestion (e.g., ECN)
    • explicit rate at which the sender should send
1-57
TCP Congestion Control: Overview
• end-end control (no network assistance)
• Limit the number of packets in the network to window W
• Roughly,
  $$\text{rate} = \frac{W}{RTT}\ \text{bytes/sec}$$
• W is dynamic, a function of perceived network congestion
1-58
TCP Congestion Controls
• Tahoe (Jacobson 1988)
  - Slow Start
  - Congestion Avoidance
  - Fast Retransmit
• Reno (Jacobson 1990)
  - Fast Recovery
• SACK
• Vegas (Brakmo & Peterson 1994)
  - Delay and loss as indicators of congestion
1-59
Slow Start
• “Slow Start” is used to reach the equilibrium state
• Initially: W = 1 (slow start)
• On each successful ACK: $W \leftarrow W + 1$
• Exponential growth of W each RTT: $W \leftarrow 2W$
• Enter CA when $W \ge \text{ssthresh}$
• ssthresh: window size after which TCP cautiously probes for bandwidth
[Figure: sender/receiver timeline of data segments (numbered 1 to 8) and their ACKs as cwnd grows during slow start]
1-60
Congestion Avoidance
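• Starts when $W \ge \text{ssthresh}$
• On each successful ACK: $W \leftarrow W + 1/W$
• Linear growth of W each RTT: $W \leftarrow W + 1$
[Figure: sender/receiver timeline of data segments (numbered 1 to 4) and their ACKs during congestion avoidance]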
1-61
CA: Additive Increase, Multiplicative Decrease
• We have “additive increase” in the absence of loss events
• After a loss event, decrease the congestion window by half: “multiplicative decrease”
  - ssthresh = W/2
  - Enter Slow Start
1-62
Detecting Packet Loss
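• Assumption: loss indicates congestion
• Option 1: time-out
  - waiting for a time-out can be long!
• Option 2: duplicate ACKs
  - How many? At least 3.
[Figure: Sender/Receiver timeline: segments 10 to 17 are sent and segment 12 is lost (X); the receiver ACKs 10 and 11, then keeps returning duplicate ACKs for 11 as the later segments arrive]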
1-63
Fast Retransmit
• Waiting for a timeout takes quite long
• Immediately retransmit after 3 dupACKs, without waiting for the timeout
• Adjust ssthresh: $\text{ssthresh} \leftarrow W/2$
• Enter Slow Start: W = 1
1-64
How to Set TCP Timeout Value?
• longer than RTT
  - but RTT varies
• too short: premature timeout
  - unnecessary retransmissions
• too long: slow reaction to segment loss
1-65
How to Estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
• SampleRTT will vary; want estimated RTT “smoother”
  - average several recent measurements, not just current SampleRTT
1-66
TCP Round-Trip Time and Timeout
$$\text{EstimatedRTT} = (1-\alpha)\,\text{EstimatedRTT} + \alpha\,\text{SampleRTT}$$
• exponential weighted moving average (EWMA)
• influence of past sample decreases exponentially fast
• typical value: α = 0.125
[Figure: “RTT: gaia.cs.umass.edu to fantasia.eurecom.fr”, plotting SampleRTT and Estimated RTT (RTT in milliseconds, 100 to 350) against time (seconds)]
1-67
TCP Round Trip Time and Timeout
[Jacobson/Karels Algorithm]
Setting the timeout
• EstimatedRTT plus “safety margin”
  - large variation in EstimatedRTT -> larger safety margin
• first, estimate how much SampleRTT deviates from EstimatedRTT:
  $$\text{DevRTT} = (1-\beta)\,\text{DevRTT} + \beta\,|\text{SampleRTT} - \text{EstimatedRTT}|$$
  (typically, β = 0.25)
Then set the timeout interval:
  $$\text{TimeoutInterval} = \mu\,\text{EstimatedRTT} + \phi\,\text{DevRTT}$$
Typically, μ = 1 and φ = 4.
1-68
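A minimal Python sketch of the estimator, using the typical α = 0.125, β = 0.25, μ = 1 and φ = 4. Two details are assumptions not fixed by the slides: the initial values (EstimatedRTT = first sample, DevRTT = half of it, as RFC 6298 specifies for its SRTT/RTTVAR), and updating DevRTT against the previous EstimatedRTT before updating EstimatedRTT (the order RFC 6298 uses). RFC 6298's minimum timeout and clock-granularity terms are left out; the class name is hypothetical.

```python
class RttEstimator:
    """EWMA RTT estimate plus deviation-based safety margin (times in ms)."""

    def __init__(self, first_sample: float, alpha: float = 0.125,
                 beta: float = 0.25, mu: float = 1.0, phi: float = 4.0) -> None:
        self.alpha, self.beta, self.mu, self.phi = alpha, beta, mu, phi
        self.estimated_rtt = first_sample      # assumption: RFC 6298-style start
        self.dev_rtt = first_sample / 2

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT (never one from a retransmission); return the new timeout."""
        # DevRTT first, measured against the previous EstimatedRTT.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        return self.mu * self.estimated_rtt + self.phi * self.dev_rtt
```

Starting from a 100 ms sample the timeout is 300 ms; a second 100 ms sample shrinks the margin (timeout 250 ms), while a 180 ms outlier moves EstimatedRTT only to 110 ms but widens the timeout to 302.5 ms through DevRTT.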
TCP Tahoe: Summary
• Basic ideas
  - Gently probe network for spare capacity
  - Drastically reduce rate on congestion
  - Windowing: self-clocking
  - Other functions: round trip time estimation, error recovery
for every ACK {
  if (W < ssthresh) then W++      (SS)
  else W += 1/W                   (CA)
}
for every loss {
  ssthresh = W/2
  W = 1
}
1-69
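The Tahoe rules transcribe directly into Python. In this sketch W is in segments, the initial ssthresh is passed in (the slides leave its value as a question), and one_rtt is a hypothetical helper that assumes a full window is sent and every segment is ACKed within the round trip.

```python
class TahoeWindow:
    """Congestion window W (in segments) following the Tahoe summary rules."""

    def __init__(self, ssthresh: float) -> None:
        self.W = 1.0                     # initially W = 1 (slow start)
        self.ssthresh = ssthresh         # initial value is a parameter here

    def on_ack(self) -> None:
        if self.W < self.ssthresh:
            self.W += 1                  # slow start: +1 per ACK, doubles per RTT
        else:
            self.W += 1 / self.W         # congestion avoidance: about +1 per RTT

    def on_loss(self) -> None:
        self.ssthresh = self.W / 2       # halve the threshold
        self.W = 1.0                     # Tahoe restarts slow start

    def one_rtt(self) -> None:
        """Hypothetical helper: send a full window, every segment is ACKed."""
        for _ in range(int(self.W)):
            self.on_ack()
```

With ssthresh = 8 the window is 1, 2, 4 and 8 at the start of successive RTTs (slow start), then grows by just under one segment in the next RTT (congestion avoidance); a loss then sets ssthresh to half the current window and W back to 1.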
TCP Tahoe
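[Figure: window vs. time. Slow Start until the initial ssthresh value is reached, then switch to CA mode; after losses at window sizes W1 and W2, ssthresh is set to W1/2 and W2/2 respectively and the window restarts in Slow Start]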
1-70
Questions?
• Q. 1. To what value is ssthresh initialized at the start of the algorithm?
• Q. 2. Why is “Fast Retransmit” triggered on receiving 3 duplicate ACKs (i.e., why isn’t it triggered on receiving a single duplicate ACK)?
• Q. 3. Can we do better than TCP Tahoe?
1-71
TCP Reno
Note the “Fast Recovery” after the window is cut in half
[Figure: window vs. time. Slow Start until the initial ssthresh value is reached, then switch to CA mode]
1-72
TCP Reno: Fast Recovery
• Objective: prevent the “pipe” from emptying after fast retransmit
  - each dup ACK represents a packet having left the pipe (successfully received)
• Let’s enter the “FR/FR” mode on 3 dup ACKs
  - $\text{ssthresh} \leftarrow W/2$
  - retransmit lost packet
  - $W \leftarrow \text{ssthresh} + n_{dup}$ (window inflation)
  - wait till W is large enough; transmit new packet(s)
  - on non-dup ACK (1 RTT later): $W \leftarrow \text{ssthresh}$ (window deflation)
  - enter CA mode
1-73
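The FR/FR steps can be sketched as a small Python state machine. Assumptions: W and ssthresh are in segments, only ACK-driven events are modeled (timeouts are left out), and the class and field names are hypothetical.

```python
class RenoWindow:
    """Reno congestion window (in segments) with fast retransmit / fast recovery."""

    def __init__(self, W: float, ssthresh: float) -> None:
        self.W = W
        self.ssthresh = ssthresh
        self.ndup = 0                 # consecutive duplicate ACKs
        self.fast_recovery = False
        self.retransmits = 0          # counts "retransmit lost packet" actions

    def on_dup_ack(self) -> None:
        self.ndup += 1
        if self.fast_recovery:
            self.W = self.ssthresh + self.ndup        # keep inflating
        elif self.ndup == 3:                          # enter FR/FR
            self.ssthresh = self.W / 2
            self.retransmits += 1                     # fast retransmit
            self.W = self.ssthresh + self.ndup        # window inflation
            self.fast_recovery = True

    def on_new_ack(self) -> None:
        if self.fast_recovery:
            self.W = self.ssthresh                    # window deflation
            self.fast_recovery = False                # continue in CA mode
        elif self.W < self.ssthresh:
            self.W += 1                               # slow start
        else:
            self.W += 1 / self.W                      # congestion avoidance
        self.ndup = 0
```

Starting from W = 16 in congestion avoidance, three duplicate ACKs set ssthresh = 8 and inflate W to 11; two more raise it to 13; the next new ACK deflates W to 8 and the sender continues in CA instead of restarting slow start as Tahoe would.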
TCP Fairness
Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]
1-74
Chapter 3: Summary
• principles behind transport layer services:
  - multiplexing, demultiplexing
  - reliable data transfer
  - flow control
  - congestion control
• instantiation and implementation in the Internet
  - UDP
  - TCP
Next Lecture:
• leaving the network “edge” (application, transport layers)
• into the network “core”
1-75