Download TCP - Rudra Dutta

Document related concepts

Deep packet inspection wikipedia , lookup

Cracking of wireless networks wikipedia , lookup

Recursive InterNetwork Architecture (RINA) wikipedia , lookup

Internet protocol suite wikipedia , lookup

IEEE 1355 wikipedia , lookup

TCP congestion control wikipedia , lookup

Transcript
CSC/ECE 573
Internet Protocols
Transmission Control Protocols
Let’s Design TCP
Or rather, a transport protocol with the following
characteristics



Reliability (no erroneous packets + Guaranteed in-order
delivery)
Flow control
Congestion control
Copyright Rudra Dutta, NCSU, Spring, 2005
2
What Reliable Transport Protocols Provide
Goal
Description
Addressing
Delivery of data from sending application to correct
receiving application
Reliable Delivery
Receiver must get all the data the sender put on the
network
In-Order Delivery
Data segments must be delivered to the receiver in
the same order they left the sender
Flow Control
Sender’s transmission rate should adapt to the
receiver's rate
Congestion Control Transmission rate should not exceed the speed of
slowest link
Segmentation
Copyright Rudra Dutta, NCSU, Spring, 2005
Data should be sent in segments whose size
provides the highest throughput
3
Transport vs. Data-Link Protocols
Both protocols have to deal with error/flow
control, etc.
Data Link Layer
Transport Layer
Communication is over a physical
channel
Communication is over a network
Explicit addressing not always
required (e.g. point-to-point)
Explicit addressing required
Establishing a (single) connection
is simple
Large and variable number of
connections, establishment is
more difficult
A channel cannot reorder frames
The network may store packets
and deliver them later
Copyright Rudra Dutta, NCSU, Spring, 2005
4
Outline
Overview
I.
•
TCP Message Format and Options
II. Connection Establishment &
Termination
III. Data Transfer (Flow and Error Control)
IV. Timer Management
V. Congestion Control
Copyright Rudra Dutta, NCSU, Spring, 2005
5
Transmission Control Protocol
Overview
TCP Overview
Transmission Control Protocol (RFCs 793, 1122,
1323)
Segment: unit of transfer, usually contained in a
single IP datagram
Reliability achieved using



Acknowledgments (therefore, seq. no.s)
Timeouts, retransmissions
Checksums on header and data
Sliding window mechanism for



efficient transmission
flow control
congestion control
Important: several flavors (text covers early one)
Copyright Rudra Dutta, NCSU, Spring, 2005
7
Properties of TCP Reliable Service
Point-to-point
Stream orientation: TCP thinks of data as a stream
of bytes
Unstructured stream: TCP does not honor structured
data
Virtual circuit / connection orientation: state
maintained at both ends
Buffered transfer: the software divides data stream
into segments independent of application program
transfers
Full duplex connection: concurrent data transfer in
both directions (half duplex possible, but then, no
piggy-backing)
Copyright Rudra Dutta, NCSU, Spring, 2005
8
TCP Header Format
Copyright Rudra Dutta, NCSU, Spring, 2005
9
TCP Connections and Endpoints
TCP uses destination port to identify ultimate
destination
TCP uses connection as fundamental
abstraction


connection: a pair of endpoints
endpoint: (host_IP_address, port #) (socket)
TCP ports

a port # does not correspond to a single object
 UDP: single queue per port

a TCP port number can be shared by multiple
connections on the same machine (remote
endpoints differ)
Copyright Rudra Dutta, NCSU, Spring, 2005
10
TCP Header Fields
Sequence number



every byte in data stream is numbered
sequence number = number of the first data byte in the
segment in the sender's byte stream
wraps around after 232 -1
ACK number



next sequence number the sender of the acknowledgment
expects to receive
i.e., sequence number + 1 of last successfully received
consecutive data byte
valid only when ACK flag = 1
Header length


length of header (in bytes) / 4
With maximum value of 15 (24-1), header cannot exceed 60
bytes
Copyright Rudra Dutta, NCSU, Spring, 2005
11
TCP Header Fields
Window size



number of bytes (starting with the one specified in the ACK
field) that receiver is willing to accept
16 bits long; max value is 65535
used for flow control
Checksum


similar to UDP computation (including pseudo-header)
but use is mandatory
Urgent pointer


sequence number of last byte of urgent data (later)
added to sequence number of segment
Options

most common option is maximum segment size (MSS)
Copyright Rudra Dutta, NCSU, Spring, 2005
12
Description of Flags in Control Field
Flag
Description
URG
The value of the urgent pointer is valid
ACK
The value of the acknowledgment number is
valid
PSH
Push the data (pass data to receiver as quickly
as possible, later)
The connection must be reset
RST
SYN
Synchronize the sequence numbers during
connection establishment
FIN
The sender has no more data to transmit
Copyright Rudra Dutta, NCSU, Spring, 2005
13
TCP Options
Copyright Rudra Dutta, NCSU, Spring, 2005
14
End of Options
Used for padding at end of the option field
Only one end-of-option used --> after this
option receiver looks for payload data
Copyright Rudra Dutta, NCSU, Spring, 2005
15
No Operation Option
If multiple bytes needed for alignment, no-op
option is used
Copyright Rudra Dutta, NCSU, Spring, 2005
16
Maximum Segment Size Option
Declared during connection
establishment phase (in SYN segments)

cannot be specified during data transfer
Defines the maximum segment size of
data the sender is willing to accept
MSS must be  interface MTU - 40
bytes

default: 536
Copyright Rudra Dutta, NCSU, Spring, 2005
17
Long Fat Pipes (LFN)
Bandwidth-delay product (capacity of "pipe")


Bandwidth (b/s) X Round-trip-time (RTT, in s)
Note: consider transmission and propagation
delays
Example: trans-continental T3


RTT = 6000 Km * 5 s / Km = 30 ms
B-D Product = 45 Mbps X 30 ms = 170 KB
TCP cannot handle “Long Fat Pipes”
efficiently



small window size (16 bits  maximum of 64 KB)
small sequence number space
(go-back-N ARQ protocol)
Copyright Rudra Dutta, NCSU, Spring, 2005
18
Window Scale Option
New window size = window size in
header X 2scale-factor
equivalent to “shift window size in header
left by a number of bits equal to the
window scale factor”
 only carried in segments with SYN flag = 1
(permanent for the connection)
 Limit on value (look it up)

Copyright Rudra Dutta, NCSU, Spring, 2005
19
Timestamp Option
Sender puts timestamp option in segment, reflected
by receiver in the acknowledgment



Can compute RTT for each received ACK
allows sender to compute RTT more accurately
clock synchronization between two hosts is not required
Without timestamps, RTT calculated once every
window


OK for 8-segment windows
but larger windows require better RTT calculations
Copyright Rudra Dutta, NCSU, Spring, 2005
20
Transmission Control Protocol
Connection Establishment and
Termination
Connection Establishment Issues
Naive approach:
Connection Request message
 Connection Accepted message
 Sequence numbers always start at 0

Problem: delayed duplicates
Copyright Rudra Dutta, NCSU, Spring, 2005
22
Connection Establishment Issues (cont'd)
Copyright Rudra Dutta, NCSU, Spring, 2005
23
Connection Establishment Issues (cont'd)
Copyright Rudra Dutta, NCSU, Spring, 2005
24
Connection Establishment Solution
Limit packet lifetime

T = Max Segment Lifetime (compare with TTL)
Use long sequence numbers


wrap around time > 2T
= time for a packet and its ACKs to “die”
Choose a different initial sequence number
(ISN) with each connection request
Ignore additional requests for connection
after establishment
After host crash: wait for time 2T

Allow time for old packets to die off
Copyright Rudra Dutta, NCSU, Spring, 2005
25
Three-Way Handshake
Copyright Rudra Dutta, NCSU, Spring, 2005
26
Three-Way Handshake (cont'd)
Copyright Rudra Dutta, NCSU, Spring, 2005
27
Three-Way Handshake (cont'd)
Copyright Rudra Dutta, NCSU, Spring, 2005
28
Simultaneous Open
A connects with B, B connects with A at same time (pass
each other in the network)


Only one connection will be established!
Using only 2 ports (one on A, one on B)
Copyright Rudra Dutta, NCSU, Spring, 2005
29
Connection Termination Issues
Asymmetric (unilateral) release

abrupt, data may be lost
Symmetric (bilateral) release


each direction released independently of the other
each side can close its transmission

“half closed” state is possible
To avoid data loss in a symmetric release…


no side should disconnect until it is convinced that
the other side is also prepared to disconnect
same as “two-army” problem: no solution!
Copyright Rudra Dutta, NCSU, Spring, 2005
30
TCP Connection Termination
1. A sender closes its part of the
connection by sending a FIN segment
2. After ACKing the FIN, the receiver can
still send data on its part of the
connection (half-close)
3. Finally, the receiver closes its part of
the connection by sending a FIN
segment (graceful close)
Copyright Rudra Dutta, NCSU, Spring, 2005
31
TCP Connection Termination (cont'd)
Copyright Rudra Dutta, NCSU, Spring, 2005
32
Simultaneous Close
A sends FIN to B, B sends FIN to A at
same time (pass each other in the
network)
FIN_WAIT_1
FIN_WAIT_1
CLOSING
CLOSING
TIME_WAIT
TIME_WAIT
Copyright Rudra Dutta, NCSU, Spring, 2005
A
B
33
Connection Reset
A connection can also be aborted with a
RST segment (hard reset)

normally reserved for error conditions, not
normal termination
Copyright Rudra Dutta, NCSU, Spring, 2005
34
Half Closed Connection
One end of connection (e.g.
clientserver) terminates (sends FIN
and receives ACK of FIN)
Other end (serverclient) remains open
(sending data)
Other end (server) later terminates
(sends FIN and receives ACK of FIN),
and connection is then completely
closed
Copyright Rudra Dutta, NCSU, Spring, 2005
35
TCP Connection States
CLOSED
LISTEN
SYN_RCVD
SYN_SENT
ESTABLISHED
FIN_WAIT_1
CLOSING
FIN_WAIT_2
TIME_WAIT
Copyright Rudra Dutta, NCSU, Spring, 2005
CLOSE_WAIT
LAST_ACK
36
TCP: Normal Client Open and Close
CLOSED
LISTEN
SYN_RCVD
SYN_SENT
ESTABLISHED
FIN_WAIT_1
CLOSING
FIN_WAIT_2
TIME_WAIT
Copyright Rudra Dutta, NCSU, Spring, 2005
CLOSE_WAIT
LAST_ACK
37
TCP: Simultaneous Open
CLOSED
LISTEN
SYN_RCVD
SYN_SENT
ESTABLISHED
FIN_WAIT_1
CLOSING
FIN_WAIT_2
TIME_WAIT
Copyright Rudra Dutta, NCSU, Spring, 2005
CLOSE_WAIT
LAST_ACK
38
TCP: Simultaneous Close
CLOSED
LISTEN
SYN_RCVD
SYN_SENT
ESTABLISHED
FIN_WAIT_1
CLOSING
FIN_WAIT_2
TIME_WAIT
Copyright Rudra Dutta, NCSU, Spring, 2005
CLOSE_WAIT
LAST_ACK
39
TCP: Normal Server Open and Close
CLOSED
LISTEN
SYN_RCVD
SYN_SENT
ESTABLISHED
FIN_WAIT_1
CLOSING
FIN_WAIT_2
TIME_WAIT
Copyright Rudra Dutta, NCSU, Spring, 2005
CLOSE_WAIT
LAST_ACK
40
TCP: Client Resets Connection
Before Establishment Complete
CLOSED
LISTEN
SYN_RCVD
SYN_SENT
ESTABLISHED
FIN_WAIT_1
CLOSING
FIN_WAIT_2
TIME_WAIT
Copyright Rudra Dutta, NCSU, Spring, 2005
CLOSE_WAIT
LAST_ACK
41
TCP State
Machine
Source:
W. Richard Stevens
Copyright Rudra Dutta, NCSU, Spring, 2005
42
Transmission Control Protocol
Data Transfer
Data Transfer in TCP
Data received from an application usually sent in
segments of size MSS
ACKs carry the sequence number of the next byte
receiver expects to receive

this is a cumulative count
Unacknowledged data = data sent by sender, but not
yet acknowledged by the receiver

i.e., packets not yet received, or acknowledgment not yet
received
Each ACK carries a window advertisement


window = how many additional bytes the receiver is
prepared to accept before the sender must wait for an
acknowledgment
size of window specified in bytes, not segments
Copyright Rudra Dutta, NCSU, Spring, 2005
44
Sliding Windows
TCP sliding window mechanism allows
multiple segments to be sent before an
ACK is returned
Left boundary of window = earliest
unacknowledged byte

an acknowledgment advances this left
boundary
Right boundary of window = latest
unacknowledged byte

a window advertisement advances this
right boundary
Copyright Rudra Dutta, NCSU, Spring, 2005
45
Sliding Window
Copyright Rudra Dutta, NCSU, Spring, 2005
46
Dynamic Sliding Window
5
Copyright Rudra Dutta, NCSU, Spring, 2005
47
TCP Window Management Example
Copyright Rudra Dutta, NCSU, Spring, 2005
48
Pushing Data + Urgent Data
Buffering by TCP sender improves efficiency &
throughput

not appropriate for all applications (why not?)
For interactive applications, can use push operation


forces TCP to send data without waiting for the buffer to fill
sets the PSH flag: forces delivery to application at receiving
end
There is no “interrupt” message to send out-of-band
information (e.g., interrupt, abort signals)


instead, set the URG flag
urgent pointer points to the urgent data in the segment
Copyright Rudra Dutta, NCSU, Spring, 2005
49
Performance Issues
Small packets, and small window
advertisements, create efficiency
problems
These can solved by

delaying sending of data


sender “voluntarily” consolidates multiple small
packets into a single larger packet
delaying sending of ACKs/window
advertisements

sender “strongly encouraged” to consolidate
multiple small packets into a single larger
packet
Copyright Rudra Dutta, NCSU, Spring, 2005
50
“Silly Window” () Syndrome
A serious problem in sliding window
operation
Causes
1.
2.
sending application program creates data slowly
receiving application program consumes data
slowly
In either case, data may be sent in small
segments


inefficient use of bandwidth
increased processing by TCP
TCP now requires avoidance heuristics
Copyright Rudra Dutta, NCSU, Spring, 2005
51
SWS Caused by Receiver
Copyright Rudra Dutta, NCSU, Spring, 2005
52
SWS Caused by Sender
Example: use of Telnet, running
interactive editor at a remote host

potentially, two 41-byte and two 40-byte
datagrams sent for each character typed
 Sender: send of 1 byte (1+40 bytes)
 Receiver: acknowledgment of 1 byte (40
bytes)
 Receiver: echo of 1 byte (1+40 bytes)
 Sender: acknowledgment of echo (40
bytes)
Copyright Rudra Dutta, NCSU, Spring, 2005
53
Sender Side Heuristic
Nagle's algorithm

when application generates data 1 byte at a time,
send the first byte and buffer all the rest until…



an ACK is received, or
a maximum sized segment is filled, or
half the window has filled (sent but not yet ack’ed)
The rule is applied even if Push has been
requested
Features



a self-clocking algorithm (does not compute
delays)
little overhead: no timers, clock readings, etc.
does not affect throughput in “normal” use (i.e.,
when there is lots of data to transmit)
Copyright Rudra Dutta, NCSU, Spring, 2005
54
Yet Another Override
Nagle’s algorithm is widely used in TCP
implementations, but may have to be
turned off in some cases
ex.: mouse movements in an X-windows
application running over the Internet
 Whenever delay due to “consolidation” is
unacceptable

Copyright Rudra Dutta, NCSU, Spring, 2005
55
Receiver Side Heuristics
Clark's solution


do not send window advertisements for 1 byte
instead, advertise a window size of zero and wait
until



there is space for a maximum segment size of data, or
the receive buffer is half-empty
whichever occurs earlier
Nagle's algorithm and Clark's solution are
complementary and can be used together
Problems? Solutions?


Consider the possibility of loss
See persist timer
Copyright Rudra Dutta, NCSU, Spring, 2005
56
Receiver Side Heuristics
Delay window advertisements by delaying ACKs



Until avoidance algorithm allows window advertisement
May result in delaying ACK until there is data to send, so
can “piggyback” ACK with data
May result in single ACK for multiple segments
Reduces traffic




but may force sender to timeout and retransmit a
segment (incorrectly)
ACKs may not be delayed for more than 500ms
RTT estimation may suffer
ACK at least every other segment
Copyright Rudra Dutta, NCSU, Spring, 2005
57
TCP Error Control
1. Error detection



checksum: to check for corrupted
segments at destination
ACK: to confirm receipt of segment by
destination
time-out: one retransmission timer for
each segment sent
2. Error correction – no FEC

source retransmits segments for which
retransmission timer expired
Copyright Rudra Dutta, NCSU, Spring, 2005
58
Responses to Errors
Fundamentally GoBackN

Different versions of TCP have different
detail behavior
Corrupted or lost segments
will not be acknowledged by receiver
 will be retransmitted by sender

Duplicate segments

ignored by receiver
Copyright Rudra Dutta, NCSU, Spring, 2005
59
Responses to Errors (cont’d)
Out-of-order segments (receive segment
n+1 before segment n)
buffered by receiver to improve efficiency
 not ACKed
 send a “duplicate” ACK for the last
acknowledged segment

Lost ACKs
may lead to retransmission of data
 most likely will not be noticed (another, later
ACK also indicates segment was received)

Combination of GoBackN, NACK, SRP
Copyright Rudra Dutta, NCSU, Spring, 2005
60
Transmission Control Protocol
Timers and Congestion Control
Types of TCP Timers
1. Retransmission
2. Persistence
3. Keep-Alive
4. Time-Waited
Copyright Rudra Dutta, NCSU, Spring, 2005
62
1. Retransmission Timer
A. When a segment is sent, a
retransmission timer is started
B. If segment is ACKed before timer
expires, the timer is stopped
C. If timer expires before ACK arrives, the
segment is retransmitted and the timer
is started again
Copyright Rudra Dutta, NCSU, Spring, 2005
63
Retransmission Timer(cont’d)
How long should timeout interval be?


too short: retransmitting when don’t need to!
too long: waiting too long to retransmit when
needed!
Terminology



RTTsample = time measured for round trip of one
packet
RTT = estimate of current round trip time
RTO = retransmission timeout interval
Issues


obtain accurate estimate of round-trip time
choose a good value for retransmission timeout
interval
Copyright Rudra Dutta, NCSU, Spring, 2005
64
Measurement of Round-Trip Time
Basic RTT estimation

explicitly measure the interval between the time a
segment is transmitted and the time the ACK for
the segment returns
may use TCP timestamp option
RTT =  * RTT + (1-)* RTTsample

Recommended value for  = .9


Karn's improvement
 do not update RTT for any segments that have

been retransmitted
avoids ambiguity about which timer value to use
Copyright Rudra Dutta, NCSU, Spring, 2005
65
Determining RTO
Basic (original) method

RTO = 2 * RTT (i.e. allow two roundtrip
times before timing out)
Problem

When RTT has a high standard deviation,
this method is too conservative (times out
too quickly)
Copyright Rudra Dutta, NCSU, Spring, 2005
66
Determining RTO (cont’d)
Solution



estimate both mean and standard deviation of RTT
set RTO = to RTT + a multiple of the standard deviation
use mean deviation (MDEV)as a cheap estimate of standard
deviation
Algorithm
RTT = RTT +  *(RTTsample – RTT)
MDEV =  * MDEV + (1-) * RTTsample - RTT
RTO = RTT + 4 * MDEV
( = .75,  = .125)
Comments


99% of packets arrive within 4 standard deviations
The approximations above are extremely efficient computationally
Copyright Rudra Dutta, NCSU, Spring, 2005
67
TCP Coarse-Grain Timer
TCP uses a 500-ms clock for time measurements

values obtained may differ up to  500 ms from actual value`
Copyright Rudra Dutta, NCSU, Spring, 2005
68
Retransmission Timer Backoff
Not updating the RTT for lost / retransmitted
segments may lead to trouble
Retransmission may have occurred because
RTO is too low


E.g. maybe network suddenly became overloaded
should therefore increase RTO
Exponential backoff



upon timeout, set RTO = 2 * RTO
continue doubling of RTO for each retransmission
when an ACK arrives for a segment that did not
require retransmission, update RTT and RTO
according to the “normal” computation
Copyright Rudra Dutta, NCSU, Spring, 2005
69
2. Persist Timer
It is possible for the advertised window
to go to 0

i.e., sender is transmitting faster than
receiver can receive
The receiver can request more data by
sending an ACK advertising the size of
the available receive buffer
If this ACK is lost  deadlock!
Copyright Rudra Dutta, NCSU, Spring, 2005
70
2. Persist Timer (cont’d)
After receiving any ACK with
window_size = 0, start a persist timer
Upon expiration of the persist timer, send a
probe packet
 Probe packet = 1 byte of data

The response (ACK) to the probe gives
the window size

recovers from lost window advertisements
by “stimulating” new advertisements
Copyright Rudra Dutta, NCSU, Spring, 2005
71
Persist Timer (cont'd)
Copyright Rudra Dutta, NCSU, Spring, 2005
72
3. Keepalive Timer
It is possible for TCP connections to be
idle for a long time
e.g., client opens a connection to a server,
but never makes a request
 consumes resources (memory) on the
server

Keepalive timer maintained by some
implementations
not part of TCP standard
 somewhat controversial

Copyright Rudra Dutta, NCSU, Spring, 2005
73
3. Keepalive Timer (cont’d)
The timer is reset every time a segment
is received

timer interval = 2 hours
When timer expires, check if other side
is still there
repeat 10 times at 75 second intervals
 if there is no response, the connection is
terminated

Copyright Rudra Dutta, NCSU, Spring, 2005
74
4. Time-Wait Timer
After agreeing with the other side to close a
connection, TCP enters the TIME_WAIT
state



starts a timer that runs for twice the maximum
packet lifetime
the connection is held in limbo during this period
the connection is closed after the timer expires
This ensures that all packets (and their ACKs)
created during the connection have “died off”
(reached their destination or been discarded)
Copyright Rudra Dutta, NCSU, Spring, 2005
75
Flow vs. Congestion Control
Copyright Rudra Dutta, NCSU, Spring, 2005
76
Congestion Control: “Open Loop” Approach
1. Source describes the traffic it will be
sending through the network

problem: choosing right set of parameters
to describe a source
2. The network reserves resources
(bandwidth, buffer space) during
connection establishment

resources will be held by the connection
for the duration
3. Source “shapes” its traffic to match its
description
Copyright Rudra Dutta, NCSU, Spring, 2005
77
Open Loop (cont’d)
Results


network overload / congestion will be
avoided
connection performance is guaranteed
Significant amount of work in this area
 ATM standards
Copyright Rudra Dutta, NCSU, Spring, 2005
78
Congestion Control: “Closed Loop” Approach
1. Source establishes connection, no
traffic description required
2. Feedback about congestion is
continually provided
3. Source dynamically adapts its traffic
characteristics to reduce congestion or
use available bandwidth


source must decide how to best react to
current network state
not policed or enforced by the network!
Copyright Rudra Dutta, NCSU, Spring, 2005
79
“Closed Loop” Approach (cont’d)
Results


congestion is detected and then reduced
no guarantees of connection performance
How does the source learn that its service
rate has changed? Choices:
1.
explicit feedback: the network conveys this info to
the source

2.
example: ICMP source quench
implicit feedback: source infers a change in its
service rate by measuring its current end-to-end
performance

example: TCP acknowledgments
Copyright Rudra Dutta, NCSU, Spring, 2005
80
TCP Congestion Control
1. Uses a closed-loop approach
2. Feedback is implicit (ACKs)
3. Adapts rate based on dynamic window

smaller window = lower rate, larger
window = higher rate
4. Control mechanism is end-to-end

no connection state maintained by the
network!
Copyright Rudra Dutta, NCSU, Spring, 2005
81
TCP Implicit Feedback
TCP relies on feedback from the
network to detect congestion
timeout caused by a lost packet (used by
TCP-Tahoe, TCP-Reno)
 duplicate ACK (used by TCP-Reno)

Assumption: packet loss is due to
congested routers, not transmission
errors

Incorrect assumption for wireless networks!
Copyright Rudra Dutta, NCSU, Spring, 2005
82
TCP Dynamic Windows
Window #1: advertised by receiver

purpose: avoid overrunning a slow receiver
(i.e., for flow control)
Window #2: maintained by sender
purpose: avoid network overload (i.e. for
congestion control)
 Called the congestion window, or cwnd


the sending TCP dynamically manipulates
cwnd
“The” window size =
MIN(receiver_advertisement, cwnd)
Copyright Rudra Dutta, NCSU, Spring, 2005
83
TCP Congestion Control Overview
1. Probe for available bandwidth by increasing
cwnd


“slow” start = exponential increase phase
congestion avoidance = linear increase phase
2. Slow start threshold (= ssthresh)

controls transition from exponential to linear
phase
3. Upon packet loss (timeout, duplicate
acknowledgment)…


retransmit the packet
reduce the window size
TCP Reno adds (1) fast retransmit, and (2)
fast recovery
Copyright Rudra Dutta, NCSU, Spring, 2005
84
Slow Start
At connection establishment
cwnd  1, ssthresh  1/2 initial receiver
window advertisement
 (both cwnd and ssthresh are in units of
MSS in the following)

Upon arrival of each ACK: cwnd 
cwnd + 1
Not slow; exponential increase!

window increases by cwnd every RTT

can quickly congest the network and cause
packet loss
Copyright Rudra Dutta, NCSU, Spring, 2005
85
Slow Start (cont'd)
Copyright Rudra Dutta, NCSU, Spring, 2005
86
Congestion Avoidance
Basic idea: avoid increasing the window size
too quickly

slow down when approaching the “steady state”
Use slow start as long as cwnd 
ssthresh, enter congestion avoidance
phase when cwnd > ssthresh
Upon arrival of each ACK: cwnd  cwnd +
1/cwnd
Linear increase

window increases by 1 every RTT
Copyright Rudra Dutta, NCSU, Spring, 2005
87
Response to Congestion (TCP Tahoe/Reno)
When congestion occurs



Packet deliver may be slowed, or fail
Acknowledgment delivery may be slowed, or fail
In either case, timeout / retransmission may occur
transmission_window =
MIN(receiver_advertisement,cwnd)

ssthresh  MAX(2, ½* transmission_window)
When timeout occurs…

cwnd  1 (enter slow start phase, from the top)
Copyright Rudra Dutta, NCSU, Spring, 2005
88
Evolution of TCP's Congestion Window
Copyright Rudra Dutta, NCSU, Spring, 2005
89
“Slow Start”, “Exponential Decay”
Consider only steady state conditions
Increase of transmission window
Congestion avoidance in the steady state
 Not slow start
 Window increases by 1 each RTT
 Additive increase

Decrease of transmission window
Only congestion, in steady state
 Successive timeouts
 Window halved in each RTT
 Multiplicative decrease

Copyright Rudra Dutta, NCSU, Spring, 2005
90
How Good is This?
On the whole a simple strategy

Logic not obvious
Consider the fairness issues
Two TCP connections over a bottleneck link


Ideally, should share bandwidth
But start times and windows could be very
different
Copyright Rudra Dutta, NCSU, Spring, 2005
91
How Good is This?
Together, the connections cannot get more than
bottleneck bandwidth

And should get that much
We would like them to be fair and share equally
Consider a situation when this is not true
Successive adjustments tend to stabilize
Copyright Rudra Dutta, NCSU, Spring, 2005
92
Fast Retransmission (TCP Reno)
When out-of-order segment arrives



receiver issues ACK with the same cumulative sequence
number as the previous ACK
called a duplicate ACK (acts like a NAK)
Indicates data is still flowing between sender and receiver
Receipt of 3 duplicate ACKs in a row for segment X is
a strong indication that segment X has been lost
Sender actions

retransmit the (presumably) lost segment without waiting for
its timer to expire, and no delay
ssthresh  ½ * transmission_window
But cwnd is not set to 1

called fast retransmission


Copyright Rudra Dutta, NCSU, Spring, 2005
93
Fast Recovery (TCP Reno)
Receipt of duplicate ACKs implies data is still
flowing: do not enter slow start yet
After a fast retransmission

cwnd  ssthresh + 3

fast recovery, enter linear phase
Each time another duplicate ACK arrives

cwnd  cwnd + 1
When ACK for a retransmitted segment
arrives

cwnd  ssthresh

Back to normal
Copyright Rudra Dutta, NCSU, Spring, 2005
94
Congestion Example
Source/receiver have buffers of 8192
bytes
MSS = 1024
Copyright Rudra Dutta, NCSU, Spring, 2005
95
Congestion
Example:
Slow Start
Copyright Rudra Dutta, NCSU, Spring, 2005
96
Congestion
Example:
Lost Segment
Copyright Rudra Dutta, NCSU, Spring, 2005
97
Congestion
Example: Fast
Retransmission
Copyright Rudra Dutta, NCSU, Spring, 2005
98
TCP and Transactions (Request / Reply)
Many TCP transactions consist simply
of a request to a server and a reply to
the client
Problems
these simple transactions may require the
exchange of 10 segments (why?)
 a simple transaction also takes at least
two RTT times plus the processing time at
the server

Copyright Rudra Dutta, NCSU, Spring, 2005
99
TCP Extension for Transactions (cont'd)
To distinguish between consecutive transactions, a
connection count (CC) TCP option is added to the
header
The server maintains a per-host cache of the most
recent




CC
MSS
window size
RTO
The client combines the following into one segment




SYN
Request
FIN
also includes the current CC value
Copyright Rudra Dutta, NCSU, Spring, 2005
100
TCP Extension for Transactions (cont'd)
If the received CC > the cached CC, the
server combines the following into a single
segment




SYN
ACK of the client’s SYN
reply
FIN
Else, the normal 3-way handshake is
performed
Client ACKs the server's (SYN + FIN + data)
in one segment
Time needed is the minimum

1 RTT + processing time at server
Copyright Rudra Dutta, NCSU, Spring, 2005
101
TCP Extension for Transactions (cont'd)
Copyright Rudra Dutta, NCSU, Spring, 2005
102