Chapter 3 outline
- Transport-layer services
- Principles of reliable data transfer
- Connectionless transport: UDP (self-study assignment)
- 3.5 Connection-oriented transport: TCP
- 3.6 Principles of congestion control
- 3.7 TCP congestion control
- Summary
Ch3-2
-1
TCP: Overview
RFCs: 793, 1122, 1323, 2018, 2581
- point-to-point (i.e., no multicast): one sender, one receiver
- pipelined, with send & receive buffers
- full-duplex data: bi-directional data flow in the same connection; MSS: maximum segment size
- connection-oriented: handshaking (exchange of control msgs) initializes sender & receiver state before data exchange
- reliable, in-order byte stream: no "message boundaries"
- flow controlled: sender will not overwhelm receiver
- congestion controlled: sender sets its window size for congestion control (and flow control)
[Figure: at each socket "door", the application writes data into the TCP send buffer; segments carry it to the peer's TCP receive buffer, from which the receiving application reads.]
TCP segment structure
Header fields (32 bits wide):
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | receive window
  checksum | urgent data pointer
  options (variable length)
  application data (variable length)
Notes:
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection establishment (setup, teardown commands)
- checksum: Internet checksum (as in UDP)
- sequence and acknowledgement numbers count bytes of data, not segments
- receive window: # bytes rcvr is willing to accept
TCP seq. #'s and ACKs
- Seq. #'s: byte-stream "number" of the first byte in the segment's data
- ACKs: seq # of the next byte expected from the other side; cumulative ACK
- Q: how does the receiver handle out-of-order segments?
- A: the TCP spec doesn't say; it's up to the implementor
[Figure: simple telnet scenario. The user at Host A types 'C'; Host B ACKs receipt of 'C' and echoes 'C' back; Host A then ACKs receipt of the echoed 'C'.]
TCP: retransmission scenarios
[Figure: timelines between Host A and Host B. Left: lost ACK scenario; the ACK is lost (X), Host A's Seq=92 timer expires, and the segment is retransmitted. Right: premature timeout; the timer fires before the ACK arrives, causing an unneeded retransmission, and the cumulative ACK lets the sender advance NextByteToSend from 100 to 120.]
TCP Round Trip Time and Timeout
Q: how to set the TCP timeout value?
- longer than RTT, but RTT varies
- too short: premature timeout, unnecessary retransmissions
- too long: slow reaction to segment loss
Q: how to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
- ignore retransmissions
- SampleRTT will vary; we want the estimated RTT "smoother": average several recent measurements, not just the current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1 - α)*EstimatedRTT + α*SampleRTT
- exponentially weighted moving average
- influence of a past sample decreases exponentially fast
- typical value: α = 0.125
Example RTT estimation:
[Figure: SampleRTT and EstimatedRTT (milliseconds, roughly 100-350 ms) over time for gaia.cs.umass.edu to fantasia.eurecom.fr; the EstimatedRTT curve is visibly smoother than the raw SampleRTT measurements.]
TCP Round Trip Time and Timeout
Setting the timeout
- EstimatedRTT plus a "safety margin"
- large variation in EstimatedRTT -> larger safety margin
- first estimate how much SampleRTT deviates from EstimatedRTT:
  DevRTT = (1-β)*DevRTT + β*|SampleRTT-EstimatedRTT|
  (typically, β = 0.25)
- then set the timeout interval:
  TimeoutInterval = EstimatedRTT + 4*DevRTT
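The two update rules above can be sketched in a few lines of Python; α = 0.125 and β = 0.25 are the slide's typical values, and the starting estimates and SampleRTT measurements below are hypothetical.

```python
# Sketch of TCP's RTT estimator and timeout computation, following the
# slides' order of updates (EstimatedRTT first, then DevRTT).

def update_rtt(estimated_rtt, dev_rtt, sample_rtt,
               alpha=0.125, beta=0.25):
    """Fold one SampleRTT into the smoothed estimate and deviation."""
    estimated_rtt = (1 - alpha) * estimated_rtt + alpha * sample_rtt
    dev_rtt = (1 - beta) * dev_rtt + beta * abs(sample_rtt - estimated_rtt)
    timeout = estimated_rtt + 4 * dev_rtt
    return estimated_rtt, dev_rtt, timeout

est, dev = 100.0, 0.0          # ms; hypothetical starting values
for s in [120, 110, 150, 90]:  # hypothetical SampleRTT measurements
    est, dev, timeout = update_rtt(est, dev, s)
print(round(est, 1), round(timeout, 1))
```

Note how the jumpy 150 ms sample inflates DevRTT, and with it the safety margin, far more than it moves EstimatedRTT.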
Fast Retransmit
- the time-out period is often relatively long: long delay before resending a lost packet
- detect lost segments via duplicate ACKs:
  - the sender often sends many segments back-to-back
  - if a segment is lost, there will likely be many duplicate ACKs
- if the sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost
- fast retransmit: resend the segment before the timer expires
[Figure: Host A sends segments with seq #s x1...x5; x2 is lost (X). Host B replies with four ACKs for x1; the triple duplicate ACK triggers retransmission before the timeout expires.]
Fast retransmit algorithm:

event: ACK received, with ACK field value of y
if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
        start timer
}
else {
    /* a duplicate ACK for already-ACKed data */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3) {
        /* fast retransmit */
        resend segment with sequence number y
    }
}
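The pseudocode above translates almost line-for-line into Python; a minimal sketch, with names (send_base, dup_acks) chosen by us rather than taken from any RFC:

```python
# Sender-side dup-ACK bookkeeping behind fast retransmit.

def on_ack(y, state, retransmit):
    """Process an ACK carrying value y, per the slide's algorithm."""
    if y > state["send_base"]:          # new (cumulative) ACK
        state["send_base"] = y
        state["dup_acks"] = 0
    else:                               # duplicate ACK for old data
        state["dup_acks"] += 1
        if state["dup_acks"] == 3:      # triple duplicate: fast retransmit
            retransmit(y)

resent = []
state = {"send_base": 100, "dup_acks": 0}
for ack in [100, 100, 100]:             # three dup ACKs for byte 100
    on_ack(ack, state, resent.append)
print(resent)   # → [100]: the segment starting at byte 100 is resent once
```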
TCP Flow Control
- the receive side of a TCP connection has a receive buffer
- the app process may be slow at reading from the buffer
- flow control: the sender won't overflow the receiver's buffer by transmitting too much, too fast
- speed-matching service: matching the send rate to the receiving app's drain rate
TCP Flow control: how it works
(Suppose the TCP receiver discards out-of-order segments.)
- spare room in buffer = RcvWindow (= rwnd) = RcvBuffer - [LastByteRcvd - LastByteRead]
- rcvr advertises the spare room by including the value of RcvWindow in segments
- sender limits unACKed data to RcvWindow: guarantees the receive buffer doesn't overflow
- the amount of unACKed data will be the smaller of RcvWindow and the congestion window (to be discussed later)
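The advertised-window arithmetic is a one-liner; a sketch with the slide's variable names (RcvBuffer, LastByteRcvd, LastByteRead) and hypothetical byte counts:

```python
# Receiver-side spare-room computation and the sender-side limit.

def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room = RcvBuffer - (LastByteRcvd - LastByteRead)."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

rwnd = rcv_window(rcv_buffer=65535, last_byte_rcvd=20000, last_byte_read=12000)
cong_win = 30000                       # hypothetical congestion window
allowed_unacked = min(rwnd, cong_win)  # sender obeys the smaller of the two
print(rwnd, allowed_unacked)           # → 57535 30000
```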
TCP Connection Management
Recall: TCP sender and receiver establish a "connection" before exchanging data segments
- initialize TCP variables: seq. #s; buffers, flow control info (e.g. RcvWindow)
- client: connection initiator
  Socket clientSocket = new Socket("hostname","port number");
- server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();

Three-way handshake:
Step 1: client host sends a TCP SYN segment to the server
- specifies initial seq #
- no data
Step 2: server host receives SYN, replies with a SYNACK segment
- server allocates buffers
- specifies server initial seq. #
Step 3: client receives SYNACK, replies with an ACK segment, which may contain data
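The Java calls above map directly onto Python's socket API; in a minimal sketch (loopback address and a thread standing in for a separate server host are our choices), the entire three-way handshake happens inside connect() and accept():

```python
# SYN / SYNACK / ACK are exchanged by the kernel: the client's connect()
# sends the SYN, and the server's accept() returns once the handshake ends.
import socket, threading

welcome = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
welcome.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
welcome.listen(1)                     # passive open: ready for SYNs

def server():
    conn, _ = welcome.accept()        # returns once SYN/SYNACK/ACK completes
    conn.sendall(conn.recv(16))       # echo back, then close
    conn.close()

t = threading.Thread(target=server); t.start()
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(welcome.getsockname())  # active open: kernel sends SYN here
client.sendall(b"hi")
reply = client.recv(16)
client.close(); t.join(); welcome.close()
print(reply)                           # → b'hi'
```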
TCP Connection Establishment (3-way)
Establishing a connection:
Step 1: client sends a TCP SYN control segment to the server
Step 2: server receives SYN, replies with SYN and ACK (in one segment)
Step 3: client receives SYN+ACK, replies with ACK and possibly data; connection established
[Figure: client and server timelines showing segment 1 (SYN), segment 2 (SYN+ACK), and the final ACK.]
TCP Connection Close
client closes socket: clientSocket.close();
Step 1: client sends FIN to the server (client closing)
Step 2: server receives FIN, replies with ACK; then sends its own FIN (server closing, waiting to close)
Step 3: client receives FIN, replies with ACK; enters "timed wait", during which it will respond with ACK to any received FINs
Step 4: server receives the ACK; connection closed
[Figure: client/server timelines showing the FIN/ACK exchange, the client's timed wait, and both sides reaching "closed".]
TCP Connection Management (cont)
[Figure: the TCP server lifecycle and TCP client lifecycle state diagrams.]
Difficulty with Symmetric Release
Two-army problem: when can Blue army #1 be sure that Blue army #2 will attack at the same time? (never)
Disconnection Request (DR) = Attack
- the 3-way handshake usually works
- host 1 needs to retransmit the DR several times
- there is no perfect solution: e.g., we end up with a half-open connection in case (d) if the initial DR and all subsequent DRs are lost!
Principles of Congestion Control
Congestion:
- informally: "too many sources sending too much data too fast for the network to handle"
- different from flow control!
- manifestations: lost packets (buffer overflow at routers); long delays (queueing in router buffers)
- a top-10 problem!
Causes/costs of congestion: scenario 1
- two senders, two receivers
- one router with infinite buffers; link capacity C; unlimited shared output link buffers
- no retransmission
[Figure: Hosts A and B send original data at rate λin into the shared output link, producing rate λout. Per-connection throughput rises to the maximum achievable value of C/2, while queueing delay grows without bound as the arrival rate approaches capacity: large delays when congested.]
Causes/costs of congestion: scenario 2
- one router, finite buffers
- sender retransmits "lost" packets upon timeout
[Figure: Hosts A and B send original data at rate λin; λ'in is original data plus retransmitted data, offered to finite shared output link buffers; output rate is λout.]
Causes/costs of congestion: scenario 2
- early timeout: every packet retransmitted once (fig a)
- "perfect" retransmission: 1 retransmission for every 2 pkts (fig b)
"costs" of congestion:
- more work (retransmissions) for a given "goodput"
- unneeded retransmissions: the link carries multiple copies of a pkt
Causes/costs of congestion: scenario 3
- four senders
- multihop paths
- timeout/retransmit
Q: what happens as λin and λ'in increase?
[Figure: Hosts A and B send original data at rate λin (λ'in including retransmissions) through finite shared output link buffers along multihop paths; output rate is λout.]
Causes/costs of congestion: scenario 3
H
o
s
t
A
l
o
u
t
H
o
s
t
B
Another “cost” of congestion:
when packet dropped, any “upstream transmission
capacity used for that packet was wasted!
Ch3-2
-26
Approaches towards congestion control
- Open-loop: admission control prevents congestion before it occurs
- Closed-loop: monitors for congestion and reacts to it
Two broad closed-loop approaches:
End-end congestion control:
- no explicit feedback from the network
- congestion inferred from end-system (sender) observed loss and delay (e.g., TCP timeout)
Network-assisted congestion control:
- routers provide feedback to end systems
- a single bit indicating congestion (in ATM, and new in TCP/IP)
- an explicit rate the sender should send at
Network-Assisted Congestion Control
- network feedback via the receiver (in ATM, and TCP/IP's ECN bit)
- direct network (router) feedback: also called choke packets
TCP Congestion Control (Tahoe and Reno)
- end-end control (no network assistance)
- sender limits transmission: LastByteSent - LastByteAcked <= min {CongWin, RcvWindow}
- roughly, rate = CongWin/RTT bytes/sec
- CongWin is a dynamic function of perceived network congestion
How does the sender perceive congestion?
- loss event = timeout or 3 duplicate ACKs
- the TCP sender reduces its rate (CongWin) after a loss event
Main mechanisms:
- AIMD (congestion avoidance and fast recovery)
- slow start
- fast retransmit
TCP AIMD
- additive increase: increase CongWin by 1 MSS every RTT in the absence of loss (congestion avoidance)
- multiplicative decrease: cut CongWin in half after a 3-dup-ACK loss in TCP Reno (not in TCP Tahoe, which goes back to Slow Start)
[Figure: congestion window of a long-lived TCP connection over time: a sawtooth oscillating over the 8/16/24-Kbyte range.]
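The sawtooth above can be reproduced with a few lines; a sketch counting the window in segments, with the loss rounds chosen arbitrarily by us:

```python
# AIMD: +1 MSS per loss-free RTT, halve on a (triple-dup-ACK) loss.
MSS = 1  # count the window in segments for simplicity

def aimd(rounds, loss_rounds, cwnd=8):
    trace = []
    for r in range(rounds):
        cwnd = cwnd / 2 if r in loss_rounds else cwnd + MSS
        trace.append(cwnd)
    return trace

trace = aimd(rounds=8, loss_rounds={3, 6})
print(trace)   # sawtooth: linear climb, multiplicative drop
```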
TCP Slow Start
- when the connection begins, CongWin = 1 MSS
  - example: MSS = 500 bytes & RTT = 200 msec -> initial rate = 20 kbps
- available bandwidth may be >> MSS/RTT: desirable to quickly ramp up to a respectable rate
- when the connection begins, increase (ramp up) the rate exponentially fast until the first loss event, indicated by a triple duplicate (TD) ACK or a timeout (TO)
- slow at the start, but grows fast!
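The slide's numbers are easy to check, and the exponential ramp is just repeated doubling; the 10 Mbps target below is our hypothetical example of "available bandwidth >> MSS/RTT":

```python
# With CongWin = 1 MSS the sender pushes one segment per RTT,
# so the initial rate is about MSS/RTT.
MSS_bits = 500 * 8          # 500-byte segments
RTT = 0.2                   # 200 ms
initial_rate = MSS_bits / RTT
print(initial_rate)         # 20000 bits/s = 20 kbps, as on the slide

# Exponential ramp-up: the window doubles each RTT until the first loss.
cwnd, rtts = 1, 0
while cwnd * MSS_bits / RTT < 10_000_000:   # ramp toward 10 Mbps
    cwnd *= 2
    rtts += 1
print(cwnd, rtts)           # only a handful of RTTs to reach a high rate
```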
TCP Slow Start (more)
- when the connection begins, double CongWin every RTT
  - done by incrementing CongWin by 1 MSS for every ACKed segment
- increase the rate exponentially until the first loss event
- TCP Tahoe (the earliest version): slow start after either a TO or a TD loss, i.e., fast retransmit too, but no fast recovery
[Figure: Host A doubles the number of segments in flight each RTT; Host B's ACKs pace the growth over time.]
Fast Retransmit (Reno)
After a TD loss:
- CongWin is cut in half
- the window then grows linearly (congestion avoidance)
But after a TO loss:
- CongWin is set to 1 MSS (slow start)
- the window then grows exponentially up to a new threshold, then grows linearly (as in TD)
[Figure: congestion window size (segments, 0-14) vs. transmission round (1-15); after a TD loss, TCP Reno resumes from half the window and grows linearly, while TCP Tahoe drops to 1 MSS and slow-starts.]
Philosophy:
- 3 dup ACKs indicate the network is capable of delivering some segments
- a timeout before 3 dup ACKs is "more alarming"
TCP Sender Congestion Control
(see Table 3.3 for TCP Reno)
- when CongWin is below Threshold, the sender is in the slow-start phase and the window grows exponentially
- when CongWin is above Threshold, the sender is in the congestion-avoidance phase and the window grows linearly
- when a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
- when a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS
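The four rules above can be exercised as a tiny event-driven sketch. For brevity each "ack" event stands for one RTT of window growth (doubling in slow start, +1 in congestion avoidance), windows are in MSS units, and the event sequence is ours:

```python
# Reno's window response to the slide's four events.

def reno(events, cwnd=1, ssthresh=8):
    for ev in events:
        if ev == "ack":          # one RTT of loss-free progress
            cwnd = cwnd * 2 if cwnd < ssthresh else cwnd + 1
        elif ev == "3dupack":    # triple duplicate ACK
            ssthresh = cwnd // 2
            cwnd = ssthresh      # resume from the new threshold
        elif ev == "timeout":    # more alarming: back to slow start
            ssthresh = cwnd // 2
            cwnd = 1
    return cwnd, ssthresh

print(reno(["ack"] * 4 + ["3dupack"] + ["ack"] * 2 + ["timeout"]))
```

The window climbs 1, 2, 4, 8, then 9 (linear once above ssthresh), halves on the triple dup ACK, and collapses to 1 MSS on the timeout.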
TCP mechanisms illustrated
[Figure: the preceding TCP mechanisms illustrated on a single cwnd trace.]
TCP congestion control FSM: details

Initialization (Λ):
- cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0 -> slow start

Slow start:
- new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
- duplicate ACK: dupACKcount++
- cwnd > ssthresh: -> congestion avoidance
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3; retransmit missing segment -> fast recovery
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment

Congestion avoidance:
- new ACK: cwnd = cwnd + MSS*(MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
- duplicate ACK: dupACKcount++
- dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3; retransmit missing segment -> fast recovery
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment -> slow start

Fast recovery:
- duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
- new ACK: cwnd = ssthresh; dupACKcount = 0 -> congestion avoidance
- timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment -> slow start
Other TCP Variants
Inefficiency in high-speed networks:
- it takes a long time for the sender to recover its original cwnd after it is halved due to a packet loss
- HSTCP, TCP-Westwood, FAST, Quick-Start, Explicit Transport Error Notification, eXplicit Control Protocol (XCP), ...
Inefficiency in wireless networks:
- current TCP implementations are unable to distinguish buffer-overflow loss (congestion, in wired networks) from random loss (in wireless networks)
Inefficiency in satellite networks:
- long propagation delay and large RTT imply low throughput
TCP variants for wireless/satellite networks:
- TCP-Peach, Indirect-TCP (Split TCP), SNOOP, Explicit Loss Notification (ELN), ...
TCP throughput
What's the average throughput of TCP as a function of window size and RTT?
- ignore slow start
- let W be the window size when loss occurs
- when the window is W, throughput is W/RTT
- just after a loss, the window drops to W/2, throughput to W/(2 RTT)
- average throughput: (roughly) 0.75 W/RTT
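The 0.75 factor follows from averaging the sawtooth: over one cycle the window climbs linearly from W/2 to W, so the mean is 3W/4. A quick numeric check with arbitrary W and RTT:

```python
# Average throughput over one AIMD sawtooth cycle: window grows one
# segment per RTT from W/2 up to W, then the cycle repeats.
W, RTT = 100, 1.0
cycle = [(W / 2 + i) / RTT for i in range(W // 2 + 1)]  # W/2, ..., W
avg = sum(cycle) / len(cycle)
print(avg / (W / RTT))    # → 0.75
```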
TCP Futures
Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
- requires window size W = 83,333 in-flight segments, even with no loss
- throughput in terms of loss rate: throughput ≈ 1.22·MSS / (RTT·√L)
- solving for the loss rate gives L = 2·10^-10. Wow!
- new versions of TCP are needed for high speed!
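Plugging the slide's numbers into the loss-rate formula reproduces both the window size and the (absurdly small) required loss rate:

```python
# W = throughput * RTT / MSS, and inverting
# throughput ≈ 1.22 * MSS / (RTT * sqrt(L)) for L.
MSS = 1500 * 8            # bits per segment
RTT = 0.1                 # 100 ms
target = 10e9             # 10 Gbps

W = target * RTT / MSS                   # in-flight segments needed
L = (1.22 * MSS / (RTT * target)) ** 2   # tolerable loss rate
print(round(W), L)        # ≈ 83333 segments, L ≈ 2e-10
```

A loss rate of one segment in five billion is far below what real links deliver, hence the push for new high-speed TCP versions.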
High-Speed TCP (HSTCP)
- like standard TCP when cwnd is small
- more aggressive than standard TCP when cwnd is large
- increase window: cwnd = cwnd + a(cwnd)/cwnd for every segment ACKed
- decrease window: cwnd = (1 - b(cwnd))·cwnd for every loss
- for standard TCP, a(cwnd) = 1 and b(cwnd) = 0.5
- HSTCP example: at cwnd = 83000, b(83000) = 0.1 means decreasing only 10% after a congestion event, and a(83000) = 72 means an increase of a(cwnd)/cwnd = 72/83000 segment per ACK, i.e., about 72 segments per RTT
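The two generalized update rules can be sketched directly; the a and b values below are the slide's point samples at cwnd = 83000 (real HSTCP derives them from the response function in RFC 3649):

```python
# HSTCP's per-ACK increase and per-loss decrease rules.

def hstcp_ack(cwnd, a):      # per ACK: cwnd += a(cwnd)/cwnd
    return cwnd + a / cwnd

def hstcp_loss(cwnd, b):     # per loss: cwnd *= (1 - b(cwnd))
    return (1 - b) * cwnd

cwnd = 83000.0
after_loss = hstcp_loss(cwnd, b=0.1)   # 10% backoff instead of 50%
cwnd = hstcp_ack(after_loss, a=72)     # one ACK's worth of increase
print(after_loss, cwnd)
```

One loss costs only 8300 segments of window, and a full RTT of ACKs wins back 72, so recovery is vastly faster than standard TCP's halve-then-add-one.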
HSTCP
[Figure: HSTCP's window response compared with standard TCP.]
TCP Fairness
Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K, regardless of its initial window size.
[Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R.]
Why is TCP fair?
Two competing sessions:
- additive increase moves the throughput point (x, y) along a line of slope 1
- multiplicative decrease cuts throughput proportionally: (x+i, y+i) drops to ((x+i)/2, (y+i)/2)
- repeated AIMD cycles converge toward the equal-bandwidth-share line
[Figure: connection 1 throughput vs. connection 2 throughput (each axis up to R), with the AIMD trajectory zig-zagging toward the equal-share diagonal.]
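The convergence argument is easy to see numerically: additive increase preserves the gap between the two flows, while each multiplicative decrease halves it. A sketch with arbitrary capacity and deliberately unequal starting rates:

```python
# Two AIMD flows through one bottleneck of capacity R: both add 1 unit
# per RTT, and both halve when their combined rate exceeds R.
R = 100.0
x, y = 80.0, 10.0            # deliberately unequal start
for _ in range(2000):
    x += 1; y += 1           # additive increase: slope-1 line
    if x + y > R:            # shared loss at the bottleneck
        x /= 2; y /= 2       # multiplicative decrease halves the gap
print(round(x - y, 2))       # gap shrinks toward 0: equal shares
```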
Fairness (more)
Fairness and UDP:
- multimedia apps often do not use TCP: they do not want their rate throttled by congestion control
- instead they use UDP: pump audio/video at a constant rate, tolerate packet loss
- research area: make them TCP-friendly
Fairness and parallel TCP connections:
- an app can open multiple parallel connections between 2 hosts (e.g., NetAnts); web browsers do this
- example: a link of rate R supporting 9 connections
  - a new app that asks for 1 TCP connection gets rate R/10
  - a new app that asks for 9 TCP connections gets R/2!
Delay modeling
Q: How long does it take to receive an object from a Web server after sending a request?
Ignoring congestion, delay is influenced by:
- TCP connection establishment
- data transmission delay
- slow start
Notation, assumptions:
- assume one link between client and server, of rate R
- S: MSS (bits)
- O: object/file size (bits)
- no retransmissions (no loss, no corruption)
Window size:
- first assume a fixed congestion window of W segments
- then a dynamic window, modeling slow start
Fixed congestion window (1)
First case: WS/R > RTT + S/R
- the ACK for the first segment in the window returns before a window's worth of data has been sent
- delay = 2RTT + O/R
Fixed congestion window (2)
Second case: WS/R < RTT + S/R
- the sender must wait for an ACK after sending a window's worth of data
- the "gap" between two "rounds" is S/R + RTT - WS/R
- let K = O/(WS) be the number of rounds; there are K-1 gaps
- delay = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]
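Both fixed-window cases fit in one function; the numeric values below (object, segment, and link sizes) are ours, chosen to fall into the second case, and K = O/(WS) is assumed to be an integer as in the slides:

```python
# Fixed-congestion-window delay, covering both cases from these slides.

def fixed_window_delay(O, W, S, R, RTT):
    if W * S / R >= RTT + S / R:             # case 1: pipe stays full
        return 2 * RTT + O / R
    K = O / (W * S)                          # number of rounds
    stall = S / R + RTT - W * S / R          # idle gap per round
    return 2 * RTT + O / R + (K - 1) * stall # case 2: K-1 stalls

# Hypothetical numbers: O = 100 kbit object, S = 10 kbit segments,
# R = 100 kbps link, RTT = 1 s, W = 2 segments (case 2: K = 5 rounds).
d = fixed_window_delay(O=100e3, W=2, S=10e3, R=100e3, RTT=1.0)
print(d)
```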
TCP Delay Modeling: Slow Start (1)
Now suppose the window grows according to slow start, and P is the number of rounds TCP idles at the server (i.e., the number of gaps).
We will show that the delay for one object is:

delay = 2RTT + O/R + Σ_{p=1}^{P} idleTime_p
      = 2RTT + O/R + Σ_{p=1}^{P} [S/R + RTT - 2^{p-1} S/R]
      = 2RTT + O/R + P[RTT + S/R] - (2^P - 1) S/R

(2^{p-1} segments is the window size during the p-th round of Slow-Start.)

Notes:
1) Let K be the number of sending rounds/windows for the object (how do we calculate K?).
2) If the object is large and the maximum sending window is large enough, the idleTime eventually becomes 0 (after slow start ends, as assumed earlier).
3) Let Q be the last round during which idleTime is still > 0; then P = min{K-1, Q}.
TCP Delay Modeling: Slow Start (2)
Delay components:
- 2RTT for connection establishment and request
- O/R to transmit the object
- time the server idles due to slow start

The server idles P = min{K-1, Q} times.

Example:
- O/S = 15 segments
- K = 4 windows
- Q = 2
- P = min{K-1, Q} = 2
- the server idles P = 2 times, with a decreasing amount of idleTime each round
[Figure: client/server timeline; after the TCP connection is initiated and the object requested (taking RTT each way), the server sends windows taking S/R, 2S/R, 4S/R, and 8S/R to transmit, idling between the early rounds, until transmission completes and the object is delivered.]
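The formula can be evaluated on this very example. The values of S, R, and RTT below are our hypothetical choices, picked so that Q = 2 and hence P = min{K-1, Q} = 2, matching the slide:

```python
# Slow-start latency: delay = 2*RTT + O/R + sum of P idle times, where
# the idle time after round p is [S/R + RTT - 2^(p-1) * S/R]^+.

def slow_start_delay(O, S, R, RTT, P):
    idle = sum(max(0.0, S / R + RTT - (2 ** (p - 1)) * S / R)
               for p in range(1, P + 1))
    return 2 * RTT + O / R + idle

S, R, RTT = 10e3, 100e3, 0.2   # 10 kbit segments, 100 kbps link, 200 ms
O = 15 * S                     # 15 segments, as in the slide's example
d = slow_start_delay(O, S, R, RTT, P=2)
print(d)                       # 0.4 s handshake + 1.5 s transmit + idles
```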
TCP Delay Modeling (3)
- time to transmit the p-th window: 2^{p-1} S/R
- idle time after the p-th window: [S/R + RTT - 2^{p-1} S/R]^+
[Figure: the same client/server timeline (initiate TCP connection, request object), annotated with the transmit time of each window (S/R, 2S/R, 4S/R, 8S/R) and the idle gap after each of the early windows, until the object is delivered.]
TCP Delay Modeling (4)
Recall K = the number of windows that cover the object.
How do we calculate K? K is the smallest k such that 2^0 S + 2^1 S + ... + 2^{k-1} S >= O, i.e., (2^k - 1) S >= O.
The calculation of Q, the number of idles for an infinite-size object, is similar: Q is the largest q such that
S/R + RTT >= 2^{q-1} S/R.
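Both conditions invert into closed forms, which a short sketch can check against the earlier example (15 segments -> K = 4; our earlier S, R, RTT choices -> Q = 2):

```python
# K: smallest k with 2^k - 1 >= O/S; Q: largest q with
# S/R + RTT >= 2^(q-1) * S/R, i.e., 2^(q-1) <= 1 + RTT*R/S.
import math

def num_windows(O, S):
    return math.ceil(math.log2(O / S + 1))

def num_idle_rounds(S, R, RTT):
    return math.floor(math.log2(1 + RTT * R / S)) + 1

print(num_windows(O=15 * 10e3, S=10e3))            # 15 segments -> K = 4
print(num_idle_rounds(S=10e3, R=100e3, RTT=0.2))   # -> Q = 2
```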
Effect of Slow Start
Delay is affected by:
- file/object size O
- transmission rate R
- fixed window size or MSS
- RTT
- extra (idle-time) delay due to slow start
With a large O and small R and RTT:
- slow start does not hurt much
With a small O and large R x RTT:
- slow start hurts significantly (percentage-wise)
Food For Thought
Assume a Web page consists of:
- 1 base HTML page (of size O bits)
- M images (each of size O bits)
Non-persistent HTTP:
- M+1 TCP connections in series
- response time = (M+1)O/R + (M+1)2RTT + sum of idle times
What about persistent HTTP?
What about non-persistent HTTP with X parallel connections? (Is this X times faster than 1 non-persistent HTTP connection?)
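The non-persistent formula is straightforward to evaluate; the sketch below ignores the idle-time sum for brevity (it would add the slow-start stalls computed earlier), and the link rate is our hypothetical choice:

```python
# Non-persistent HTTP: M+1 objects fetched over M+1 serial connections,
# each paying 2 RTTs (handshake + request) plus its transmission time.

def nonpersistent(M, O, R, RTT):
    return (M + 1) * O / R + (M + 1) * 2 * RTT

O = 5 * 8 * 1000        # 5 Kbytes in bits
resp = nonpersistent(M=10, O=O, R=100e3, RTT=0.1)  # 100 kbps link, 100 ms
print(resp)             # 4.4 s of transmission + 2.2 s of round trips
```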
HTTP Response time (in seconds)
RTT = 100 msec, O = 5 Kbytes, M = 10 and X = 5
[Figure: bar chart of response time (0-20 s) for non-persistent, persistent, and parallel non-persistent HTTP at link rates of 28 Kbps, 100 Kbps, 1 Mbps, and 10 Mbps.]
For low bandwidth, connection & response time are dominated by transmission time. Persistent connections give only a minor improvement over parallel connections.
HTTP Response time (in seconds)
RTT = 1 sec, O = 5 Kbytes, M = 10 and X = 5
[Figure: bar chart of response time (0-70 s) for non-persistent, persistent, and parallel non-persistent HTTP at link rates of 28 Kbps, 100 Kbps, 1 Mbps, and 10 Mbps.]
For larger RTT, response time is dominated by TCP establishment & slow-start delays. Persistent connections now give an important improvement, particularly in high delay-bandwidth networks.
Chapter 3: Summary
- principles behind transport-layer services: reliable data transfer, flow control, congestion control
- instantiation and implementation in the Internet: UDP, TCP
Next:
- leaving the network "edge" (application, transport layers)
- into the network "core"