Chapter 3 outline
 Transport-layer services
 Principles of reliable data transfer
 Connectionless transport: UDP (self-study assignment)
 3.5 Connection-oriented transport: TCP
 3.6 Principles of congestion control
 3.7 TCP congestion control
 Summary
Ch3-2-1
TCP: Overview
RFCs: 793, 1122, 1323, 2018, 2581
 point-to-point: one sender, one receiver (i.e., no multicast)
 full duplex data:
   bi-directional data flow in same connection
   MSS: maximum segment size
 pipelined, with send & receive buffers
 connection-oriented:
   handshaking (exchange of control msgs) initializes sender & receiver state before data exchange
 reliable, in-order byte stream:
   no "message boundaries"
 flow controlled:
   sender will not overwhelm receiver
 congestion controlled:
   sender sets its window size for congestion control (and flow control)
[diagram: application writes data into the TCP send buffer through the socket door; TCP carries it in segments to the receiver's TCP receive buffer, where the application reads it]
TCP segment structure
[segment layout, 32 bits per row:]
 source port # | dest port #
 sequence number (counted in bytes of data, not segments)
 acknowledgement number (also counted in bytes)
 head len | not used | flags U A P R S F | receive window (# bytes rcvr willing to accept)
 checksum | urgent data pointer
 options (variable length)
 application data (variable length)
Flag meanings:
 URG: urgent data (generally not used)
 ACK: ACK # valid
 PSH: push data now (generally not used)
 RST, SYN, FIN: connection establishment (setup, teardown commands)
 checksum: Internet checksum (as in UDP)
TCP seq. #'s and ACKs
Seq. #'s:
 byte-stream "number" of first byte in segment's data
ACKs:
 seq # of next byte expected from other side
 cumulative ACK
Q: how does the receiver handle out-of-order segments?
 A: TCP spec doesn't say - up to implementor
[simple telnet scenario: user at Host A types 'C'; Host B ACKs receipt of 'C' and echoes back 'C'; Host A ACKs receipt of the echoed 'C']
TCP: retransmission scenarios
[lost ACK scenario: Host A sends Seq=92; Host B's ACK is lost; A's Seq=92 timer expires and A retransmits the segment]
[premature timeout scenario: A's Seq=92 timer expires before the ACK arrives, causing an unneeded retransmission; the cumulative ACK covers both segments]
In each case the sender tracks Expected (or NextByteToSend), moving it from 100 to 120 once the data is ACKed.
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
 longer than RTT
   but RTT varies
 too short: premature timeout
   unnecessary retransmissions
 too long: slow reaction to segment loss
Q: how to estimate RTT?
 SampleRTT: measured time from segment transmission until ACK receipt
   ignore retransmissions
 SampleRTT will vary; want estimated RTT "smoother"
   average several recent measurements, not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1-α)*EstimatedRTT + α*SampleRTT
 exponential weighted moving average
 influence of past samples decreases exponentially fast
 typical value: α = 0.125
Example RTT estimation:
[plot: SampleRTT and EstimatedRTT (milliseconds, roughly 100-350 ms) vs. time (seconds) for gaia.cs.umass.edu to fantasia.eurecom.fr; EstimatedRTT tracks SampleRTT but is visibly smoother]
TCP Round Trip Time and Timeout
Setting the timeout
 EstimatedRTT plus "safety margin"
   large variation in EstimatedRTT -> larger safety margin
 first estimate how much SampleRTT deviates from EstimatedRTT:
DevRTT = (1-β)*DevRTT + β*|SampleRTT-EstimatedRTT|
(typically, β = 0.25)
Then set the timeout interval:
TimeoutInterval = EstimatedRTT + 4*DevRTT
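The two update rules above can be sketched together; this is a minimal model, with ALPHA and BETA set to the typical values from the slides.

```python
ALPHA = 0.125  # weight of new SampleRTT in EstimatedRTT
BETA = 0.25    # weight of new deviation in DevRTT

def update_timeout(estimated_rtt, dev_rtt, sample_rtt):
    """One update step, applied per (non-retransmitted) segment ACKed."""
    estimated_rtt = (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt
    dev_rtt = (1 - BETA) * dev_rtt + BETA * abs(sample_rtt - estimated_rtt)
    timeout = estimated_rtt + 4 * dev_rtt
    return estimated_rtt, dev_rtt, timeout

# With steady 100 ms samples, DevRTT shrinks and the timeout
# approaches EstimatedRTT (the safety margin vanishes).
est, dev = 100.0, 25.0
for _ in range(20):
    est, dev, to = update_timeout(est, dev, 100.0)
print(round(est), round(dev, 2), round(to, 1))
```

Note how a burst of varying samples would re-inflate DevRTT and hence the safety margin.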
Fast Retransmit
 Time-out period often relatively long:
   long delay before resending lost packet
 Detect lost segments via duplicate ACKs:
   sender often sends many segments back-to-back
   if a segment is lost, there will likely be many duplicate ACKs
 If the sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost:
   fast retransmit: resend segment before timer expires
[figure: Host A sends segments with seq #'s x1-x5; x2 is lost; Host B returns ACK x1 four times; the triple duplicate ACKs trigger retransmission of x2 before the timeout expires]
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
        start timer
}
else { /* a duplicate ACK for already-ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3) {
        resend segment with sequence number y   /* fast retransmit */
    }
}
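The pseudocode above can be made runnable as a small sketch; `resend` and `start_timer` are stand-in callbacks (hypothetical names, not from any real TCP stack).

```python
class FastRetransmitSender:
    def __init__(self, send_base, resend, start_timer):
        self.send_base = send_base   # seq # of oldest unACKed byte
        self.dup_acks = 0            # dup ACK count for send_base
        self.resend = resend
        self.start_timer = start_timer

    def on_ack(self, y, bytes_in_flight):
        if y > self.send_base:           # new ACK: window advances
            self.send_base = y
            self.dup_acks = 0
            if bytes_in_flight > 0:      # still unACKed segments
                self.start_timer()
        else:                            # duplicate ACK
            self.dup_acks += 1
            if self.dup_acks == 3:       # triple duplicate ACK:
                self.resend(y)           # fast retransmit before timeout

resent = []
s = FastRetransmitSender(100, resent.append, lambda: None)
for ack in (100, 100, 100):              # three duplicate ACKs for byte 100
    s.on_ack(ack, bytes_in_flight=500)
print(resent)                            # -> [100]
```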
TCP Flow Control
 receive side of TCP connection has a receive buffer:
   app process may be slow at reading from buffer
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
 speed-matching service: matching the send rate to the receiving app's drain rate
TCP Flow control: how it works
(Suppose the TCP receiver discards out-of-order segments)
 spare room in buffer
   = RcvWindow ( = rwnd)
   = RcvBuffer - [LastByteRcvd - LastByteRead]
 Rcvr advertises spare room by including value of RcvWindow in segments
 Sender limits unACKed data to RcvWindow
   guarantees receive buffer doesn't overflow
 The amount of unACKed data will be the smaller of RcvWindow and the congestion window (to be discussed later)
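A minimal sketch of the spare-room computation above, with variable names following the slide (RcvBuffer, LastByteRcvd, LastByteRead):

```python
def advertised_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """rwnd = RcvBuffer - (LastByteRcvd - LastByteRead)."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

# 64 KB buffer; 20 KB received but only 8 KB read by the app so far:
rwnd = advertised_window(64_000, 20_000, 8_000)
print(rwnd)  # -> 52000

# The sender limits unACKed data to min(rwnd, congestion window):
cong_win = 30_000
usable = min(rwnd, cong_win)
print(usable)  # -> 30000
```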
TCP Connection Management
Recall: TCP sender and receiver establish a "connection" before exchanging data segments
 initialize TCP variables:
   seq. #s
   buffers, flow control info (e.g. RcvWindow)
 client: connection initiator
   Socket clientSocket = new Socket("hostname","port number");
 server: contacted by client
   Socket connectionSocket = welcomeSocket.accept();
Three way handshake:
Step 1: client host sends TCP SYN segment to server
 specifies initial seq #
 no data
Step 2: server host receives SYN, replies with SYNACK segment
 server allocates buffers
 specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
TCP Connection Establishment (3-way)
Establishing a connection:
Step 1: client sends TCP SYN control segment to server
Step 2: server receives SYN, replies with SYN and ACK (in one segment)
Step 3: client receives SYN+ACK, replies with ACK and possible data; connection established
[diagram: client and server exchanging segments 1 (SYN), 2 (SYN+ACK), 3 (ACK)]
TCP Connection Close
client closes socket: clientSocket.close();
Step 1: client sends FIN (client closing)
Step 2: server receives FIN, replies with ACK; sends its own FIN (server closing, waiting to close)
Step 3: client receives FIN, replies with ACK
 enters "timed wait" - will respond with ACK to received FINs
Step 4: server receives ACK; connection closed
[diagram: after the client's timed wait expires, both sides are closed]
TCP Connection Management (cont)
[state diagrams: TCP server lifecycle; TCP client lifecycle]
Difficulty with Symmetric Release
 Two-army problem: when can Blue army #1 be sure that Blue army #2 will attack at the same time? (never)
Disconnection Request (DR) = Attack
• 3-way handshake usually works
• host 1 needs to retransmit the DR several times
• no perfect solution: e.g., we are left with a half-open connection in case (d) if the initial DR and all subsequent DRs are lost!
Principles of Congestion Control
Congestion:
 informally: "too many sources sending too much data too fast for network to handle"
 different from flow control!
 manifestations:
   lost packets (buffer overflow at routers)
   long delays (queueing in router buffers)
 a top-10 problem!
Causes/costs of congestion: scenario 1
 two senders, two receivers
 one router, infinite buffers
 no retransmission
[figure: Hosts A and B send λin (original data) through one router of link capacity C with unlimited shared output link buffers, producing λout]
 large delays when congested
 maximum achievable throughput
Causes/costs of congestion: scenario 2
 one router, finite buffers
 sender retransmission of "lost" packet upon timeout
[figure: Hosts A and B offer λin (original data); λ'in is original plus retransmitted data; the router has finite shared output link buffers; λout is delivered]
Causes/costs of congestion: scenario 2
 Early timeout: every packet retransmitted once (fig a)
 "Perfect" retransmission: one retransmission for every two packets (fig b)
"costs" of congestion:
 more work (retransmissions) for given "goodput"
 unneeded retransmissions: link carries multiple copies of a pkt
Causes/costs of congestion: scenario 3
 four senders
 multihop paths
 timeout/retransmit
Q: what happens as λin and λ'in increase?
[figure: Hosts A and B send λin (original data) and λ'in (original plus retransmitted data) over multihop paths through routers with finite shared output link buffers; λout is delivered]
Causes/costs of congestion: scenario 3
[figure: λout at Host B collapses as offered load increases]
Another "cost" of congestion:
 when a packet is dropped, any upstream transmission capacity used for that packet was wasted!
Approaches towards congestion control
 Open-Loop: good design (e.g., admission control) prevents congestion from occurring
 Closed-Loop: monitors and reacts to congestion via feedback
Two broad closed-loop approaches:
End-end congestion control:
 no explicit feedback from network
 congestion inferred from end-system (sender) observed loss, delay (e.g., TCP timeout)
Network-assisted congestion control:
 routers provide feedback to end systems
   single bit indicating congestion (in ATM, and new in TCP/IP)
   explicit rate sender should send at
Network Assisted Congestion Control
• Network feedback via receiver (in ATM and TCP/IP's ECN bit)
• Direct network (router) feedback: also called choke packets
TCP Congestion Control (Tahoe and Reno)
 end-end control (no network assistance)
 sender limits transmission:
   LastByteSent - LastByteAcked <= min{CongWin, RcvWindow}
 roughly: rate = CongWin/RTT Bytes/sec
 CongWin is a dynamic function of perceived network congestion
How does the sender perceive congestion?
 loss event = timeout or 3 duplicate ACKs
 TCP sender reduces rate (CongWin) after a loss event
Main mechanisms:
 AIMD (congestion avoidance and fast recovery)
 slow start
 fast retransmit
TCP AIMD
additive increase: increase CongWin by 1 MSS every RTT in the absence of loss (congestion avoidance)
multiplicative decrease: cut CongWin in half after a 3-dup-ACK loss in TCP Reno (not in TCP Tahoe, which uses slow start instead)
[figure: long-lived TCP connection; congestion window sawtooths between 8 and 24 Kbytes, crossing 16 Kbytes]
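The sawtooth above can be reproduced with a few lines; the loss threshold of 24 KB and the 1000-byte MSS are illustrative assumptions matching the figure, not protocol constants.

```python
MSS = 1_000  # bytes (assumed)

def aimd(rounds, loss_threshold=24_000, cwnd=16_000):
    """One value of CongWin per RTT: +1 MSS per RTT, halve on loss."""
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd >= loss_threshold:
            cwnd //= 2           # multiplicative decrease (3 dup ACKs)
        else:
            cwnd += MSS          # additive increase: 1 MSS per RTT
    return trace

trace = aimd(20)
print(min(trace), max(trace))    # sawtooth between 12 KB and 24 KB
```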
TCP Slow Start
 When connection begins, CongWin = 1 MSS
   Example: MSS = 500 bytes & RTT = 200 msec
   initial rate = 20 kbps
 available bandwidth may be >> MSS/RTT
   desirable to quickly ramp up to a respectable rate
 When connection begins, increase (ramp up) rate exponentially fast until first loss event
   indicated by a triple-duplicate (TD) ACK or a timeout (TO)
 Slow at start, but grows fast!
TCP Slow Start (more)
 When connection begins, increase rate exponentially until first loss event:
   double CongWin every RTT
   done by incrementing CongWin by 1 MSS for every MSS ACKed
 TCP Tahoe (earliest version): slow start after either a TO or a TD loss
   i.e., fast retransmit too, but no fast recovery
[diagram: Host A sends one segment, then two, then four to Host B, the window doubling each RTT]
Fast Retransmit (Reno)
 After a TD loss:
   CongWin cut in half
   window then grows linearly (congestion avoidance)
 But after a TO loss:
   CongWin set to 1 MSS (slow start)
   window then grows exponentially to a new threshold, then grows linearly (as in TD)
[plot: congestion window size (segments) vs. transmission round; after a TD loss, TCP Reno halves the window and grows linearly, while TCP Tahoe restarts from 1 segment]
Philosophy:
• 3 dup ACKs indicate the network is capable of delivering some segments
• a timeout before 3 dup ACKs is "more alarming"
TCP Sender Congestion Control
(see Table 3.3 for TCP Reno)
 When CongWin is below Threshold, sender is in slow-start phase; window grows exponentially.
 When CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly.
 When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
 When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.
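The four rules above can be sketched as a per-event update, tracking CongWin and Threshold in MSS units; this is a pure Reno-style model (slow start advances 1 MSS per ACK, congestion avoidance 1/cwnd per ACK), not a real TCP stack.

```python
def on_event(cwnd, ssthresh, event):
    """Return (cwnd, ssthresh) in MSS units after one event."""
    if event == "new_ack":
        if cwnd < ssthresh:
            cwnd += 1            # slow start: exponential growth
        else:
            cwnd += 1 / cwnd     # congestion avoidance: ~1 MSS per RTT
    elif event == "triple_dup_ack":
        ssthresh = cwnd / 2      # Threshold = CongWin/2,
        cwnd = ssthresh          # CongWin = Threshold
    elif event == "timeout":
        ssthresh = cwnd / 2      # Threshold = CongWin/2,
        cwnd = 1                 # CongWin = 1 MSS
    return cwnd, ssthresh

cwnd, ssthresh = 1, 8
for _ in range(10):              # exponential growth, then linear
    cwnd, ssthresh = on_event(cwnd, ssthresh, "new_ack")
print(cwnd > 8)                  # grew past the threshold -> True
cwnd, ssthresh = on_event(cwnd, ssthresh, "timeout")
print(cwnd, ssthresh)            # back to 1 MSS, threshold halved
```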
TCP mechanisms Illustrated
[figure: slow start, congestion avoidance, fast retransmit, and fast recovery combined on one cwnd timeline]
TCP congestion control FSM: details
slow start (initial state; on entry: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0):
 new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
 duplicate ACK: dupACKcount++
 timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
 cwnd > ssthresh: go to congestion avoidance
 dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
congestion avoidance:
 new ACK: cwnd = cwnd + MSS*(MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
 duplicate ACK: dupACKcount++
 timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
 dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
fast recovery:
 duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
 new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
 timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
Other TCP Variants
 Inefficiency in high-speed networks:
   it takes a long time for the sender to recover to its original cwnd after it is halved due to a packet loss
   HSTCP, TCP-Westwood, FAST, Quick-Start, Explicit Transport Error Notification, eXplicit Control Protocol (XCP)...
 Inefficiency in wireless networks:
   current TCP implementations are unable to distinguish buffer-overflow loss (congestion in wired networks) from random loss (in wireless networks)
 Inefficiency in satellite networks:
   long propagation delay and large RTT imply low throughput
 TCP variants for wireless/satellite networks:
   TCP-Peach, Indirect-TCP (Split TCP), SNOOP, Explicit Loss Notification (ELN)...
TCP throughput
 What's the average throughput of TCP as a function of window size and RTT?
   ignore slow start
 Let W be the window size when loss occurs.
 When the window is W, throughput is W/RTT
 Just after loss, window drops to W/2, throughput to W/2RTT.
 Average throughput: (roughly) 0.75 W/RTT
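A quick numeric check of the 0.75 W/RTT claim: between losses the window ramps linearly from W/2 to W (one MSS per RTT), so averaging the throughput over one sawtooth cycle should give 3/4 of the peak. W and RTT below are arbitrary illustrative values.

```python
W, RTT = 1000, 0.1                 # segments, seconds (assumed)

# Throughput sample at each RTT of one additive-increase cycle:
samples = [w / RTT for w in range(W // 2, W + 1)]
avg = sum(samples) / len(samples)

print(avg / (W / RTT))             # -> ~0.75 of peak throughput
```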
TCP Futures
 Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
 Requires window size W = 83,333 in-flight segments even if no loss
 Throughput in terms of loss rate: Throughput = 1.22 * MSS / (RTT * sqrt(L))
   ➜ L = 2*10^-10 Wow
 New versions of TCP for high-speed needed!
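The slide's two numbers can be re-derived; this assumes the standard loss-throughput relation Throughput ≈ 1.22·MSS/(RTT·√L) quoted above.

```python
MSS = 1500 * 8          # segment size in bits
RTT = 0.1               # seconds
target = 10e9           # 10 Gbps target throughput

# In-flight segments needed: target * RTT bits outstanding, / bits per segment
W = target * RTT / MSS

# Tolerable loss rate from Throughput = 1.22*MSS/(RTT*sqrt(L)):
L = (1.22 * MSS / (RTT * target)) ** 2

print(round(W))         # ~ 83333 segments
print(L)                # ~ 2e-10
```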
High-Speed TCP (HSTCP)
 Like standard TCP when cwnd is small
 More aggressive than standard TCP when cwnd is large
 Increase window: cwnd = cwnd + a(cwnd)/cwnd for every segment ACKed
 Decrease window: cwnd = (1 - b(cwnd)) * cwnd for every loss
 For standard TCP, a(cwnd) = 1 and b(cwnd) = 0.5.
 HSTCP: e.g., with cwnd = 83000, b(83000) = 0.1 means decreasing only 10% after a congestion event, and a(83000) = 72 means an increase of 72/cwnd segments per ACK, i.e., about 72 segments per RTT.
HSTCP
[plot: HSTCP response function compared with standard TCP]
TCP Fairness
Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K, regardless of the initial window size.
[figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R]
Why is TCP fair?
Two competing sessions:
 additive increase in throughput from (x, y), with slope 1
 multiplicative decrease in throughput from (x+i, y+i) to ((x+i)/2, (y+i)/2)
[figure: connection 1 vs. connection 2 throughput, both bounded by R; repeated slope-1 increases followed by halvings pull the operating point toward the equal-bandwidth-share line]
Fairness (more)
Fairness and UDP
 Multimedia apps often do not use TCP
   do not want rate throttled by congestion control
 Instead use UDP:
   pump audio/video at constant rate, tolerate packet loss
 Research area: make them TCP friendly
Fairness and parallel TCP connections
 Can open more than one parallel connection between 2 hosts (NetAnts).
 Web browsers do this
 Example: link of rate R supporting 9 connections;
   new app asks for 1 TCP, gets rate R/10
   new app asks for 9 TCPs, gets R/2!
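The arithmetic behind the example above: if a link of rate R is shared fairly among n TCP connections, each gets ~R/n, so an app opening k parallel connections grabs k/(existing + k) of the link.

```python
def app_share(existing, k):
    """Fraction of the link an app gets by opening k of (existing+k) connections."""
    return k / (existing + k)

R = 1.0  # normalize the link rate
print(app_share(9, 1) * R)   # 1 connection among 10 -> R/10 = 0.1
print(app_share(9, 9) * R)   # 9 connections among 18 -> R/2 = 0.5
```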
Delay modeling
Q: How long does it take to receive an object from a Web server after sending a request?
Ignoring congestion, delay is influenced by:
 TCP connection establishment
 data transmission delay
 slow start
Notation, assumptions:
 assume one link between client and server, of rate R
 S: MSS (bits)
 O: object/file size (bits)
 no retransmissions (no loss, no corruption)
Window size:
 first assume: fixed congestion window, W segments
 then dynamic window, modeling slow start
Fixed congestion window (1)
First case:
WS/R > RTT + S/R: ACK for first segment in window returns before a window's worth of data is sent
delay = 2RTT + O/R
Fixed congestion window (2)
Second case:
 WS/R < RTT + S/R: wait for ACK after sending a window's worth of data
 "gap" between two "rounds" is S/R + RTT - WS/R
 let K = O/WS be the number of rounds
 there are K-1 gaps
delay = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]
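The two fixed-window cases can be combined into one small function; the slide's K = O/WS assumes the object is a whole number of windows, so a ceiling is used here for general inputs (my assumption), and the numeric parameters below are purely hypothetical.

```python
from math import ceil

def fixed_window_delay(O, W, S, R, RTT):
    """Delay to fetch an O-bit object with a fixed window of W segments."""
    base = 2 * RTT + O / R
    gap = S / R + RTT - W * S / R     # idle time per round, if positive
    if gap <= 0:                      # case 1: WS/R >= RTT + S/R, no stalls
        return base
    K = ceil(O / (W * S))             # number of windows covering the object
    return base + (K - 1) * gap       # case 2: K-1 stalls of length `gap`

# Hypothetical numbers: 100 KB object, 1 KB segments, W=4, R=1 Mbps, RTT=100 ms
d = fixed_window_delay(O=800_000, W=4, S=8_000, R=1e6, RTT=0.1)
print(round(d, 3))  # ~ 2.8 seconds
```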
TCP Delay Modeling: Slow Start (1)
Now suppose the window grows according to slow start,
and P is the number of rounds TCP idles at the server (i.e., # of gaps).
Will show that the delay for one object is:

delay = O/R + 2 RTT + sum_{p=1..P} idleTime_p
      = O/R + 2 RTT + sum_{p=1..P} [S/R + RTT - 2^(p-1) S/R]
      = O/R + 2 RTT + P [S/R + RTT] - (2^P - 1) S/R

where 2^(p-1) is the window size during the p-th round of slow start, so 2^(p-1) S/R is the time to transmit the p-th window.

1) Let K be the number of sending rounds/windows for the object. (calculate K?)
2) If the object is large, then once the sending window is large enough, the idleTime becomes 0 (after slow start ends, as assumed earlier)
3) Let Q be the last round during which idleTime is still > 0; then P = min{K-1, Q}
TCP Delay Modeling: Slow Start (2)
Delay components:
• 2 RTT for connection estab and request
• O/R to transmit object
• time server idles due to slow start
Server idles: P = min{K-1, Q} times
Example:
• O/S = 15 segments
• K = 4 windows
• Q = 2
• P = min{K-1, Q} = 2
Server idles P = 2 times, with a decreasing amount of idleTime
[timing diagram: client initiates the TCP connection and requests the object; server sends first window = S/R, second window = 2S/R, third window = 4S/R, fourth window = 8S/R, idling between the early rounds; object delivered when transmission completes]
TCP Delay Modeling (3)
2^(p-1) S/R = time to transmit the p-th window
[S/R + RTT - 2^(p-1) S/R]+ = idle time after the p-th window
[timing diagram: same slow-start exchange as before, with windows S/R, 2S/R, 4S/R, 8S/R annotated with transmit and idle times]
TCP Delay Modeling (4)
Recall K = number of windows that cover the object.
How do we calculate K?
Calculation of Q, the number of idles for an infinite-size object, is similar: Q is the largest integer such that
S/R + RTT >= 2^(Q-1) S/R
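Putting the model together: K windows cover the object (K is the smallest k with 2^0 + ... + 2^(k-1) >= O/S), Q bounds how many rounds can idle, and the server idles P = min(K-1, Q) times. The closed forms for K and Q below follow from the slide's inequalities; treat edge cases (exact powers of two) with care.

```python
from math import ceil, floor, log2

def slow_start_delay(O, S, R, RTT):
    """Return (delay, K, Q, P) per the slide's slow-start model."""
    K = ceil(log2(O / S + 1))               # smallest k with 2^k - 1 >= O/S
    Q = floor(log2(1 + RTT * R / S)) + 1    # largest Q with S/R+RTT >= 2^(Q-1) S/R
    P = min(K - 1, Q)                       # rounds the server idles
    idle = P * (S / R + RTT) - (2 ** P - 1) * (S / R)
    return 2 * RTT + O / R + idle, K, Q, P

# Example mirroring the earlier slide: O/S = 15 segments, parameters
# chosen (assumed) so that Q = 2, giving K = 4 and P = 2.
d, K, Q, P = slow_start_delay(O=15_000, S=1_000, R=1e6, RTT=0.002)
print(K, P)           # -> 4 2
print(round(d, 3))    # total delay in seconds
```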
Effect of Slow Start
 Delay is affected by
   file/object size O
   transmission rate R
   fixed window size or MSS -> idle time
   RTT
 Extra delay due to slow start:
   with a large O, and small R and RTT, slow start does not hurt much
   with a small O and a large R*RTT, slow start hurts significantly (percentage-wise)
Food For Thought
 Assume a Web page consists of:
   1 base HTML page (of size O bits)
   M images (each of size O bits)
 Non-persistent HTTP:
   M+1 TCP connections in series
   Response time = (M+1)O/R + (M+1)2RTT + sum of idle times
 What about persistent HTTP?
 What about non-persistent HTTP with X parallel connections? (is this X times faster than having 1 non-persistent connection?)
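The non-persistent formula above can be checked numerically and set against persistent HTTP; the persistent (non-pipelined) formula here, one handshake then one RTT plus transmission per object, is my simplified assumption for comparison, not from the slide, and both ignore slow-start idle time.

```python
def non_persistent(M, O, R, RTT):
    # (M+1) serial connections: 2 RTTs + transmission time each
    return (M + 1) * (2 * RTT + O / R)

def persistent(M, O, R, RTT):
    # one handshake RTT, then one request RTT + transmission per object
    return RTT + (M + 1) * (RTT + O / R)

O = 5 * 8_000                  # 5 KB object, in bits
M, RTT, R = 10, 0.1, 1e6       # values from the next slide; R assumed 1 Mbps
print(round(non_persistent(M, O, R, RTT), 2))  # -> 2.64 s
print(round(persistent(M, O, R, RTT), 2))      # -> 1.64 s
```

The RTT term dominates the gap: persistent HTTP saves one handshake RTT per object.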
HTTP Response time (in seconds)
RTT = 100 msec, O = 5 Kbytes, M = 10 and X = 5
[bar chart: response time (0-20 s) for non-persistent, persistent, and parallel non-persistent HTTP at link rates of 28 Kbps, 100 Kbps, 1 Mbps and 10 Mbps]
For low bandwidth, connection & response time are dominated by transmission time.
Persistent connections give only minor improvement over parallel connections.
HTTP Response time (in seconds)
RTT = 1 sec, O = 5 Kbytes, M = 10 and X = 5
[bar chart: response time (0-70 s) for non-persistent, persistent, and parallel non-persistent HTTP at link rates of 28 Kbps, 100 Kbps, 1 Mbps and 10 Mbps]
For larger RTT, response time is dominated by TCP establishment & slow-start delays. Persistent connections now give important improvement, particularly in high delay*bandwidth networks.
Chapter 3: Summary
 principles behind transport layer services:
   reliable data transfer
   flow control
   congestion control
 instantiation and implementation in the Internet
   UDP
   TCP
Next:
 leaving the network "edge" (application, transport layers)
 into the network "core"