FAST TCP
Steven Low
CS/EE
netlab.CALTECH.edu
Oct 2003
FAST Protocols for Ultrascale Networks
Theory
Internet: distributed feedback control system
TCP: adapts sending rate to congestion
AQM: feeds back congestion information
[Figure: TCP sources (rates x) and AQM links (prices p) in a feedback loop through forward routing Rf(s) (x → y) and backward routing Rb'(s) (p → q)]
$\dot{x}_i = \frac{1}{T_i^2(t)} \tan^{-1}\left(\frac{\alpha_i d_i - x_i(t) q_i(t)}{\kappa_i(t)\, q_i(t)}\right)$
$\dot{p}_l = \frac{1}{c_l}\left(y_l(t) - c_l\right)$
Implementation
[Figure: FAST state machine: slow start, FAST retransmit, FAST recovery, time out, equilibrium]
Experiment
[Figure: research & production networks: WAN in Lab (Caltech), Calren2/Abilene, StarLight (Chicago), CERN (Geneva), SURFNet (Amsterdam); links from 155Mb/s to 10Gb/s, multi-Gbps, 50-200ms delay]
People
Faculty: Doyle (CDS,EE,BE), Low (CS,EE), Newman (Physics), Paganini (UCLA)
Staff/Postdoc: Bunn (CACR), Jin (CS), Ravot (Physics), Singh (CACR)
Students: Choe (Postech/CIT), Hu (Williams), J. Wang (CDS), Z. Wang (UCLA), Wei (CS)
Industry: Doraiswami (Cisco), Yip (Cisco)
Partners: CERN, Internet2, CENIC, StarLight/UI, SLAC, AMPATH, Cisco
netlab.caltech.edu/FAST
Outline
Motivation
Network model
FAST TCP
Equilibrium
Stability
Experiments
[Figure: TCP/IP protocol stack: Applications (WWW, Email, Napster, FTP, …) / TCP/AQM / IP / Transmission (Ethernet, ATM, POS, WDM, …)]
High Energy Physics
Large global collaborations
2000 physicists from 150 institutions in >30 countries
300-400 physicists in US from >30 universities & labs
SLAC has 500TB data by 4/2002, world’s largest database
Typical file transfer ~1 TB
At 622Mbps: ~ 4 hrs
At 2.5Gbps: ~ 1 hr
At 10Gbps: ~15min
Gigantic elephants!
LHC (Large Hadron Collider) at CERN, to open 2007
Generate data at PB (10^15 B)/sec
Filtered in realtime by a factor of 10^6 to 10^7
Data stored at CERN at 100MB/sec
Many PB of data per year
To rise to Exabytes (10^18 B) in a decade
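The transfer times quoted above follow from size × 8 / rate. A minimal sketch, assuming 1 TB = 10^12 bytes and that the full link rate is available to the transfer (no protocol overhead):

```python
def transfer_time_s(size_bytes: float, rate_bps: float) -> float:
    """Seconds needed to move size_bytes at a sustained rate of rate_bps bits/s."""
    return size_bytes * 8 / rate_bps


ONE_TB = 1e12  # assumption: decimal terabyte

for label, rate in [("622Mbps", 622e6), ("2.5Gbps", 2.5e9), ("10Gbps", 10e9)]:
    t = transfer_time_s(ONE_TB, rate)
    print(f"{label}: {t / 3600:.2f} h ({t / 60:.0f} min)")
```

This gives about 3.6 h, 53 min and 13 min, in line with the rounded figures on the slide.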
HEP high speed network
… that must change
HEP Network (DataTAG)
[Figure: network map: New York, Abilene, StarLight, ESnet, STAR-TAP, CalREN, Geneva, GEANT, UK SuperJANET4, It GARR-B, NL SURFnet, Fr Renater]
2.5 Gbps Wavelength Triangle 2002
10 Gbps Triangle in 2003
Newman (Caltech)
Performance at large windows
DataTAG Network: CERN (Geneva) – StarLight (Chicago) – SLAC/Level3 (Sunnyvale)
ns-2 simulation: capacity = 155Mbps, 622Mbps, 2.5Gbps, 5Gbps, 10Gbps; 100 ms round trip latency; 100 flows (J. Wang, Caltech, June 02)
[Figure: average utilization vs. capacity, 1G to 10Gbps]
Experiment: capacity = 1Gbps; 180 ms round trip latency; 1 flow (C. Jin, D. Wei, S. Ravot, etc., Caltech, Nov 02)
[Figure: average utilization of Linux TCP (txq=100), Linux TCP (txq=10000) and FAST; values shown: 19%, 27%, 95%]
Congestion Control
[Figure: the source sends packets 1, 2, …, W to the destination within each RTT; ACKs return to the source]
~ W packets per RTT
Lost packet detected by missing ACK
Congestion signal: delay and loss
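Since a source sends about W packets per RTT, its rate is roughly x = W/RTT. A small sketch, assuming 1500-byte packets (a standard Ethernet MTU), showing how large W must be on the long paths discussed earlier:

```python
PACKET_BYTES = 1500  # assumption: standard-MTU packets


def rate_bps(window_pkts: float, rtt_s: float, packet_bytes: int = PACKET_BYTES) -> float:
    """Approximate throughput of a window-based source: W packets per RTT."""
    return window_pkts * packet_bytes * 8 / rtt_s


def window_pkts(target_bps: float, rtt_s: float, packet_bytes: int = PACKET_BYTES) -> float:
    """Window (packets in flight) needed to sustain target_bps over a path with rtt_s."""
    return target_bps * rtt_s / (packet_bytes * 8)


# 1 Gbps over a 180 ms path needs about 15,000 packets in flight.
print(round(window_pkts(1e9, 0.180)))
# 10 Gbps over a 100 ms path needs about 83,000.
print(round(window_pkts(10e9, 0.100)))
```

With one-packet-per-RTT additive increase, regrowing a halved 15,000-packet window takes about 7,500 RTTs, roughly 22 minutes at 180 ms.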
Congestion control
[Figure: source rates x_i(t) and link congestion measures p_l(t)]
Example congestion measure p_l(t):
Loss (Reno)
Queueing delay (Vegas)
TCP/AQM
[Figure: TCP (Reno, Vegas) sets rates x_i(t); AQM (DropTail, RED, REM/PI, AVQ) sets congestion measures p_l(t)]
Congestion control is a distributed asynchronous algorithm to share bandwidth
It has two components
TCP: adapts sending rate (window) to congestion
AQM: adjusts & feeds back congestion information
They form a distributed feedback control system
Equilibrium & stability depend on both TCP and AQM
And on delay, capacity, routing, #connections
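Network model
[Figure: feedback loop: TCP sources F1 … FN, forward routing Rf(s) (x → y), AQM links G1 … GL, backward routing Rb'(s) (p → q)]
$R_f(s)_{li} = e^{-\tau^f_{li} s}$ if source i uses link l
$R_b(s)_{li} = e^{-\tau^b_{li} s}$ if source i uses link l
($\tau^f_{li}$: forward delay from source i to link l; $\tau^b_{li}$: backward delay from link l back to source i)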
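Vegas model
for every RTT
{
if W/RTTmin – W/RTT < α then W ++
if W/RTTmin – W/RTT > α then W --
}
queue size: (W/RTTmin – W/RTT)·RTTmin = $x_i q_i$, the packets of source i buffered in the network ($T_i = d_i + q_i$)
$F_i$: $\dot{x}_i = \frac{1}{T_i^2(t)}$ if $x_i(t) q_i(t) < \alpha_i d_i$
$\dot{x}_i = -\frac{1}{T_i^2(t)}$ if $x_i(t) q_i(t) > \alpha_i d_i$
$\dot{x}_i = 0$ else
$G_l$: $\dot{p}_l = \frac{1}{c_l}\left(y_l(t) - c_l\right)$
$q_i$: E2E queueing delay; $p_l$: link queueing delay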
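The per-RTT Vegas rule above (one threshold α on W/RTTmin – W/RTT) can be run against an idealized bottleneck. A minimal sketch, assuming all sources share one link of capacity c (packets/ms) with a common propagation delay d (ms), and that the queue instantly equals the packets in flight beyond c·d; the function names and numbers are illustrative, not from the slides:

```python
def vegas_step(w: int, base_rtt: float, rtt: float, alpha: float) -> int:
    """One Vegas update: compare expected (w/base_rtt) and actual (w/rtt) rates."""
    diff = w / base_rtt - w / rtt
    if diff < alpha:
        return w + 1
    if diff > alpha:
        return w - 1
    return w


def simulate_vegas(windows: list[int], c: float, d: float, alpha: float, rtts: int) -> list[int]:
    """Apply vegas_step once per RTT to every source sharing one idealized link."""
    for _ in range(rtts):
        backlog = max(0.0, sum(windows) - c * d)  # packets queued at the link
        rtt = d + backlog / c                      # propagation + queueing delay
        windows = [vegas_step(w, d, rtt, alpha) for w in windows]
    return windows


# c = 10 pkts/ms, d = 10 ms (c*d = 100 packets), alpha = 2 pkts/ms:
# each source settles with alpha*d = 20 of its own packets queued.
print(simulate_vegas([50], c=10, d=10, alpha=2, rtts=200))      # [120]
print(simulate_vegas([50, 50], c=10, d=10, alpha=2, rtts=200))  # [70, 70]
```

With two sources the backlog is 140 − 100 = 40 packets, 20 per source, matching the equilibrium $x_i q_i = \alpha_i d_i$.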
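Vegas model
[Figure: feedback loop: TCP sources F1 … FN, forward routing Rf(s), AQM links G1 … GL, backward routing Rb'(s)]
$F_i$: $\dot{x}_i = \frac{1}{T_i^2(t)}\,\mathrm{sgn}\left(\alpha_i d_i - x_i(t) q_i(t)\right)$
$G_l$: $\dot{p}_l = \frac{1}{c_l}\left(y_l(t) - c_l\right)$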
Methodology
Protocol (Reno, Vegas, RED, REM/PI…)
$x(t+1) = F(p(t), x(t))$
$p(t+1) = G(p(t), x(t))$
Equilibrium
Performance: throughput, loss, delay
Fairness
Utility
Dynamics
Local stability
Cost of stabilization
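Model
Network
Links l of capacities $c_l$
Sources s
L(s) - links used by source s
$U_s(x_s)$ - utility if source rate = $x_s$
[Figure: example with two links and three sources: source 1 uses both links, source 2 uses link 1, source 3 uses link 2, so $x_1 + x_2 \le c_1$ and $x_1 + x_3 \le c_2$]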
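The example network can be written with a routing matrix R ($R_{ls} = 1$ if source s uses link l), so link loads are y = Rx and the total price seen by each source is $q = R^T p$. A plain-Python sketch; the rates and prices below are made-up illustrative numbers:

```python
def link_loads(R: list[list[int]], x: list[float]) -> list[float]:
    """y = R x: total rate offered to each link."""
    return [sum(r_ls * x_s for r_ls, x_s in zip(row, x)) for row in R]


def path_prices(R: list[list[int]], p: list[float]) -> list[float]:
    """q = R^T p: sum of link prices along each source's path."""
    n_sources = len(R[0])
    return [sum(R[l][s] * p[l] for l in range(len(R))) for s in range(n_sources)]


# Links 1, 2 (rows) and sources 1, 2, 3 (columns); source 1 crosses both links.
R = [[1, 1, 0],
     [1, 0, 1]]
c = [1.0, 1.0]
x = [0.4, 0.6, 0.5]

y = link_loads(R, x)
print(y)                                      # approximately [1.0, 0.9]
print(all(yl <= cl for yl, cl in zip(y, c)))  # feasibility check: Rx <= c
print(path_prices(R, [0.2, 0.3]))             # approximately [0.5, 0.2, 0.3]
```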
Summary: duality model
Flow control problem (Kelly, Maulloo, Tan 98)
$\max_{x_s \ge 0} \sum_s U_s(x_s)$ subject to $Rx \le c$
Primal-dual algorithm
$x(t+1) = F(R^T p(t), x(t))$ (TCP: Reno, Vegas)
$p(t+1) = G(p(t), Rx(t))$ (AQM: DropTail, RED, REM)
TCP/AQM: maximize utility with different utility functions
Result (L 00): $(x^*, p^*)$ primal-dual optimal iff $y_l^* \le c_l$ with equality if $p_l^* > 0$
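Example utility functions
Reno-1: $U_i(x_i) = \frac{\sqrt{3/2}}{T_i}\tan^{-1}\left(\sqrt{2/3}\, x_i T_i\right)$
Reno-2: $U_i(x_i) = \frac{1}{T_i}\log\frac{x_i T_i}{2 x_i T_i + 3}$
Vegas: $U_i(x_i) = \alpha_i d_i \log x_i$
General: $U_i(x_i) = (1-\alpha)^{-1} x_i^{1-\alpha}$ for $\alpha \neq 1$; $U_i(x_i) = \log x_i$ for $\alpha = 1$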
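A marginal utility $U_i'(x_i)$ equals the congestion price at which a source settles at rate $x_i$, so it is a handy way to compare these functions. A sketch assuming the forms $\frac{\sqrt{1.5}}{T}\tan^{-1}(\sqrt{2/3}\,xT)$ (Reno-1), $\frac{1}{T}\log\frac{xT}{2xT+3}$ (Reno-2), $\alpha_i d_i \log x$ (Vegas) and $x^{1-a}/(1-a)$ (general family, parameter a), with derivatives checked numerically; T, x and alpha_d are illustrative values:

```python
import math


def reno1(x: float, T: float) -> float:
    return math.sqrt(1.5) / T * math.atan(math.sqrt(2 / 3) * x * T)


def reno2(x: float, T: float) -> float:
    return math.log(x * T / (2 * x * T + 3)) / T


def vegas(x: float, alpha_d: float) -> float:
    return alpha_d * math.log(x)


def alpha_fair(x: float, a: float) -> float:
    """General family: x^(1-a)/(1-a) for a != 1, log x for a == 1."""
    return math.log(x) if a == 1 else x ** (1 - a) / (1 - a)


def marginal(U, x: float, h: float = 1e-6) -> float:
    """Central-difference estimate of U'(x)."""
    return (U(x + h) - U(x - h)) / (2 * h)


T, x = 10.0, 0.1  # x*T = 1, i.e. a window of one packet (illustrative)
print(marginal(lambda v: reno1(v, T), x))         # close to 3 / (3 + 2 (xT)^2) = 0.6
print(marginal(lambda v: reno2(v, T), x))         # close to 3 / (xT (2xT + 3)) = 0.6
print(marginal(lambda v: vegas(v, 20.0), x))      # close to alpha_d / x = 200
print(marginal(lambda v: alpha_fair(v, 2), 2.0))  # close to x^-2 = 0.25
```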
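Game interpretation
Source s: $\max_{x_s \ge 0} U_s(x_s) - x_s \sum_l R_{ls} p_l$
$x_s(t+1) = U_s'^{-1}\left(\sum_l R_{ls} p_l(t)\right)$
Link l: $\max_{p_l \ge 0} p_l \left(\sum_s R_{ls} x_s - c_l\right)$
$p_l(t+1) = \left[p_l(t) + \gamma\left(\sum_s R_{ls} x_s(t) - c_l\right)\right]^+$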
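The source and link updates above can be iterated directly. A minimal sketch on the two-link, three-source example network (source 1 on both links), assuming log utilities $U_s(x_s) = w_s \log x_s$ so that $U_s'^{-1}(q) = w_s/q$, a hand-picked step size γ = 0.1, synchronous updates and no feedback delay; the weights, rate cap and starting prices are illustrative choices:

```python
def dual_gradient(R, c, w, gamma=0.1, steps=2000, x_cap=10.0):
    """Iterate the source and link updates of the duality model (synchronous, no delay)."""
    n_links, n_src = len(R), len(R[0])
    p = [1.0] * n_links  # initial link prices (arbitrary, > 0)
    x = [0.0] * n_src
    for _ in range(steps):
        # Source s: x_s = U_s'^{-1}(q_s) = w_s / q_s, capped when q_s is 0.
        q = [sum(R[l][s] * p[l] for l in range(n_links)) for s in range(n_src)]
        x = [min(w[s] / q[s], x_cap) if q[s] > 0 else x_cap for s in range(n_src)]
        # Link l: projected gradient step on its price.
        y = [sum(R[l][s] * x[s] for s in range(n_src)) for l in range(n_links)]
        p = [max(0.0, p[l] + gamma * (y[l] - c[l])) for l in range(n_links)]
    return x, p


R = [[1, 1, 0],
     [1, 0, 1]]
x, p = dual_gradient(R, c=[1.0, 1.0], w=[1.0, 1.0, 1.0])
print([round(v, 4) for v in x])  # [0.3333, 0.6667, 0.6667]
print([round(v, 4) for v in p])  # [1.5, 1.5]
```

The limit is the proportionally fair allocation for this topology: source 1, which crosses both links, gets half the rate of each single-link source.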
Synchronous convergence
Theorem (L & Lapsley 99)
Provided R has full row rank & $U_s$ strictly concave:
Gradient projection algorithm of dual problem
converges to optimal primal-dual solutions if $0 < \gamma < \frac{2}{\bar{\alpha}\,\bar{L}\,\bar{S}}$
($\bar{L}$: max number of links in a path; $\bar{S}$: max number of sources sharing a link; $\bar{\alpha}$: bound on $-1/U_s''$)
Limit point: unique Pareto optimal Nash equilibrium
Asynchronous convergence
Sources and links update & compute
at different times
with different frequencies
using delayed info
Theorem (L & Lapsley 99)
Converges in asynchronous environment with smaller step size γ
Equilibrium of Vegas
Network
Link queueing delays: $p_l$
Queue length: $c_l p_l$
Sources
Throughput: $x_i$
E2E queueing delay: $q_i$
Packets buffered: $x_i q_i = \alpha_i d_i$
Utility function: $U_i(x) = \alpha_i d_i \log x$
Proportional fairness
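On a single bottleneck link these relations solve in closed form: every source sees the same queueing delay p, each keeps $x_i p = \alpha_i d_i$ packets queued, and a busy link is fully used ($\sum_i x_i = c$), so $p = \sum_i \alpha_i d_i / c$. A sketch with illustrative numbers (c in packets/ms, $\alpha_i d_i$ in packets):

```python
def vegas_single_link(c: float, backlog_targets: list[float]):
    """Vegas equilibrium on one bottleneck link; backlog_targets holds alpha_i*d_i in packets."""
    total = sum(backlog_targets)
    p = total / c                         # common queueing delay (ms)
    x = [a / p for a in backlog_targets]  # x_i = alpha_i d_i / p
    queue = c * p                         # queue length c*p = sum of alpha_i d_i
    return p, x, queue


p, x, queue = vegas_single_link(c=10.0, backlog_targets=[10.0, 30.0])
print(p, x, queue)  # 4.0 [2.5, 7.5] 40.0
```

Rates split in proportion to $\alpha_i d_i$, the proportionally fair allocation for utilities $\alpha_i d_i \log x_i$, and the queue length grows with the total $\sum_i \alpha_i d_i$, hence with the number of sources.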
Validation
(L. Wang, Princeton)
Source rates (pkts/ms)
#  src1         src2         src3         src4         src5         queue (pkts)  baseRTT (ms)
1  5.98 (6)     -            -            -            -            19.8 (20)     10.18 (10.18)
2  2.05 (2)     3.92 (4)     -            -            -            59.0 (60)     13.36 (13.51)
3  0.96 (0.94)  1.46 (1.49)  3.54 (3.57)  -            -            127.3 (127)   20.17 (20.28)
4  0.51 (0.50)  0.72 (0.73)  1.34 (1.35)  3.38 (3.39)  -            237.5 (238)   31.50 (31.50)
5  0.29 (0.29)  0.40 (0.40)  0.68 (0.67)  1.30 (1.30)  3.28 (3.34)  416.3 (416)   49.86 (49.80)
Stability: Reno/RED
[Figure: feedback loop: TCP sources F1 … FN, forward routing Rf(s), AQM links G1 … GL, backward routing Rb'(s)]
TCP: small τ, small c, large N
RED: small ρ, large delay
Theorem (Low et al, Infocom’02)
Reno/RED is locally stable if
$\frac{\rho\, c^3 \tau^3}{\cdots} \le \pi \frac{(1-\alpha)^2}{\sqrt{4\alpha^2 + \pi^2 (1-\alpha)^2}}$
Stability: scalable control
[Figure: feedback loop: TCP sources F1 … FN, forward routing Rf(s), AQM links G1 … GL, backward routing Rb'(s)]
$x_i(t) = x_{\max,i}\, e^{-\alpha_i q_i(t) / (m_i \tau_i)}$
$\dot{p}_l(t) = \frac{1}{c_l}\left(y_l(t) - c_l\right)$
Theorem (Paganini, Doyle, L, CDC’01)
Provided R is full rank, feedback loop is locally stable for arbitrary delay, capacity, load and topology
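Stability: Stabilized Vegas
[Figure: feedback loop: TCP sources F1 … FN, forward routing Rf(s), AQM links G1 … GL, backward routing Rb'(s)]
$\dot{x}_i = \frac{1}{T_i^2(t)} \tan^{-1}\left(\frac{\alpha_i d_i - x_i(t) q_i(t)}{\kappa_i(t)\, q_i(t)}\right)$
$\dot{p}_l(t) = \frac{1}{c_l}\left(y_l(t) - c_l\right)$
Theorem (Choe & L, Infocom’03)
Provided R is full rank, feedback loop is locally stable if $\max_i x_i T_i$ is below a bound that depends on $(a, \ldots)$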
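Replacing sgn by $\tan^{-1}$ keeps the Vegas equilibrium (both vanish where $x_i q_i = \alpha_i d_i$) but gives a bounded, continuous response. A minimal sketch, assuming the source law $\dot{x}_i = \frac{1}{T_i^2}\tan^{-1}\left(\frac{\alpha_i d_i - x_i q}{\kappa q}\right)$ with a constant gain κ, a single link with $\dot{q} = (\sum_i x_i - c)/c$, and no feedback delay. It therefore only illustrates convergence to the equilibrium from a nearby start, not the delay robustness the theorem is about; all numbers are illustrative:

```python
import math


def simulate_stabilized_vegas(alpha_d, d, c, kappa, x0, q0, t_end, dt):
    """Euler integration of the arctan source law and the queueing-delay link law.

    One link shared by all sources; no propagation delay in the feedback loop.
    Units: rates in packets/ms, delays in ms, alpha_d in packets.
    """
    x, q = list(x0), q0
    for _ in range(int(t_end / dt)):
        # Source law: atan2(a, b) equals atan(a / b) for b > 0 and stays finite at q = 0.
        dx = [math.atan2(a - xi * q, kappa * q) / (di + q) ** 2
              for a, xi, di in zip(alpha_d, x, d)]
        # Link law: queueing delay grows with the excess arrival rate.
        dq = (sum(x) - c) / c
        x = [max(0.0, xi + dt * dxi) for xi, dxi in zip(x, dx)]
        q = max(0.0, q + dt * dq)
    return x, q


# One source: alpha*d = 20 packets, d = 10 ms, c = 1 packet/ms.
# Equilibrium: x = c = 1 and q = 20 ms (20 packets queued).
x, q = simulate_stabilized_vegas([20.0], [10.0], c=1.0, kappa=1.0,
                                 x0=[1.0], q0=18.0, t_end=30000.0, dt=0.1)
print(round(x[0], 3), round(q, 2))  # 1.0 20.0
```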
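Stability: Stabilized Vegas
[Figure: feedback loop: TCP sources F1 … FN, forward routing Rf(s), AQM links G1 … GL, backward routing Rb'(s)]
$\dot{x}_i = \frac{1}{T_i^2(t)}\,\mathrm{sgn}\left(\alpha_i d_i - x_i(t) q_i(t)\right)$
$\dot{p}_l(t) = \frac{1}{c_l}\left(y_l(t) - c_l\right)$
Theorem (Choe & L, Infocom’03)
Provided R is full rank, feedback loop is locally stable if $\max_i x_i T_i$ is below a bound that depends on $(a, \ldots)$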
Stability: FAST
[Figure: feedback loop: TCP sources F1 … FN, forward routing Rf(s), AQM links G1 … GL, backward routing Rb'(s)]
$\dot{x}_i = \frac{1}{T_i^2(t)} \tan^{-1}\left(\frac{\alpha_i d_i - x_i(t) q_i(t)}{\kappa_i(t)\, q_i(t)}\right)$
$\dot{p}_l(t) = \frac{1}{c_l}\left(y_l(t) - c_l\right)$
Application
Stabilized TCP with current routers
Queueing delay as congestion measure has right scaling
Incremental deployment with ECN
Window control algorithm
Theorem (Jin, Wei, L ‘03)
In absence of delay
Mapping from w(t) to w(t+1) is contraction
Global exponential convergence
Full utilization after finite time
Utility function: $\alpha_i \log x_i$ (proportional fairness)
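The theorem concerns FAST's delay-based window update, which this slide does not spell out. A sketch assuming the form w ← min{2w, (1−γ)w + γ(baseRTT/RTT · w + α)} described in the FAST TCP literature, synchronous updates on a single idealized link, and illustrative values for α (packets) and γ. At a fixed point w = (d/T)w + α, i.e. each source keeps $x_i q = \alpha$ packets queued, consistent with the $\alpha_i \log x_i$ utility:

```python
def fast_update(w: float, base_rtt: float, rtt: float, alpha: float, gamma: float) -> float:
    """One FAST-style window update (assumed form, see lead-in)."""
    return min(2 * w, (1 - gamma) * w + gamma * (base_rtt / rtt * w + alpha))


def simulate_fast(windows, c, d, alpha, gamma, rounds):
    """Synchronous updates on one link: queue = packets in flight beyond c*d."""
    for _ in range(rounds):
        backlog = max(0.0, sum(windows) - c * d)
        rtt = d + backlog / c
        windows = [fast_update(w, d, rtt, alpha, gamma) for w in windows]
    return windows


# c = 10 pkts/ms, d = 10 ms, alpha = 20 packets, gamma = 0.5; unequal starting windows.
w = simulate_fast([10.0, 40.0], c=10.0, d=10.0, alpha=20.0, gamma=0.5, rounds=500)
rtt = 10.0 + (sum(w) - 100.0) / 10.0
print([round(v, 6) for v in w], round(rtt, 6))  # [70.0, 70.0] 14.0
```

Both windows end at 70 packets regardless of where they started: each source sends 5 pkts/ms with 4 ms of queueing delay, i.e. 20 = α packets queued.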
Network
(Sylvain Ravot, caltech/CERN)
FAST BMPS
Internet2 Land Speed Record
[Figure: FAST bit-meters-per-second results for runs with 1, 2, 7, 9 and 10 flows]
FAST, standard MTU: aggregate throughput averaged over > 1hr
FAST, standard MTU: utilization averaged over > 1hr
[Figure: aggregate throughput and average utilization for runs of 1 flow (1hr), 2 flows (1hr), 7 flows (6hr), 9 and 10 flows (1.1hr, 6hr); utilization values shown: 95%, 92%, 90%, 90%, 88%]
FAST, standard MTU: utilization averaged over 1hr
[Figure: average utilization of Linux TCP (txq=100), Linux TCP (txq=10000) and FAST at 1G and 2G; values shown: 95%, 92%, 48%, 27%, 19%, 16%]
SCinet
Caltech-SLAC experiments
Acknowledgments
SC2002
Baltimore, Nov 2002
netlab.caltech.edu/FAST
Prototype
C. Jin, D. Wei
Theory
D. Choe (Postech/Caltech), J. Doyle, S. Low, F. Paganini (UCLA), J. Wang, Z. Wang (UCLA)
Experiment/facilities
Caltech: J. Bunn, C. Chapman, C. Hu (Williams/Caltech), H. Newman, J. Pool, S. Ravot (Caltech/CERN), S. Singh
CERN: O. Martin, P. Moroni
Cisco: B. Aiken, V. Doraiswami, R. Sepulveda, M. Turzanski, D. Walsten, S. Yip
DataTAG: E. Martelli, J. P. Martin-Flatin
Internet2: G. Almes, S. Corbato
Level(3): P. Fernes, R. Struble
SCinet: G. Goddard, J. Patton
SLAC: G. Buhrmaster, R. Les Cottrell, C. Logg, I. Mei, W. Matthews, R. Mount, J. Navratil, J. Williams
StarLight: T. deFanti, L. Winkler
Major sponsors
ARO, CACR, Cisco, DataTAG, DoE, Lee Center, NSF
Dynamic sharing: 3 flows
Dynamic sharing on Dummynet
capacity = 800Mbps
delay = 120ms
3 flows
iperf throughput
Linux 2.4.x (HSTCP: UCL)
[Figure: throughput, queue and loss over 30min for FAST, Linux TCP, HSTCP and STCP]
Steady throughput
Dynamic sharing on Dummynet
capacity = 800Mbps
delay = 120ms
14 flows
iperf throughput
Linux 2.4.x (HSTCP: UCL)
[Figure: throughput, queue and loss over 30min for FAST, Linux TCP, HSTCP and STCP]
Room for mice!
Network model
[Figure: TCP sources F1 … FN and AQM links G1 … GL coupled by the routing matrix: y = Rx, q = R^T p]
$R_{li} = 1$ if source i uses link l (IP routing)
$x(t+1) = F(R^T p(t), x(t))$ (Reno, Vegas)
$p(t+1) = G(p(t), Rx(t))$ (DT, RED, …)
Motivation
Primal: $\max_{R} \max_{x \ge 0} \sum_i U_i(x_i)$ subject to $Rx \le c$
Dual: $\min_{p \ge 0} \sum_i \max_{R_i} \max_{x_i \ge 0} \left(U_i(x_i) - x_i \sum_l R_{li} p_l\right) + \sum_l p_l c_l$
Shortest path routing!
Can TCP/IP maximize utility?
TCP-AQM/IP
Theorem (Wang, et al 03)
Primal problem is NP-hard
Proof
Reduce integer partition to primal problem
Given: integers {c1, …, cn}
Find: set A s.t. $\sum_{i \in A} c_i = \sum_{i \notin A} c_i$
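The reduction starts from integer partition: decide whether {c1, …, cn} splits into two sets with equal sums. A brute-force check makes the exponential search space concrete (it tries up to all 2^n subsets). It illustrates the source problem only, not the reduction to the TCP/IP utility problem:

```python
from itertools import combinations


def has_equal_partition(values: list[int]) -> bool:
    """True if some subset A satisfies sum over A == sum over the complement."""
    total = sum(values)
    if total % 2:
        return False
    half = total // 2
    n = len(values)
    # Enumerate subsets by size: 2**n candidates in total.
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            if sum(values[i] for i in subset) == half:
                return True
    return False


print(has_equal_partition([1, 5, 11, 5]))  # True: {11} vs {1, 5, 5}
print(has_equal_partition([1, 2, 5]))      # False
```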
Achievable utility of TCP/IP?
Stability?
Duality gap?
Conclusion: Inevitable tradeoff between achievable utility and routing stability
Ring network
[Figure: ring network with a single destination; TCP/AQM sets link prices, IP sets routing r]
Single destination
Instant convergence of TCP/IP
Shortest path routing
Link cost = pl(t) + dl
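[Figure: link cost combines the dynamic price p_l(t) and the static d_l; prices p_l(0), p_l(1), … and routings r(0), r(1), …, r(t), r(t+1), … are updated in alternation]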
Stability: r ?
Utility: V ?
r* : optimal routing
V* : max utility
Theorem (Infocom 2003)
“No” duality gap
Unstable if = 0
starting from any r(0), subsequent
r(t) oscillates between 0 and 1
Theorem (Infocom 2003)
Solve primal problem asymptotically
as
$|r^* - r| \to 0$
$V^* - V \to 0$
Theorem (Infocom 2003)
large: globally unstable
small: globally stable
medium: depends on r(0)
General network
Conclusion: Inevitable tradeoff between achievable utility and routing stability
[Figure: achievable utility on a random graph with 20 nodes, 200 links]
netlab.caltech.edu/FAST
FAST TCP: motivation, architecture, algorithms, performance.
Submitted for publication, July 1, 2003
α-release: August 2003
Inquiry: [email protected]
FAST Project Review
Caltech, Oct 27-28, 2003