Broadband Protocols
WP 1.2.1
IP protocols, Lambda switching, multicasting
Richard Hughes-Jones
The University of Manchester
www.hep.man.ac.uk/~rich/ then “Talks”
FABRIC Meeting, Poznan Poland, 25 Sep 2006, R. Hughes-Jones Manchester
Protocols Document
Protocols Document 1
• “Protocol Investigation for eVLBI Data Transfer”
• Document JRA-WP1.2.1.001
• Jodrell Bank & Manchester folks, with hard work from Matt
• Completed and on the EXPReS wiki
• Introduces e-VLBI and its networking requirements:
  • Continuously streamed data
  • Individual packets are not particularly valuable
  • Maintaining the data rate is important
  • Quite different from transfers where bit-wise correct transmission is required, e.g. file transfer
• Forms a valuable use case for the GGF GHPN-RG
• Presents the actions required to make an informed decision and to implement suitable protocols in the European VLBI Network; a strategy document
Protocols Document 2
• Protocols considered for investigation include:
  • TCP/IP
  • UDP/IP
  • DCCP/IP (see the socket sketch after this list)
  • VSI-E RTP/UDP/IP
  • Remote Direct Memory Access (RDMA)
  • TCP Offload Engines (TOE)
• Very useful discussions at the Haystack VLBI meeting:
  • Agreement to make joint Haystack–Jodrell tests
  • Use of the ESLEA 1 Gbit transatlantic link
• Work in progress – links to ESLEA UK e-Science:
  • vlbi_udp – Simon: UDP/IP stability and the effect of packet loss on correlations
  • tcpdelay – Stephen: TCP/IP and constant bit rate (CBR) data
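Of the protocols listed above, DCCP is the least widely used, so a hedged illustration may help: a minimal sketch (not taken from the protocols document) of opening a DCCP socket on Linux, assuming a kernel with DCCP support (2.6.14 or later). The destination address, port and service code are placeholder values, and the fallback #defines cover libc headers that predate DCCP.

    /* Minimal DCCP client sketch (illustrative only; address, port and
     * service code are example values). Requires Linux DCCP support. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    #ifndef SOCK_DCCP
    #define SOCK_DCCP 6                 /* fallback for older libc headers */
    #endif
    #ifndef IPPROTO_DCCP
    #define IPPROTO_DCCP 33
    #endif
    #ifndef SOL_DCCP
    #define SOL_DCCP 269
    #endif
    #ifndef DCCP_SOCKOPT_SERVICE
    #define DCCP_SOCKOPT_SERVICE 2
    #endif

    int main(void)
    {
        int s = socket(AF_INET, SOCK_DCCP, IPPROTO_DCCP);
        if (s < 0) { perror("socket(SOCK_DCCP)"); return 1; }

        /* Every DCCP connection carries a 32-bit service code. */
        uint32_t service = htonl(42);   /* example service code */
        setsockopt(s, SOL_DCCP, DCCP_SOCKOPT_SERVICE, &service, sizeof(service));

        struct sockaddr_in peer;
        memset(&peer, 0, sizeof(peer));
        peer.sin_family = AF_INET;
        peer.sin_port = htons(5001);                      /* example port */
        inet_pton(AF_INET, "192.0.2.1", &peer.sin_addr);  /* example address */

        if (connect(s, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
            perror("connect");
        } else {
            /* Each write becomes one DCCP datagram: unreliable but
             * congestion-controlled, which is the attraction for e-VLBI. */
            write(s, "hello", 5);
        }
        close(s);
        return 0;
    }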
tcpdelay
tcpdelay: VLBI Application Protocol
• Want to examine how TCP moves constant bit rate (CBR) data
• tcpdelay is a test program (a minimal sketch of the idea follows below):
  • An instrumented TCP program that emulates sending CBR data
  • Records the relative 1-way delay of each message
  • Records TCP stack activity with web100
[Diagram: a number of packets of n bytes each, sent with a fixed wait time between them, shown along a time axis]
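The sketch below is an added, hedged illustration of what such a CBR sender looks like, not the actual tcpdelay source: it writes fixed-size messages over TCP at a fixed spacing, embedding the send time in each message so a matching receiver can compute the relative 1-way delay. Message size, spacing, address and port are placeholder values taken loosely from the tests later in this talk.

    /* Minimal CBR-over-TCP sender sketch (not the real tcpdelay).
     * Sends MSG_SIZE-byte messages every WAIT_US microseconds; the first
     * 8 bytes of each message carry the send time so the receiver can
     * compute a relative 1-way delay. Destination address/port are examples. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <sys/time.h>

    #define MSG_SIZE 1448          /* bytes per message (example) */
    #define WAIT_US  22            /* spacing between sends (example) */

    static int64_t now_us(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return (int64_t)tv.tv_sec * 1000000 + tv.tv_usec;
    }

    int main(void)
    {
        int s = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in dst;
        memset(&dst, 0, sizeof(dst));
        dst.sin_family = AF_INET;
        dst.sin_port = htons(5201);                     /* example port */
        inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr); /* example address */
        if (connect(s, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
            perror("connect");
            return 1;
        }

        char msg[MSG_SIZE];
        memset(msg, 0, sizeof(msg));

        for (;;) {
            int64_t t_send = now_us();
            memcpy(msg, &t_send, sizeof(t_send));       /* embed send timestamp */

            /* A full message may need several write() calls if the socket
             * buffer is full: this is exactly where CBR delivery starts to slip. */
            size_t done = 0;
            while (done < MSG_SIZE) {
                ssize_t n = write(s, msg + done, MSG_SIZE - done);
                if (n <= 0) { perror("write"); close(s); return 1; }
                done += (size_t)n;
            }

            while (now_us() < t_send + WAIT_US)         /* busy-wait to hold spacing */
                ;
        }
    }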
VLBI Application Protocol
• VLBI data is produced at a constant bit rate
[Diagram: timeline between Sender and Receiver across TCP & the network; the sender emits Data1, Data2, Data3, Data4, ... at successive timestamps; a packet loss occurs part way through the sequence]
Visualising the Results
Stephen Kershaw
• When packet loss is detected, TCP:
  • Reduces Cwnd
  • Halves the sending rate
• Expect a delay in the message arrival time
[Plot sketch: arrival time vs message number / time; after a packet loss the arrival times step away from the expected arrival time at CBR, giving a delay in the stream]
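A rough added estimate (not on the slide) of how long such a delay persists: after a halving, standard TCP regrows cwnd by about one segment per round trip, so returning to a window of W segments takes roughly

    t_recover ≈ (W/2) × RTT,   with W ≈ BDP / MSS

With the path parameters of the following slides (RTT ≈ 27 ms, ≈525 Mbit/s, 1448-byte messages), W is over a thousand segments and the recovery time is of order 15 s, so a single loss produces a clearly visible step.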
Arrival Times: UKLight JB-JIVE-Manc
• Effect of loss rate on message arrival time
• TCP buffer: 32 Mbytes
• Message size: 1448 bytes
• Wait time: 22 us
• Data rate: 525 Mbit/s
• Route: JB – UKLight – JIVE – UKLight – Manchester
  • RTT ~27 ms
• BDP @ 512 Mbit/s ≈ 1.8 Mbyte
• Estimate: catch-up possible if loss < 1 in 1.24 M packets
[Plot: message arrival time (0-50 s) vs message number (0-9 ×10^4) for drop rates of 1 in 5k, 10k, 20k and 40k, and for no loss]
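An added check of the bandwidth-delay product quoted above:

    BDP = R × RTT ≈ 512 Mbit/s × 27 ms ≈ 13.8 Mbit ≈ 1.7 Mbyte

i.e. roughly 1200 messages of 1448 bytes in flight on this path.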
TCP Web100: JB-Manc – Large buffer
• Standard TCP
• TCP buffer: 930 kbytes
• Drop 1 in 40,000 packets
• Message size: 1448 bytes; wait time: 22 us
• Data rate: 525 Mbit/s
• Route: JB – UKLight – JIVE – UKLight – Manchester
  • RTT ~27 ms
• Classic Cwnd behaviour
• Limited by ssthresh!
• TCP requires much care!!
[Web100 plots vs time (5000-15000 ms): DataBytesOut (Delta) and DataBytesIn (Delta); CurCwnd (Value), up to ~600,000 bytes; number of duplicate ACKs and packet re-transmits]
iBOB
Prototype iBOB with two sampler boards attached
FPGA-based signal processing board from UC Berkeley
iBOB block diagram
Bryan Anderson
• 10 Gigabit Ethernet now available
• UDP/IP module exists
• Use for demonstration of FPGA-driven IP networking
  • Link to PC NIC – diagnostics (see the sketch below)
  • Test over GÉANT: Onsala – Jodrell
[Block diagram: Station VSI board feeding the iBOB (RAM, 10GE CX4 port); CX4-to-fibre media converter onto 10GE; VSI or headstack connection to a disk-based system]
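As a hedged sketch of the kind of PC-side diagnostic meant above (an added illustration, not an existing tool): a UDP receiver that counts received, lost and late datagrams, assuming the FPGA places a 32-bit big-endian sequence number in the first four bytes of each packet. The port and packet layout are assumptions.

    /* UDP diagnostic receiver sketch: reports received, lost and late/duplicate
     * datagrams, assuming a 32-bit big-endian sequence number in the first
     * 4 bytes of each packet. Port 5000 is an example. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int main(void)
    {
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(5000);                    /* example port */
        if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("bind");
            return 1;
        }

        char buf[65536];
        uint64_t received = 0, lost = 0, late = 0;
        uint32_t highest = 0;
        int first = 1;

        for (;;) {
            ssize_t n = recv(s, buf, sizeof(buf), 0);
            if (n < 4)
                continue;                    /* too short for a sequence number */

            uint32_t seq;
            memcpy(&seq, buf, 4);
            seq = ntohl(seq);
            received++;

            if (first) {
                first = 0;
                highest = seq;
            } else if ((int32_t)(seq - highest) > 0) {
                lost += seq - highest - 1;   /* gap: datagrams missing so far */
                highest = seq;
            } else {
                late++;                      /* out-of-order or duplicate */
                if (lost > 0)
                    lost--;                  /* a "missing" packet arrived late */
            }

            if (received % 100000 == 0)
                printf("rx %llu  lost %llu  late %llu\n",
                       (unsigned long long)received,
                       (unsigned long long)lost,
                       (unsigned long long)late);
        }
    }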
Multi-Gigabit Trials on GÉANT
• Collaboration with Dante
• What is inside GÉANT2
• Why the collaboration is interesting:
  • 10 Gigabit Ethernet
  • UDP memory-to-memory flows
  • TCP flows with allocated bandwidth
• Options using the GÉANT Development Network
  • 10 Gbit SDH network
• Options using the GÉANT LightPath service
  • PoP locations for network tests
GÉANT2 Topology
GÉANT2: The Convergence Solution
[Diagram: an EXPReS PC with 10 GE connects over NREN access to GÉANT2 PoP A (1678 MCC with L2 matrix and TDM matrix, alongside the existing IP router); managed lambdas over 1626 LM equipment carry the circuit to GÉANT2 PoP B, where a matching 1678 MCC and 10 GE connection reach the far-end EXPReS PC]
From PoS to Ethernet
Connect. Communicate. Collaborate
• More Economical Architecture
• Highest Overall Network Availability
• Flexibility (VLAN management)
• Highest Network Performance (Latency)
[Diagram: router IP links and VLANs carried over 1/10 Gigabit Ethernet into the 1678 MCC transport node, whose L2 and TDM matrices map them onto VC-4-nv channels]
What do we want to do?
• Set up a 4 Gigabit lightpath between GÉANT PoPs
  • Collaboration with Dante
  • PCs in their PoPs with 10 Gigabit NICs
• VLBI tests:
  • UDP performance: throughput, jitter, packet loss, 1-way delay, stability
  • Continuous (days) data flows – VLBI_UDP and multi-Gigabit TCP performance with current kernels
  • Experience for FPGA Ethernet packet systems
• Dante interests:
  • multi-Gigabit TCP performance
  • The effect of (Alcatel) buffer size on bursty TCP when using bandwidth-limited lightpaths
• Need a collaboration agreement
Options Using the GÉANT Development Network
• 10 Gigabit SDH backbone
• Alcatel 1678 MCC
• Node locations:
  • London
  • Amsterdam
  • Paris
  • Prague
  • Frankfurt
• Can do traffic routing, so long-RTT paths can be made
• Available Dec/Jan 07
• Less pressure for long-term tests
Options Using the GÉANT LightPaths
• Set up a 4 Gigabit lightpath between GÉANT PoPs
  • Collaboration with Dante
  • PCs in Dante PoPs
• 10 Gigabit SDH backbone
• Alcatel 1678 MCC
• Node locations:
  • Budapest
  • Geneva
  • Frankfurt
  • Milan
  • Paris
  • Poznan
  • Prague
  • Vienna
• Can do traffic routing, so long-RTT paths can be made
  • Ideal: London – Copenhagen
4 Gigabit GÉANT LightPath
• Example of a 4 Gigabit lightpath between GÉANT PoPs
• PCs in Dante PoPs
• 26 × VC-4s: 4180 Mbit/s
PCs and Current Tests
Test PCs Have Arrived
• Boston/Supermicro X7DBE
• Two dual-core Intel Xeon Woodcrest 5130 CPUs @ 2 GHz
• Independent 1.33 GHz front-side buses
• 530 MHz fully buffered (serial) memory
• Chipsets:
  • Intel 5000P MCH – PCIe & memory
  • ESB2 – PCI-X, GE etc.
• PCI:
  • 3 × 8-lane PCIe buses
  • 3 × 133 MHz PCI-X
• 2 × Gigabit Ethernet
• SATA
Lab Tests 10 Gigabit Ethernet
• 10 Gigabit test lab being set up in Manchester
  • Cisco 7600
  • Cross-campus λ, <1 ms
  • Server-quality PCs
  • Neterion NICs; Myricom & Chelsio being purchased
• Back-to-back performance so far
  • SuperMicro X6DHE-G2
  • Kernel (2.6.13) & driver dependent!
  • One iperf TCP data stream: 4 Gbit/s
  • Two bi-directional iperf TCP data streams: 3.8 & 2.2 Gbit/s
  • UDP disappointing
• Propose to install Fedora Core 5 with kernel 2.6.17 on the new Intel dual-core PCs
Any Questions?
Backup Slides
Bandwidth on Demand
Our Long-Term Vision
[Diagram: applications (e.g. GRID) and a research activity issue bandwidth requests to policy middleware and a network resource manager; UNI-C commands configure a chain of 1678 MCC nodes, linked by Ethernet and controlled by GMPLS, providing the requested bandwidth between the applications at each end]
10 Gigabit Ethernet: UDP Throughput
• 1500 byte MTU gives ~2 Gbit/s
• Used 16144 byte MTU, max user length 16080 bytes
• DataTAG Supermicro PCs
  • Dual 2.2 GHz Xeon CPUs, FSB 400 MHz
  • PCI-X mmrbc 512 bytes
  • Wire-rate throughput of 2.9 Gbit/s
• CERN OpenLab HP Itanium PCs
  • Dual 1.0 GHz 64-bit Itanium CPUs, FSB 400 MHz
  • PCI-X mmrbc 4096 bytes
  • Wire rate of 5.7 Gbit/s
• SLAC Dell PCs
  • Dual 3.0 GHz Xeon CPUs, FSB 533 MHz
  • PCI-X mmrbc 4096 bytes
  • Wire rate of 5.4 Gbit/s
[Plot "an-al 10GE Xsum 512kbuf MTU16114 27Oct03": received wire rate (0-6000 Mbit/s) vs spacing between frames (0-40 us) for packet sizes from 1472 to 16080 bytes]
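An added worked example of how these wire rates relate to the frame spacing (assuming standard UDP/IPv4/Ethernet overheads and no VLAN tag): each datagram of L user bytes occupies about L + 66 bytes on the wire (8 UDP + 20 IP + 14 Ethernet + 4 FCS + 8 preamble + 12 inter-frame gap), so

    wire rate ≈ (L + 66) × 8 bits / spacing

For L = 16080 bytes, a wire rate of 5.7 Gbit/s corresponds to a spacing of ≈22.7 us between frames.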
10 Gigabit Ethernet: Tuning PCI-X
• 16080 byte packets sent every 200 µs
• Intel PRO/10GbE LR adapter
• PCI-X bus occupancy vs mmrbc
  • Measured times
  • Times based on PCI-X timings from the logic analyser
• Expected throughput ~7 Gbit/s; measured 5.7 Gbit/s
[Logic-analyser traces of the PCI-X sequence (CSR access, data transfer, interrupt & CSR update) for mmrbc = 512, 1024, 2048 and 4096 bytes; plots of PCI-X transfer time (us) and of measured rate, rate from expected time and max PCI-X throughput (Gbit/s) vs max memory read byte count (0-5000), for the DataTAG Xeon 2.2 GHz and the HP Itanium (kernel 2.6.1#17, Intel 10GE, Feb 04); mmrbc 4096 bytes gives 5.7 Gbit/s]
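For context (an added note), the raw bandwidth of a 64-bit, 133 MHz PCI-X bus is

    64 bit × 133 MHz ≈ 8.5 Gbit/s

so the measured 5.7 Gbit/s with mmrbc = 4096 bytes is roughly two-thirds of the raw bus rate, the remainder going to the CSR accesses, interrupts and per-burst overheads visible in the traces; smaller mmrbc values mean more bursts and hence more overhead.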
Bandwidth Challenge wins Hat Trick
• The maximum aggregate bandwidth was >151 Gbit/s
  • 130 DVD movies in a minute
  • Could serve 10,000 MPEG2 HDTV movies in real time
• 22 × 10 Gigabit Ethernet waves to the Caltech & SLAC/FERMI booths
• In 2 hours transferred 95.37 TBytes; in 24 hours moved ~475 TBytes
• Showed real-time particle event analysis
• SLAC/Fermi/UK booth:
  • 1 × 10 Gbit Ethernet to the UK via NLR & UKLight: transatlantic HEP disk-to-disk, VLBI streaming
  • 2 × 10 Gbit links to SLAC: rootd low-latency file access application for clusters; Fibre Channel StorCloud
  • 4 × 10 Gbit links to Fermi: dCache data transfers
[Plot: bandwidth into and out of the booth (SC2004 reached 101 Gbit/s); into the booth via FNAL-UltraLight, SLAC-ESnet-USN and UKLight; out of the booth via FermiLab-HOPI and SLAC-ESnet]
SC|05 Seattle-SLAC 10 Gigabit Ethernet
• 2 lightpaths:
  • Routed over ESnet
  • Layer 2 over Ultra Science Net
• 6 Sun V20Z systems per λ
• dCache remote disk data access
  • 100 processes per node
  • Each node sends or receives
  • One data stream: 20-30 Mbit/s
• Used Neterion NICs & Chelsio TOEs
• Data also sent to StorCloud using Fibre Channel links
• Traffic on the 10 GE link for 2 nodes: 3-4 Gbit/s per node, 8.5-9 Gbit/s on the trunk
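An added sanity check of the per-node figures:

    100 streams × 20-30 Mbit/s ≈ 2-3 Gbit/s per node

which is of the same order as the observed 3-4 Gbit/s per node.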
10 Gigabit Ethernet: TCP Data transfer on PCI-X
• Sun V20Z, 1.8 GHz to 2.6 GHz dual Opterons
• Connected via a 6509
• XFrame II NIC
• PCI-X mmrbc 4096 bytes, 66 MHz
• Logic-analyser trace (data transfer and CSR access phases):
  • Two 9000 byte packets back-to-back
  • Average rate 2.87 Gbit/s
  • Burst of packets of length 646.8 us
  • Gap between bursts of 343 us
  • 2 interrupts per burst
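An added inference from these numbers: the bus is active for about 65% of the time,

    646.8 / (646.8 + 343) ≈ 0.65

so the instantaneous rate within a burst is roughly 2.87 / 0.65 ≈ 4.4 Gbit/s.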
10 Gigabit Ethernet: UDP Data transfer on PCI-X
• Sun V20Z, 1.8 GHz to 2.6 GHz dual Opterons
• Connected via a 6509
• XFrame II NIC
• PCI-X mmrbc 2048 bytes, 66 MHz
• One 8000 byte packet (logic-analyser trace: data transfer and CSR access phases):
  • 2.8 us for CSRs
  • 24.2 us data transfer, effective rate 2.6 Gbit/s
• 2000 byte packets, wait 0 us:
  • ~200 ms pauses
• 8000 byte packets, wait 0 us:
  • ~15 ms between data blocks
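An added arithmetic check of the effective rate quoted above:

    8000 bytes × 8 bits / 24.2 us ≈ 2.6 Gbit/s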