L1 & HLT Trigger Network Research and Development
Umberto Marconi, INFN Bologna
Overview
• Introduction
• The network trigger architectures
  – Architecture-1, based on Network Processors
  – Architecture-2, based on event packing at the L1 front-end boards and IP
• R&D activity plans
• Requests for funding
History
• The LHCb DAQ system was designed for a 40 kHz trigger rate and a 100 kB event size
• Physically and logically separated L1 trigger system running at 1.1 MHz (the L0 trigger accept rate)
• Gigabit Ethernet as the link technology throughout the system
• Frame handling and event building rely on Network Processors
• CPU farm for trigger processing

Re-optimisation of the detector…
LHCb Online System TDR, CERN/LHCC 2001-040, December 2001
Redesign of LHCb
• New tracking system
• More elaborate trigger system
LHCb Trigger
TDR DAQ
[Diagram of the TDR DAQ architecture: the LHCb detector (VELO, TRACK, ECAL, HCAL, MUON, RICH) delivers 40 TB/s at 40 MHz. The Level-0 trigger (fixed latency 4.0 µs) reduces the rate to 1 MHz, and the Level-1 trigger (variable latency < 2 ms) to 40 kHz. Timing & Fast Control (TFC) distributes the L0/L1 decisions and the throttle. The Front-End Electronics (1 TB/s) feed Front-End Multiplexers (FEM) and Front-End Links into Read-out Units (RU, 4 GB/s) and the Read-out Network (RN, 4 GB/s), which serves the Sub-Farm Controllers (SFC) and the CPU farm running the Trigger Level 2 & 3 Event Filter (variable latency, L2 ~10 ms, L3 ~200 ms); output goes to storage at 40 MB/s. The L1 buffer has a 2 ms maximum latency; control and monitoring via ECS.]
New Requirements for L1 & HLT
• L1 aggregate data traffic @ 1.1 MHz
  – Event size: 8.8 kB (VELO+TT+L0DU) → 16 kB (+IT+OT)
  – 80 Gb/s → 136 Gb/s
• HLT aggregate data traffic @ 40 kHz
  – Event size: 30 kB
  – 9.6 Gb/s
• L1 maximum processing latency ~50 ms
• The amount of data to be moved for L1 dominates over the HLT
(A quick numerical cross-check of these figures follows below.)
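The aggregate figures are simply event size times trigger rate; the sketch below is that back-of-the-envelope check (the event sizes are those quoted on the slide, and the 80/136 Gb/s slide values presumably reflect slightly different rounding of the event sizes).

```python
# Back-of-the-envelope check of the aggregate data traffic quoted above.
L1_RATE_HZ  = 1.1e6   # L0-accept rate seen by Level-1
HLT_RATE_HZ = 40e3    # Level-1 accept rate seen by the HLT

def aggregate_gbps(event_size_kb: float, rate_hz: float) -> float:
    """Aggregate traffic in Gb/s for a given event size (kB) and trigger rate (Hz)."""
    return event_size_kb * 1e3 * 8 * rate_hz / 1e9

print(aggregate_gbps(8.8, L1_RATE_HZ))    # ~77 Gb/s  (VELO+TT+L0DU; slide quotes 80 Gb/s)
print(aggregate_gbps(16.0, L1_RATE_HZ))   # ~141 Gb/s (+IT+OT; slide quotes 136 Gb/s)
print(aggregate_gbps(30.0, HLT_RATE_HZ))  # 9.6 Gb/s  (HLT)
```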
Architectures
[Diagram, Architecture-1: Level-1 traffic leaves the front-end electronics (FE boards plus TRM) on 125-239 links at 1.1 MHz (8.8-16.9 GB/s) and HLT traffic on 349 links at 40 kHz (2.3 GB/s). A multiplexing layer of switches and Network Processors (77-135 NPs and 77-135 links at 6.4-13.6 GB/s for Level-1; 30 switches, 37-70 NPs and 73-140 links at 7.9-15.1 GB/s for the HLT) feeds an NP-based readout network with NP event-builder modules; 50-100 links at 5.5-10 GB/s go to 50-100 SFCs in front of a farm of ~1200 CPUs. An L1-Decision sorter (24 NPs, 24 links, 1.5 GB/s) reports to the TFC system, and built events go to the storage system. All links are Gb Ethernet. Network Processor: IBM NP4GS3, 4 full-duplex Gigabit Ethernet ports.]
[Diagram, Architecture-2: no Network Processors; event building at the SFC over IP; Gigabit Ethernet, UTP version (1000Base-T, Cat5e unshielded twisted pairs). Packed Level-1 traffic leaves the front-end electronics on 126-240 links at 44 kHz (5.5-11.0 GB/s) and HLT traffic on 349 links at 40 kHz (2.3 GB/s). A multiplexing layer of 62-83 switches concentrates this onto 64-157 links at 88 kHz into a readout network of 31 switches; 90-153 links at 5.5-10 GB/s feed 90-153 SFCs and a farm of ~1400 CPUs. The L1-Decision sorter uses 33 links at 1.7 GB/s towards the TFC system; built events go to the storage system.]
Ethernet Constraint
N:M Multiplexing
• N input links, M output links
• Reduces the output frame rate by a factor 1/M
• Increases the maximum aggregated payload (at 70% link load; see the sketch below):
  – M=2: 149 B
  – M=3: 236 B
  – M=4: 324 B
[Diagram: a Network Processor performing the N:M multiplexing]
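A rough reconstruction of where these payload figures come from (my sketch, not the original calculation): at 70% load a Gigabit link carries 87.5 MB/s, and with the merged frames leaving each of the M output links at roughly the event rate divided by M, the usable payload per frame grows almost linearly with M. The frame rate (1 MHz) and per-frame overhead (~26 B) used below are assumptions chosen because they reproduce the quoted 149/236/324 B.

```python
# Sketch: maximum aggregated payload per merged frame for N:M multiplexing.
# Assumptions (mine, chosen to reproduce the slide's 149/236/324 B figures):
#   - 1 Gb/s links run at 70% load,
#   - merged frames leave each output link at ~1 MHz / M,
#   - ~26 B of per-frame overhead (Ethernet header + inter-frame gap).
LINK_BYTES_PER_S = 125e6   # 1 Gb/s in bytes/s
LOAD             = 0.70
FRAME_RATE_HZ    = 1.0e6   # assumed frame rate before division by M
OVERHEAD_B       = 26      # assumed per-frame overhead

def max_payload(m_outputs: int) -> float:
    """Maximum payload (bytes) of one merged frame on one of M output links."""
    bytes_per_frame = LOAD * LINK_BYTES_PER_S * m_outputs / FRAME_RATE_HZ
    return bytes_per_frame - OVERHEAD_B

for m in (2, 3, 4):
    print(m, round(max_payload(m)))   # -> 149, 236 (236.5), 324
```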
Protocols
• Pure push-through protocol
  – No horizontal synchronization
  – Vertical synchronization through the arrival of data
  – Buffer overflow protection via the throttle
    • Central buffer monitoring in the TFC (Readout Supervisor)
      – Zero suppression in the FE guaranteed within 20 µs
    • The Level-1 buffer can be centrally monitored
      – NPs will drive throttle signals (L0 throttle or L1 throttle)
      – SFCs throttle through the ECS (latency ~ms)
• Level-1 latency control through hierarchically graded timeouts (see the sketch below):
  – Timeout on the CPUs
  – Timeout on the SFCs
  – Timeout on the Decision Sorter
  – Timeout on the Readout Supervisor
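A minimal sketch of the grading idea (the values are hypothetical, only the ordering matters): each stage waits longer than the stage below it, so an incomplete event is abandoned at the lowest possible level before the Readout Supervisor gives up, and everything fits inside the ~50 ms Level-1 latency budget.

```python
# Sketch of hierarchically graded Level-1 timeouts. The numbers below are
# illustrative placeholders, not values from the LHCb design.
L1_MAX_LATENCY_MS = 50.0   # from the requirements slide

timeouts_ms = {
    "CPU":                30.0,
    "SFC":                35.0,
    "Decision Sorter":    40.0,
    "Readout Supervisor": 45.0,
}

stages = list(timeouts_ms.values())
assert all(a < b for a, b in zip(stages, stages[1:])), "timeouts must be graded upwards"
assert stages[-1] < L1_MAX_LATENCY_MS, "must resolve within the L1 latency budget"
```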
Real Time Operation
• The Linux 2.5 kernel has real-time capabilities (preemptive kernel):
  – Real-time priorities: the L1 task will never be interrupted until it finishes
  – The context-switch latency on today's CPUs is low: 10.1 ± 0.2 µs
• The scheme of running both tasks concurrently is therefore sound (see the sketch below)
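As an illustration of how a farm node could give the L1 process a real-time priority over the ordinary HLT process on such a kernel (a sketch, not the actual farm software; it uses Python's standard wrappers around the POSIX scheduler calls):

```python
import os

# Sketch: put the Level-1 process under SCHED_FIFO so the scheduler never
# preempts it in favour of the ordinary (SCHED_OTHER) HLT process.
def make_realtime(priority: int = 50) -> None:
    """Switch the calling process to the SCHED_FIFO real-time policy."""
    param = os.sched_param(priority)
    os.sched_setscheduler(0, os.SCHED_FIFO, param)  # 0 = this process; needs root/CAP_SYS_NICE

# In the L1 trigger process (the HLT process keeps the default policy):
# make_realtime(50)
```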
Bit Error Rate (BER)
• LHCb is based on 1000Base-T for cost reasons
• Gigabit Ethernet is specified to work over UTP Cat5e cables (1000Base-T)
• The BER is specified to be < 10⁻¹¹, i.e. at most one bad packet per 100 s (see the arithmetic below); real equipment is expected to be much better
• The BER depends not only on the cable, but in particular also on the end-points
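The "one bad packet per 100 s" figure follows directly from the line rate; the check below is just that arithmetic for a single fully loaded Gigabit link.

```python
# Quick check of the "one bad packet per 100 s" statement at the specified BER.
BER         = 1e-11    # bit error rate limit for 1000Base-T
LINK_BITS_S = 1e9      # Gigabit Ethernet line rate

errors_per_second = BER * LINK_BITS_S       # = 1e-2 bit errors/s per link
seconds_per_error = 1 / errors_per_second   # = 100 s between bit errors
print(seconds_per_error)                    # one bit error spoils at most one packet
```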
Architecture-2 vs. Architecture-1
• What is the gain in getting rid of the NPs?
  – Concern* about using an NP, since the line is going to be stopped by IBM
    • All of the NPs that would ever be needed (incl. spares, upgrades, etc.) would have to be bought very soon (a large investment)
  – No need to design and build the NP-based modules
  – A fully commercial (commodity) system (switches, CPUs)
    • Large switches/routers will not be a problem in 2005
    • The system can grow by adding switch ports and SFCs
  – Easier scaling behavior
• Features have to be moved into the L1 front-end electronics
  – Reduce protocol overheads and the fragment rate by event packing
  – Protocol adaptations (the FE speaks IP)
  – Destination assignment
    • by the FE, or by the Readout Supervisor together with the FE
(*) L1&HLT review, CERN, 29/4/03
Architecture-1
[Diagram: the Architecture-1 dataflow of the earlier "Architectures" slide, here highlighting the frame handling performed in the multiplexing layer (switches and 77-135 NPs for Level-1; 30 switches and 37-70 NPs for the HLT) and the event building performed in the NP event-builder modules in front of the 50-100 SFCs (50-100 links, 5.5-10 GB/s) that serve the farm of ~1200 CPUs.]
Scale of the System
• From Monte Carlo → number of hits per sub-detector
• From the front-end electronics structure → number of hits per electronics board
• From the hit encoding → fragment size per electronics board
• Link speeds and link loads determine a suitable multiplexing factor for the network processors (taking into account transport headers and overheads, which depend on the switching-network technology)
• Event building and stripping off headers, plus link speeds (loads), give the number of sub-farms
• Desired total number of CPUs → CPUs per sub-farm
• Scenarios (which sub-detectors, which switching technology, …)
A sketch of this sizing chain, using the numbers of the next slide, follows below.
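The sketch below follows this chain using the sub-detector columns of the Architecture-1 baseline table on the next slide. The overhead figures (24 B per fragment, 56 B per merged frame, the 64 B minimum Ethernet frame) are my assumptions for illustration; they happen to reproduce the table values, but the original spreadsheet may have grouped the overheads differently.

```python
# Sketch of the sizing chain: payload -> fragment -> per-link rate -> muxed link load.
L1_RATE_HZ = 1.1e6   # events/s into Level-1
LINK_MB_S  = 125.0   # Gigabit Ethernet capacity in MB/s

def sizing(physics_payload_b, mux_in, mux_out):
    frame_b  = max(64, physics_payload_b + 24)            # one FE fragment on the wire
    rate_fe  = frame_b * L1_RATE_HZ / 1e6                 # MB/s per FE link
    merged_b = mux_in * physics_payload_b + 56            # packed frame after N:M muxing
    rate_out = merged_b * L1_RATE_HZ / mux_out / 1e6      # MB/s per RU output link
    return frame_b, rate_fe, merged_b, rate_out, rate_out / LINK_MB_S

print(sizing(32, 3, 2))   # Velo:    ~ (64, 70.4, 152, 83.6, 0.67)
print(sizing(24, 4, 2))   # TT:      ~ (64, 70.4, 152, 83.6, 0.67)
print(sizing(92, 1, 2))   # Level-0: ~ (116, 127.6, 148, 81.4, 0.65)
```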
Architecture-1 L1 Baseline

Context                                         Velo      TT      Level-0   Totals
  Number of FE Boards                             88       48         1
  Number of FE Links (Level-1)                    76       48         1       125
  Number of Hits (L1)                           1188      461         1
  Number of Hits/FE Board (L1)                  13.5      9.6       1.0

Level-1 Trigger
  Bytes/Hit                                        2        2        88
  L1 Fragment Size/FE Board [B] (Physics)         32       24        92
  L1 Event Size [kB] (Physics)                  2.43     1.15      0.09      3.68
  L1 Data Rate/FE Board [MB/s] (Physics)       35.20    26.40    101.20
  L1 Data Rate/SD [MB/s] (Physics)            2675.2   1267.2     101.2      4044
  L1 Fragment Size/FE Board [B] (Total)           56       48       116
  L1 Fragment Size/FE Board [B] (Physical)        64       64       116
  L1 Event Size [kB] (Total)                    4.86     3.07      0.12      8.05
  L1 Data Rate/FE Board [MB/s] (Total)          70.4     70.4     127.6
  L1 Data Rate/SD [MB/s] (Total)              5350.4   3379.2     127.6      8857

Readout Unit Layer (Level-1), multiplexer
  Data Rates/FE Board [MB/s] (Physics)          35.2     26.4     101.2
  Data Rates/FE Board [MB/s] (Total)            70.4     70.4     127.6
  EB Multiplexing Factor                        1.50     2.00      0.50
  RU Input Links                                   3        4         1
  RU Output Links                                  2        2         2
  #NPs/SD                                         52       24         1
  Fragment size after muxing [B]                 152      152       148
  Data Rate after Muxing [MB/s]                167.2    167.2     162.8
  Total Data Rate/link after Muxing [MB/s]      83.6     83.6      81.4
  Link Load                                      67%      67%       65%
  Number of Input Links (Level-1)                 51       24         2        77

Side annotations on the slide: "L1 aggregate data traffic", "Network inputs", "70% of the link load".
Event-building
• Merging all the fragments belonging to one event
• All Readout Units (RUs) send the frames of a given event to the same destination, based on the event number
  – Static load balancing
• The destination is an NP module, which
  – waits for all frames belonging to one event
  – concatenates them in the right order
  – strips off all unnecessary headers
  – sends the completely assembled events to the SFCs
  – handles a small amount of reverse-direction traffic
(A sketch of this destination assignment and merging follows below.)
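A minimal sketch of the scheme just described (my own illustration, not LHCb code): every RU derives the destination from the event number alone, so no coordination is needed, and the destination collects, orders and concatenates the fragments. The modulo rule and data layout are assumptions.

```python
# Sketch: static destination assignment and fragment concatenation.
N_DESTINATIONS = 77            # event-builder ports (cf. the baseline table)

def destination(event_number: int) -> int:
    """Every RU applies the same rule, so all fragments of an event converge."""
    return event_number % N_DESTINATIONS

class EventBuilder:
    def __init__(self, n_sources: int):
        self.n_sources = n_sources
        self.pending = {}                      # event_number -> {source_id: payload}

    def add_fragment(self, event_number: int, source_id: int, payload: bytes):
        frags = self.pending.setdefault(event_number, {})
        frags[source_id] = payload             # headers assumed already stripped
        if len(frags) == self.n_sources:       # all fragments have arrived
            event = b"".join(frags[s] for s in sorted(frags))   # fixed source order
            del self.pending[event_number]
            return event                       # ready to be sent on to an SFC
        return None
```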
The Event-builder
[Diagram: the Architecture-1 dataflow once more, with the NP event-builder modules between the readout network and the SFCs highlighted (50-100 links at 5.5-10 GB/s into 50-100 SFCs, farm of ~1200 CPUs).]
Event-Building and CPU Farm

Event Building (Baseline)
  RN Output Links                        73
  RN Output Link Rate [MB/s]             109.1
  Fragment Rate (L1) per link [kHz]      577.6
  Fragment Rate (HLT) per link [kHz]     8.8
  Event Rate (L1) per Link [kHz]         15.1
  Event Rate (HLT) per Link [kHz]        0.548
  RN Output Link Rate (L1) [MB/s]        55.8
  RN Output Link Rate (HLT) [MB/s]       20.4
  Total EB Output Rate [MB/s]            76.2
  EB NPs                                 37

Trigger Farms (Sub-Farms)
  Multiplexing Factor (Raw)              1.44
  Multiplexing Factor                    1.50
  MUX Input Ports                        3
  MUX Output Ports                       2
  Resultant Output Rate [MB/s]           114.3
  Muxing Switches                        25
  Subfarms                               49
  Subfarm Switches                       49
  Event Rate/Subfarm (L1) [kHz]          22.6
  Event Rate/Subfarm (HLT) [kHz]         0.8
  Processors/subfarm                     25
  Processors                             1225
  Event Rate per Processor (L1) [kHz]    0.90
  Event Rate per Processor (HLT) [kHz]   0.03

(A few of these numbers are re-derived in the sketch below.)
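The farm numbers hang together arithmetically; the sketch below re-derives a few of them from the 1.1 MHz L1 rate and the 40 kHz HLT rate (a cross-check of mine, not the original spreadsheet).

```python
# Cross-check of a few entries in the event-building / farm table above.
L1_RATE_KHZ, HLT_RATE_KHZ = 1100.0, 40.0
rn_output_links  = 73
subfarms         = 49
cpus_per_subfarm = 25

print(L1_RATE_KHZ / rn_output_links)               # ~15.1 kHz L1 event rate per RN output link
print(L1_RATE_KHZ / subfarms)                      # ~22.4 kHz L1 event rate per sub-farm (table: 22.6)
print(HLT_RATE_KHZ / subfarms)                     # ~0.8 kHz  HLT event rate per sub-farm
print(subfarms * cpus_per_subfarm)                 # 1225 processors
print(L1_RATE_KHZ / subfarms / cpus_per_subfarm)   # ~0.9 kHz  L1 events per processor
```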
Architecture-2
[Diagram: the Architecture-2 dataflow, with no Network Processors. Packed Level-1 traffic leaves the front-end electronics (FE boards plus TRM) on 126-240 links at 44 kHz (5.5-11.0 GB/s); HLT traffic uses 349 links at 40 kHz (2.3 GB/s). A multiplexing layer of 62-83 switches concentrates this onto 64-157 links at 88 kHz into a readout network of 31 switches; 90-153 links at 5.5-10 GB/s feed 90-153 SFCs, each with a sub-farm switch and its CPUs (~1400 CPUs in total). The L1-Decision sorter uses 33 links at 1.7 GB/s towards the TFC system; built events go to the storage system. All links are Gb Ethernet.]
Architecture-2 Technicalities
• More switch ports
• Heavier load on the SFC
  – ~80 kHz fragment rate, i.e. 960 Mb/s in both input and output
  – More sub-farms
  – Alleviation using interrupt coalescence (one interrupt per N frames, buffering in the input NIC)
    • A feature of Gigabit Ethernet
  – Unpacking events and distributing them to the farm CPUs
    • using advanced DMA engines
    • longer transfer / event-building latency (only relevant for Level-1)
• Number of sub-farms and number of CPUs per sub-farm
  – Fewer CPUs per sub-farm than events in a packed super-event
  – Concern with respect to unacceptable latency through statistical fluctuations
(A sketch of the super-event packing and SFC-side unpacking follows below.)
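A minimal sketch of the event-packing idea (illustrative only; the length-prefixed framing is an assumption, not the LHCb format): the FE packs 25 L1 events into one super-event frame, cutting the frame rate from 1.1 MHz to 44 kHz, and the SFC unpacks it before handing individual events to farm CPUs.

```python
import struct

# Sketch: packing 25 Level-1 events into one super-event and unpacking at the SFC.
EVENTS_PER_SUPER_EVENT = 25          # from the Architecture-2 parameter table

def pack_super_event(events: list[bytes]) -> bytes:
    """FE side: concatenate events, each prefixed by a 2-byte length."""
    return b"".join(struct.pack("<H", len(e)) + e for e in events)

def unpack_super_event(frame: bytes) -> list[bytes]:
    """SFC side: split the super-event back into individual events."""
    events, offset = [], 0
    while offset < len(frame):
        (length,) = struct.unpack_from("<H", frame, offset)
        offset += 2
        events.append(frame[offset:offset + length])
        offset += length
    return events

# Example: 25 events of ~192 B each (4.8 kB / 25) round-trip through one frame.
events = [bytes(192) for _ in range(EVENTS_PER_SUPER_EVENT)]
assert unpack_super_event(pack_super_event(events)) == events
# Frame-rate effect: 1.1 MHz / 25 = 44 kHz on the Level-1 links, matching the diagram.
```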
Scale of the System: Architecture-2

                                                 Velo+TT+L0DU+CaloTrigger   ..+IT+OT
  Number of CPUs (L1+HLT+Reconstruction)         1400                       1400
  Aggregated rate through network
  (including all overheads)                      7.2 GB/s                   10.0 GB/s
  Links from detector (Level-1)                  126                        240
  Links from detector (HLT)                      349                        349
  Input ports into network                       97                         190
  Output ports from network                      91                         154
  Frame rate at SFC                              80 kHz                     80 kHz
  Maximum Level-1 latency                        50 ms                      50 ms
  Number of events in 1 super-event (L1/HLT)     25/10                      25/10
  Average event size @ 1.1 MHz (Level-1)         4.8 kB                     9.5 kB
  Average event size @ 40 kHz (full read-out)    38 kB                      38 kB
  Mean CPU time for Level-1 algorithm            800 µs                     800 µs

(The 80 kHz frame rate at the SFC corresponds to ~960 Mb/s.)
Responsibilities
• SFC
  – The SFC is a high-performance PC (2 Gigabit/s sustained I/O)
• Sub-farm
  – Farm nodes are disk-less, booted from the network, running Linux (real-time Linux), as rack-mounted PCs (1U dual-CPU motherboards) or blade servers
  – Farm protocol over Gigabit Ethernet
  – Timeout mechanisms
  – Fault tolerance through error trapping of the software trigger
• System simulation
Farm Issues
• Scalable up to several thousand CPUs
• Organized in sub-farms, which perform local dynamic load balancing (see the sketch below)
• Transport protocol based on raw IP
• Allows concurrent, seamless use by the L1 and HLT algorithms, while prioritising L1 traffic wherever possible
• Interface to the throttle via the Experiment Control System (ECS) over a separate network
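A minimal sketch of local dynamic load balancing inside one sub-farm (my illustration under the assumption that the SFC tracks outstanding events per node, always dispatches to the least-loaded CPU, and serves L1 events before HLT events):

```python
import heapq

# Sketch: an SFC dispatching events to the least-loaded CPU of its sub-farm,
# prioritising Level-1 over HLT. Data structures are illustrative.
class SubFarmDispatcher:
    def __init__(self, n_cpus: int):
        # heap of (outstanding_events, cpu_id): least-loaded CPU on top
        self.load = [(0, cpu) for cpu in range(n_cpus)]
        heapq.heapify(self.load)
        self.l1_queue, self.hlt_queue = [], []

    def submit(self, event, is_l1: bool):
        (self.l1_queue if is_l1 else self.hlt_queue).append(event)

    def dispatch_one(self):
        """Send the next event (L1 first) to the least-loaded CPU."""
        queue = self.l1_queue or self.hlt_queue
        if not queue:
            return None
        outstanding, cpu = heapq.heappop(self.load)
        heapq.heappush(self.load, (outstanding + 1, cpu))
        return cpu, queue.pop(0)

    def completed(self, cpu: int):
        """Called when a CPU reports a finished event (decrement its load)."""
        self.load = [(n - 1 if c == cpu else n, c) for n, c in self.load]
        heapq.heapify(self.load)
```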
Simulations
Ptolemy: concurrent modelling and design in Java.
[Model sketch: a clocked trigger at a given rate (7-12 kHz) generates events whose lengths are drawn from a (heuristic) distribution; events are packed, then unpacked, load-balanced and distributed to the farm CPUs; the processing time is obtained from the event length using a parametric formula. A simplified stand-in for this flow is sketched below.]
[Histogram: processing time per event, log scale, 0-120 ms; 1,000,000 entries, mean 8.029e+05, RMS 2.35e+06.]
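A much-simplified stand-in for such a model (in Python rather than Ptolemy/Java; the distribution and the parametric formula are placeholders, not those of the real study): events arrive from a clocked trigger, receive a random length, and their processing time is computed from that length.

```python
import random

# Toy version of the simulation flow described above. All distributions and
# constants are illustrative placeholders, not the parameters of the real model.
TRIGGER_RATE_HZ = 10e3    # clocked trigger, within the quoted 7-12 kHz
MEAN_EVENT_KB   = 4.8     # cf. the Architecture-2 average L1 event size

def event_length_kb() -> float:
    """Heuristic event-length distribution (placeholder: exponential)."""
    return random.expovariate(1.0 / MEAN_EVENT_KB)

def processing_time_ms(length_kb: float) -> float:
    """Parametric processing-time formula (placeholder: affine in the length)."""
    return 0.1 + 0.15 * length_kb

def simulate(n_events: int = 100_000) -> float:
    """Return the mean processing time over n_events triggers."""
    times = [processing_time_ms(event_length_kb()) for _ in range(n_events)]
    return sum(times) / len(times)

if __name__ == "__main__":
    print(f"mean processing time: {simulate():.3f} ms")
```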
Requests
• An SFC on a high-end server:
  – 2.4-3.0 GHz PIV Xeon
  – Intel 875P chipset
  – 550-800 MHz FSB (Front Side Bus)
  – DNB (Dedicated Network Bus), 266 MB/s (2 Gbit/s), to eliminate the PCI bottleneck
  – 3 Gigabit Ethernet interfaces
  – 2 GB RAM
• 2 Gigabit Ethernet 8-port switches
• 5 rack-mountable 1U dual-processor motherboards (4 farm nodes, 1 transmitter node)
  – 2 Gigabit Ethernet interfaces each
• 1 standard rack
SubFarm Controller
[Block diagram: the SFC built around the Intel 875P chipset, with its interface to Gigabit Ethernet.]
Costs (prices exclude VAT)

Blade solutions:
• Dell: SFC PIII 1.26 GHz (133 MHz, 512K), PC PIII 1.26 GHz (133 MHz, 512K); 2 GB RAM each; NICs 2x1000BaseT each; integrated switch (2x4 Gb uplink); 17000 EUR
• IBM: SFC 2x Xeon 2 GHz, PC Xeon 2 GHz; 2.5 GB RAM each; NICs 2x1000BaseT each; integrated switch (2x4 Gb uplink); 24000 EUR
• HP Compaq: SFC PIII 1.4 GHz (133 MHz, 512K), PC PIII 1.4 GHz (133 MHz, 512K); 2 GB RAM each; NICs 2x1000 + 1x100 each; integrated switch (2x4 Gb + 2x4 FE uplink); 29120 EUR

Rack solutions:
• Dell: SFC 2x Xeon 2 GHz, PC PIII 1.4 GHz (133 MHz, 512K); 2 GB RAM each; SFC NIC 3x1000, PC NIC 2x1000; 2x 8-port Gigabit layer-2 switches; 17938 EUR
• IBM: SFC 2x Xeon 2 GHz, PC PIV 2 GHz (133 MHz, 512K); 2 GB RAM each; SFC NIC 3x1000, PC NIC 2x1000; 2x 8-port Gigabit layer-2 switches; 17790 EUR
• HP Compaq: SFC 2x Xeon 2.4 GHz (533 MHz bus), PC PIV 2.66 GHz (133 MHz, 512K); 2 GB RAM each; SFC NIC 3x1000BaseT, PC NIC 2x1000BaseT; 2x 8-port Gigabit layer-2 switches; 20000 EUR
People Involved
• Bologna Section (Sezione di Bologna): G. Avoni, A. Carbone, D. Galli, U. Marconi, G. Peco, M. Piccinini, V. Vagnoni
• Milano Section (Sezione di Milano): T. Bellunato, L. Carbone, P. Dini
• Ferrara Section (Sezione di Ferrara): A. Gianoli