L1&HLT Trigger Network Research and Development
Umberto Marconi, INFN Bologna

Overview
- Introduction
- The network trigger architectures:
  - Architecture-1, based on Network Processors
  - Architecture-2, based on event packing at the L1 front-end boards and IP
- R&D activity plans
- Requests for funding

History
- The LHCb DAQ system was designed for a 40 kHz trigger rate and a 100 kB event size
- Physically and logically separated L1 trigger system running at 1.1 MHz (the L0 trigger accept rate)
- Gigabit Ethernet as the link technology throughout the system
- Frame handling and event building rely on Network Processors
- CPU farm for trigger processing
- Re-optimisation of the detector…
- LHCb Online System TDR, CERN/LHCC 2001-040, December 2001

Redesign of LHCb
- New tracking system
- More elaborate trigger system

LHCb Trigger
[Figure: overview of the LHCb trigger levels.]

TDR DAQ
[Figure: TDR DAQ architecture. Detector (VELO, TRACK, ECAL, HCAL, MUON, RICH) read out at 40 MHz (40 TB/s); Level-0 trigger (fixed latency 4.0 µs) accepts 1 MHz; Level-1 trigger (variable latency < 2 ms, L1 buffer with 2 ms maximum latency) accepts 40 kHz; Timing & Fast Control (TFC) with throttle; Front-End Electronics (1 TB/s) and Front-End Multiplexers (FEM); Read-out Units (RU) feeding a 4 GB/s Read-out Network (RN); Sub-Farm Controllers (SFC) and CPU farm running the Level-2/3 event filter (variable latency, L2 ~10 ms, L3 ~200 ms); 40 MB/s to storage; Control & Monitoring (ECS).]

New Requirements for L1&HLT
- L1 aggregate data traffic at 1.1 MHz:
  - event size 8.8 kB (VELO+TT+L0DU): 80 Gb/s
  - event size 16 kB (+IT+OT): 136 Gb/s
- HLT aggregate data traffic at 40 kHz:
  - event size 30 kB: 9.6 Gb/s
- L1 maximum processing latency ~50 ms
- The amount of data to be moved for L1 dominates over that of the HLT

Architectures
[Figure: Architecture-1 and Architecture-2 side by side, Gb Ethernet throughout.
Architecture-1 (Network Processors): Level-1 traffic on 125-239 links at 1.1 MHz (8.8-16.9 GB/s) and HLT traffic on 349 links at 40 kHz (2.3 GB/s) from the front-end electronics; multiplexing layer of 77-135 NPs for Level-1 and 30 switches for the HLT (73-140 links, 7.9-15.1 GB/s); readout network with 77-135 Level-1 links (6.4-13.6 GB/s); event builder of 37-70 NPs; L1-decision path through 24 NPs (24 links, 1.5 GB/s) to the Sorter and the TFC system; 50-100 SFCs (50-100 links, 5.5-10 GB/s, mixed traffic); ~1200 farm CPUs; storage system. The Network Processor is the IBM NP4GS3, with 4 full-duplex Gigabit Ethernet ports.
Architecture-2 (no Network Processors, IP event building at the SFC): Level-1 traffic on 126-240 links at 44 kHz (5.5-11.0 GB/s, packed events) and HLT traffic on 349 links at 40 kHz (2.3 GB/s); multiplexing layer of 62-83 switches, 64-157 links at 88 kHz into a readout network of 31 switches; L1-decision path (33 links, 1.7 GB/s) to the Sorter and the TFC system; 90-153 SFCs (90-153 links, 5.5-10 GB/s, mixed traffic); ~1400 farm CPUs; storage system. Gigabit Ethernet UTP version: 1000Base-T over Cat5e unshielded twisted pairs.]

Ethernet Constraint
[Figure.]

N:M Multiplexing
[Figure: N:M multiplexing with a Network Processor.]
- N input links, M output links
- Reduces the output frame rate by a factor 1/M
- Increases the maximum aggregated payload (at 70% link load):
  - M = 2: 149 B
  - M = 3: 236 B
  - M = 4: 324 B
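The payload figures above can be reproduced with a small back-of-the-envelope model. This is not from the talk: the ~1 MHz aggregated-frame rate, the 70% load target applied per output link, and the ~26 B per-frame Ethernet overhead are my assumptions, chosen because they approximately reproduce the quoted 149/236/324 B.

```python
# Hedged back-of-the-envelope model for the N:M multiplexing payload budget.
# Assumptions (mine, not stated on the slide): aggregated frames leave the group of M
# output links at ~1 MHz and are spread evenly over them, each output link is kept at a
# 70% load, and every Ethernet frame costs ~26 B of framing (8 B preamble + 14 B MAC
# header + 4 B FCS, inter-frame gap neglected).

LINK_BYTES_PER_S = 125_000_000   # Gigabit Ethernet in bytes per second
LOAD = 0.70                      # target link load
EVENT_RATE_HZ = 1.0e6            # assumed aggregated-frame (event) rate
FRAME_OVERHEAD_B = 8 + 14 + 4    # preamble + MAC header + FCS


def max_aggregated_payload(m_outputs: int) -> float:
    """Largest payload one frame may carry while each of the M output links stays at LOAD."""
    frames_per_link_hz = EVENT_RATE_HZ / m_outputs            # output rate reduced by 1/M
    bytes_per_frame = LINK_BYTES_PER_S * LOAD / frames_per_link_hz
    return bytes_per_frame - FRAME_OVERHEAD_B


for m in (2, 3, 4):
    print(f"M={m}: {max_aggregated_payload(m):.0f} B")
# Prints roughly 149, 236 and 324 B, in line with the values quoted above.
```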
Protocols
- Pure push-through protocol: no horizontal synchronization; vertical synchronization through the arrival of data
- Buffer-overflow protection via the throttle:
  - central buffer monitoring in the TFC (Readout Supervisor): zero suppression in the FE is guaranteed within 20 µs, so the Level-1 buffer can be centrally monitored
  - the NPs drive the throttle signals (L0 throttle or L1 throttle)
  - the SFCs throttle through the ECS (latency ~ms)
- Level-1 latency control through hierarchically graded timeouts:
  - timeout on the CPUs
  - timeout on the SFC
  - timeout on the Decision Sorter
  - timeout on the Readout Supervisor

Real Time Operation
- The Linux 2.5 kernel has real-time capabilities (preemptive kernel)
- Real-time priorities: the L1 task is never interrupted until it finishes (a minimal scheduling sketch is given after the Scale of the System slide below)
- The context-switch latency on today's CPUs is low: 10.1 ± 0.2 µs
- The scheme of running both tasks concurrently is sound

Bit Error Rate (BER)
- LHCb is based on 1000Base-T for cost reasons
- Gigabit Ethernet is specified to work over UTP Cat5e cables (1000Base-T)
- The BER is specified to be < 10^-11, i.e. about one bad packet per 100 s; real equipment is expected to be much better
- The BER depends not only on the cable, but in particular also on the end-points

Architecture-2 vs. Architecture-1
- What is the gain from getting rid of the NPs?
- Concern* about using an NP, since the product line is going to be stopped by IBM:
  - all the NPs that would ever be needed (including spares, upgrades, etc.) would have to be bought very soon (a large investment)
- No need to design and build the NP-based modules
- Fully commercial (commodity) system (switches, CPUs):
  - large switches/routers will not be a problem in 2005
  - the system can grow by adding switch ports and SFCs
  - easier scaling behaviour
- Features have to be moved into the L1 front-end electronics:
  - reduce protocol overheads and the fragment rate by event packing
  - protocol adaptations (the FE speaks IP)
  - destination assignment: by the FE, or by the Readout Supervisor together with the FE
(*) L1&HLT review, CERN, 29/4/03

Architecture-1
[Figure: Architecture-1 data flow. Level-1 traffic: 125-239 links at 1.1 MHz (8.8-16.9 GB/s) from the front-end electronics into the multiplexing layer (77-135 NPs, frame handling), 77-135 links (6.4-13.6 GB/s) into the readout network, event building in 37-70 NPs, L1-decision path through 24 NPs (24 links, 1.5 GB/s) to the Sorter and the TFC system. HLT traffic: 349 links at 40 kHz (2.3 GB/s) through 30 switches, 73-140 links (7.9-15.1 GB/s). 50-100 SFCs (50-100 links, 5.5-10 GB/s, mixed traffic) feed ~1200 farm CPUs; storage system; Gb Ethernet throughout.]

Scale of the System
- From Monte Carlo: the number of hits per subdetector
- From the front-end electronics structure: the number of hits per electronics board
- From the hit encoding: the fragment size per electronics board
- Link speeds and link loads determine a suitable multiplexing factor for the network processors (taking into account transport headers and overheads, which depend on the switching-network technology)
- Event building and stripping off headers, together with the link speeds (loads), give the number of sub-farms
- The desired total number of CPUs gives the number of CPUs per sub-farm
- Scenarios (which subdetectors, which switching technology, …)
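The real-time priority point on the Real Time Operation slide can be illustrated with a short, hedged sketch. It is not from the talk: it assumes a Linux host and uses the POSIX SCHED_FIFO class via Python's os module, which is one way of giving the L1 task a priority that ordinary (SCHED_OTHER) work such as the HLT task cannot preempt.

```python
# Hedged sketch of the "real-time priorities" point above: give the L1 trigger task a
# POSIX SCHED_FIFO priority so that ordinary SCHED_OTHER work (e.g. the HLT task on the
# same node) cannot preempt it. Linux-only; requires root or CAP_SYS_NICE. This is an
# illustration of the mechanism, not code from the talk.
import os


def make_l1_task_realtime() -> None:
    """Move the calling process into the SCHED_FIFO class at the highest priority."""
    top_priority = os.sched_get_priority_max(os.SCHED_FIFO)
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(top_priority))
        print(f"L1 task now runs SCHED_FIFO at priority {top_priority}")
    except PermissionError:
        print("need root (or CAP_SYS_NICE) to select a real-time scheduling class")


if __name__ == "__main__":
    make_l1_task_realtime()
    # ... run the L1 algorithm here; the HLT task keeps the default SCHED_OTHER policy
    # and only gets the CPU when no runnable real-time task is present.
```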
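The sizing chain on the Scale of the System slide can be walked through numerically. The sketch below uses the Velo column of the Architecture-1 L1 Baseline table that follows; the padding of the 56 B fragment to the 64 B Ethernet minimum frame and the link-load formula are my own reading of how the table entries are obtained, not something stated in the talk.

```python
# Hedged walk-through of the sizing chain, using the Velo column of the
# "Architecture-1 L1 Baseline" table below. The input numbers come from that table;
# the padding of the 56 B fragment to the 64 B Ethernet minimum frame and the
# link-load formula are my reading of how the table entries are derived.

L1_RATE_HZ = 1.1e6           # Level-1 input (L0-accept) rate
LINK_BYTES_PER_S = 125e6     # Gigabit Ethernet capacity in bytes per second

# Velo column of the table
fe_links = 76                # Level-1 FE links
physical_fragment_b = 64     # fragment per FE board on the wire
                             # (56 B total, assumed padded to the 64 B minimum frame)
muxed_fragment_b = 152       # fragment size after 3:2 multiplexing in the NP layer
ru_output_links = 2          # output links per readout unit

rate_per_board = physical_fragment_b * L1_RATE_HZ        # bytes/s per FE board
rate_per_subdetector = rate_per_board * fe_links         # bytes/s for the whole Velo
link_load = muxed_fragment_b * L1_RATE_HZ / ru_output_links / LINK_BYTES_PER_S

print(f"per-board rate        {rate_per_board / 1e6:.1f} MB/s")        # 70.4 MB/s
print(f"per-subdetector rate  {rate_per_subdetector / 1e6:.1f} MB/s")  # 5350.4 MB/s
print(f"RU output link load   {link_load:.0%}")                        # ~67%
```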
Architecture-1 L1 Baseline
Level-1 aggregate data traffic and network inputs, assuming 70% of the link load:

| Context                                  | Velo   | TT     | Level-0 | Totals |
|------------------------------------------|--------|--------|---------|--------|
| Number of FE Boards                      | 88     | 48     | 1       |        |
| Number of FE Links (Level-1)             | 76     | 48     | 1       | 125    |
| Number of Hits (L1)                      | 1188   | 461    | 1       |        |
| Number of Hits/FE Board (L1)             | 13.5   | 9.6    | 1.0     |        |

| Level-1 Trigger                          | Velo   | TT     | Level-0 | Totals |
|------------------------------------------|--------|--------|---------|--------|
| Bytes/Hit                                | 2      | 2      | 88      |        |
| L1 Fragment Size/FE Board [B] (Physics)  | 32     | 24     | 92      |        |
| L1 Event Size [kB] (Physics)             | 2.43   | 1.15   | 0.09    | 3.68   |
| L1 Data Rate/FE Board [MB/s] (Physics)   | 35.20  | 26.40  | 101.20  |        |
| L1 Data Rate/SD [MB/s] (Physics)         | 2675.2 | 1267.2 | 101.2   | 4044   |
| L1 Fragment Size/FE Board [B] (Total)    | 56     | 48     | 116     |        |
| L1 Fragment Size/FE Board [B] (Physical) | 64     | 64     | 116     |        |
| L1 Event Size [kB] (Total)               | 4.86   | 3.07   | 0.12    | 8.05   |
| L1 Data Rate/FE Board [MB/s] (Total)     | 70.4   | 70.4   | 127.6   |        |
| L1 Data Rate/SD [MB/s] (Total)           | 5350.4 | 3379.2 | 127.6   | 8857   |

| Readout Unit Layer (Level-1)             | Velo   | TT     | Level-0 | Totals |
|------------------------------------------|--------|--------|---------|--------|
| Data Rates/FE Board [MB/s] (Physics)     | 35.2   | 26.4   | 101.2   |        |
| Data Rates/FE Board [MB/s] (Total)       | 70.4   | 70.4   | 127.6   |        |
| EB Multiplexing Factor                   | 1.50   | 2.00   | 0.50    |        |
| RU Input Links                           | 3      | 4      | 1       |        |
| RU Output Links                          | 2      | 2      | 2       |        |
| #NPs/SD                                  | 52     | 24     | 1       |        |
| Fragment Size after Muxing [B]           | 152    | 152    | 148     |        |
| Data Rate after Muxing [MB/s]            | 167.2  | 167.2  | 162.8   |        |
| Total Data Rate/Link after Muxing [MB/s] | 83.6   | 83.6   | 81.4    |        |
| Link Load                                | 67%    | 67%    | 65%     |        |
| Number of Input Links (Level-1)          | 51     | 24     | 2       | 77     |

Event-building
- Merging all the fragments belonging to one event
- All Readout Units (RUs) send their frames to the same destination, chosen from the event number (static load balancing)
- The destination is an NP module, which:
  - waits for all frames belonging to one event
  - concatenates them in the right order
  - strips off all unnecessary headers
  - sends the completely assembled events to the SFCs
  - handles a small amount of reverse-direction traffic
- A minimal event-builder sketch is given after the Architecture-2 figure below

The Event-builder
[Figure: the Architecture-1 data flow as above, highlighting the event-builder NPs (37-70 NPs) between the readout network and the 50-100 SFCs.]

Event-Building and CPU Farm

Event Building (baseline):

| RN Output Links                    | 73    |
| RN Output Link Rate [MB/s]         | 109.1 |
| Fragment Rate (L1) per Link [kHz]  | 577.6 |
| Fragment Rate (HLT) per Link [kHz] | 8.8   |
| Event Rate (L1) per Link [kHz]     | 15.1  |
| Event Rate (HLT) per Link [kHz]    | 0.548 |
| RN Output Link Rate (L1) [MB/s]    | 55.8  |
| RN Output Link Rate (HLT) [MB/s]   | 20.4  |
| Total EB Output Rate [MB/s]        | 76.2  |
| EB NPs                             | 37    |

Trigger Farms (sub-farms):

| Multiplexing Factor (Raw)            | 1.44  |
| Multiplexing Factor                  | 1.50  |
| MUX Input Ports                      | 3     |
| MUX Output Ports                     | 2     |
| Resultant Output Rate [MB/s]         | 114.3 |
| Muxing Switches                      | 25    |
| Subfarms                             | 49    |
| Subfarm Switches                     | 49    |
| Event Rate/Subfarm (L1) [kHz]        | 22.6  |
| Event Rate/Subfarm (HLT) [kHz]       | 0.8   |
| Processors/Subfarm                   | 25    |
| Processors                           | 1225  |
| Event Rate per Processor (L1) [kHz]  | 0.90  |
| Event Rate per Processor (HLT) [kHz] | 0.03  |

Architecture-2
[Figure: Architecture-2 data flow, without Network Processors. Level-1 traffic on 126-240 links at 44 kHz (5.5-11.0 GB/s, packed events) and HLT traffic on 349 links at 40 kHz (2.3 GB/s) from the front-end electronics; multiplexing layer of 62-83 switches, 64-157 links at 88 kHz into a readout network of 31 switches; L1-decision path (33 links, 1.7 GB/s) to the Sorter and the TFC system; 90-153 SFCs (90-153 links, 5.5-10 GB/s, mixed traffic) feed ~1400 farm CPUs; storage system; Gb Ethernet throughout.]
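As referenced on the Event-building slide, here is a minimal sketch of that step. It is illustrative only: the fragment header layout, its field sizes and the number of sources are assumptions of mine, not the NP firmware actually foreseen.

```python
# Hedged sketch of the event-building step above: fragments tagged with an event number
# are collected per event, concatenated in source order, the transport headers are
# stripped, and the assembled event is forwarded to an SFC. The header layout, its field
# sizes and the number of sources are illustrative assumptions, not the NP firmware.
import struct
from collections import defaultdict

HEADER = struct.Struct(">IHH")   # assumed header: event number, source id, payload length
N_SOURCES = 4                    # number of readout-network links feeding this builder

pending: dict = defaultdict(dict)   # event number -> {source id: payload}


def on_fragment(frame: bytes, send_to_sfc) -> None:
    """Handle one incoming fragment; forward the event once every source has reported."""
    event_no, source, length = HEADER.unpack_from(frame)
    pending[event_no][source] = frame[HEADER.size:HEADER.size + length]  # header stripped
    if len(pending[event_no]) == N_SOURCES:                              # event complete
        fragments = pending.pop(event_no)
        event = b"".join(fragments[s] for s in sorted(fragments))        # right order
        send_to_sfc(event_no, event)


# Static load balancing: every RU derives the destination from the event number alone
# (e.g. builder index = event_no % number_of_builders), so all fragments of one event
# arrive at the same event builder without any horizontal synchronization.
```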
Architecture-2 Technicalities
- More switch ports
- Heavier load on the SFC (~80 kHz fragment rate, i.e. 960 Mb/s in both input and output) and more sub-farms:
  - alleviated by interrupt coalescence (one interrupt per N frames, with buffering in the input NIC), a feature of Gigabit Ethernet NICs
  - unpacking of events and distribution to the farm CPUs, using advanced DMA engines (a minimal sketch is given after the SubFarm Controller figure below)
  - longer transfer/event-building latency (only relevant for Level-1)
- Number of sub-farms and number of CPUs per sub-farm:
  - fewer CPUs per sub-farm than events in a packed super-event
  - concern about unacceptable latency arising from statistical fluctuations

Scale of the System, Architecture-2

|                                                               | Velo+TT+L0DU+CaloTrigger | ..+IT+OT |
|---------------------------------------------------------------|--------------------------|----------|
| Number of CPUs (L1+HLT+Reconstruction)                        | 1400                     | 1400     |
| Aggregated rate through the network (including all overheads) | 7.2 GB/s                 | 10.0 GB/s |
| Links from the detector (Level-1)                             | 126                      | 240      |
| Links from the detector (HLT)                                 | 349                      | 349      |
| Input ports into the network                                  | 97                       | 190      |
| Output ports from the network                                 | 91                       | 154      |
| Frame rate at the SFC                                         | 80 kHz                   | 80 kHz   |
| Maximum Level-1 latency                                       | 50 ms                    | 50 ms    |
| Number of events in one super-event (L1/HLT)                  | 25/10                    | 25/10    |
| Average event size @ 1.1 MHz (Level-1)                        | 4.8 kB                   | 9.5 kB   |
| Average event size @ 40 kHz (full read-out)                   | 38 kB                    | 38 kB    |
| Mean CPU time for the Level-1 algorithm                       | 800 µs                   | 800 µs   |

Responsibilities
- SFC: the SFC is a high-performance PC (2 Gbit/s sustained I/O)
- Sub-farm: farm nodes are disk-less, booted from the network, running Linux (real-time Linux); rack-mounted PCs (1U dual-CPU motherboards) or blade servers
- Farm protocol over Gigabit Ethernet
- Timeout mechanisms
- Fault tolerance through error trapping of the software trigger
- System simulation

Farm Issues
- Scalable up to several thousand CPUs
- Organized in sub-farms, which perform local dynamic load balancing
- Transport protocol based on raw IP
- Concurrent, seamless use by the L1 and HLT algorithms, while prioritising L1 traffic wherever possible
- Interface to the throttle via the Experiment Control System (ECS), over a separate network

Simulations
- Ptolemy: concurrent modeling and design in Java
- Clocked trigger at a given rate (7-12 kHz), a Ptolemy technicality
- Event unpacking + load balancing + event distribution
- Event length drawn from a (heuristic) distribution + event packing
- Processing time derived from the event length with a parametric formula
[Figure: processing-time distribution per event (1,000,000 entries, mean 8.029e+05, RMS 2.35e+06), logarithmic scale, processing time from 0 to ~120 ms.]

Requests
- SFC on a high-end server:
  - 2.4-3.0 GHz PIV Xeon, Intel 875P chipset, 550-800 MHz FSB (front-side bus)
  - DNB (Dedicated Network Bus) at 266 MB/s (~2 Gbit/s), to eliminate the PCI bottleneck
  - 3 Gigabit Ethernet interfaces
  - 2 GB RAM
- 2 Gigabit Ethernet 8-port switches
- 5 rack-mountable 1U dual-processor motherboards (4 farm nodes, 1 transmitter node), with 2 Gigabit Ethernet interfaces each
- 1 standard rack

SubFarm Controller
[Figure: block diagram of the SubFarm Controller, based on the Intel 875P chipset, with its interface to Gigabit Ethernet.]
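To make the SFC load and super-event handling from the Architecture-2 Technicalities slide concrete, here is a hedged sketch. The 1500 B frame size, the length-prefixed super-event layout and the round-robin dispatch are my illustrative assumptions; only the 80 kHz frame rate and the 25-events-per-L1-super-event figure come from the slides.

```python
# Hedged sketch of the SFC-side work from the "Architecture-2 Technicalities" slide:
# a check of the quoted input load, and unpacking a super-event for distribution to the
# farm nodes. The 1500 B frame size, the length-prefixed super-event layout and the
# round-robin dispatch are illustrative assumptions; only the 80 kHz frame rate and the
# 25 events per L1 super-event come from the slides.
import struct
from itertools import cycle

# ~80 kHz frames of up to 1500 B give 80e3 * 1500 * 8 = 960 Mb/s per direction,
# which is the figure quoted for the SFC I/O load.
frame_rate_hz, frame_size_b = 80e3, 1500
print(f"SFC load ~{frame_rate_hz * frame_size_b * 8 / 1e6:.0f} Mb/s per direction")

LEN = struct.Struct(">I")   # assumed 4-byte length prefix in front of each packed event


def unpack_super_event(buf: bytes):
    """Yield the individual events packed into one super-event (assumed layout)."""
    pos = 0
    while pos < len(buf):
        (n,) = LEN.unpack_from(buf, pos)
        pos += LEN.size
        yield buf[pos:pos + n]
        pos += n


def dispatch(super_event: bytes, nodes) -> None:
    """Distribute the unpacked events round-robin over the sub-farm nodes."""
    for node, event in zip(cycle(nodes), unpack_super_event(super_event)):
        node.append(event)   # stand-in for sending the event to a farm CPU


# Example: a 25-event L1 super-event spread over a 10-node sub-farm
# (fewer CPUs per sub-farm than events in a super-event, as noted above).
nodes = [[] for _ in range(10)]
dispatch(b"".join(LEN.pack(5) + b"event" for _ in range(25)), nodes)
print([len(n) for n in nodes])   # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```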
Costs
Prices in Euro, VAT excluded:

| Type  | Vendor    | SFC                          | PC                           | SFC RAM | PC RAM | SFC NIC        | PC NIC         | Switch                              | Price (EUR) |
|-------|-----------|------------------------------|------------------------------|---------|--------|----------------|----------------|-------------------------------------|-------------|
| Blade | Dell      | PIII 1.26 GHz, 133 MHz, 512K | PIII 1.26 GHz, 133 MHz, 512K | 2 GB    | 2 GB   | 2x1000T        | 2x1000T        | integrated (2x4 Gb uplink)          | 17000       |
| Blade | IBM       | 2x Xeon 2 GHz                | Xeon 2 GHz                   | 2.5 GB  | 2.5 GB | 2x1000T        | 2x1000T        | integrated (2x4 Gb uplink)          | 24000       |
| Blade | HP Compaq | PIII 1.4 GHz, 133 MHz, 512K  | PIII 1.4 GHz, 133 MHz, 512K  | 2 GB    | 2 GB   | 2x1000 + 1x100 | 2x1000 + 1x100 | integrated (2x4 Gb + 2x4 FE uplink) | 29120       |
| Rack  | Dell      | 2x Xeon 2 GHz                | PIII 1.4 GHz, 133 MHz, 512K  | 2 GB    | 2 GB   | 3x1000         | 2x1000         | 2x 8-port Gigabit, layer 2          | 17938       |
| Rack  | IBM       | 2x Xeon 2 GHz                | PIV 2 GHz, 133 MHz, 512K     | 2 GB    | 2 GB   | 3x1000         | 2x1000         | 2x 8-port Gigabit, layer 2          | 17790       |
| Rack  | HP Compaq | 2x Xeon 2.4 GHz, 533 MHz bus | PIV 2.66 GHz, 133 MHz, 512K  | 2 GB    | 2 GB   | 3x1000Base-T   | 2x1000Base-T   | 2x 8-port Gigabit, layer 2          | 20000       |

People Involved
- Sezione di Bologna: G. Avoni, A. Carbone, D. Galli, U. Marconi, G. Peco, M. Piccinini, V. Vagnoni
- Sezione di Milano: T. Bellunato, L. Carbone, P. Dini
- Sezione di Ferrara: A. Gianoli