Process and Data Flow Control in KLOE
E. Pasqualucci (INFN - Roma)

Outline
• System overview
• Process structure and local communication
• SNMP and remote communication
• Process control
• Data Flow Control system
• DFC monitor

DAQ system architecture
• ~23 000 FEE channels @ 2.5 kHz φ + background (~10 kHz)
• Bandwidth: ~50 Mbyte/s (5 Kbyte/event)
• Storage: 200 Tbyte/year
• Tested with peak rates of 10 kHz in multibunch mode
• Tested at the maximum required throughput using non-zero-suppressed calorimeter data
[Figure: DAQ system architecture. Labels: VIC, CBUS, trigger chain, DFC system, Level-2 crates, FDDI, FDDI switch, Run Control, Monitor System, CPU server, Storage system.]

DAQ software organization
[Figure: DAQ software organization. Labels: Level 1 chain, Level 2, VME chain tools, simulation, GeoVme map, dmap, Collector, Sender, Receiver, Builder, Recorder, CmdSrv, Circ, SpyBuff, SpyD, RSpyD, Didone, FDDI switch, DFC system, RunCtl, Monitor system, SlowCtl system, Farm, Farm status, Spy dump, to Disk/Tape (Ybos). Legend: data, map data, messages, traps.]

Process structure
• Initialization
  – Msg Q creation
  – Shmem subscription
  – Shmem space allocation for variables
• Main loop
  – Process event
  – Process command
  – Idle time
• Interrupt handler
  – Extract command from Msg Q
Shared-memory process table (mapping):
  – Header: process number, pointer to 1st process
  – Proc. 1: process name, process id, message queue id, process status, last command, last command status, number of variables, variable 1, variable 2, …, pointer to 2nd process
  – Proc. 2: …, pointer to 3rd process
  – …
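The shared-memory process table is a header followed by a linked list of process entries, each carrying its variables. The Python sketch below models that layout with the standard `struct` module over a `bytearray` standing in for the shared-memory segment. The field widths, the use of byte offsets as "pointers", storing every variable as a double, and the helper names `build_table`, `find_process` and `read_variable` are all assumptions for illustration, not the KLOE layout.

```python
import struct

# Hypothetical layout (little-endian, no padding); widths are illustrative only.
HEADER = struct.Struct("<II")     # number of processes, offset of 1st entry (0 = none)
ENTRY = struct.Struct("<16s6iI")  # name, pid, msgq id, status, last command,
                                  # last command status, number of variables,
                                  # offset of next entry (0 = end of list)
VAR = struct.Struct("<d")         # each variable stored as a double (assumption)


def build_table(procs, size=4096):
    """Lay out (name, pid, msgq_id, values) tuples in a bytearray 'segment'."""
    shm = bytearray(size)
    offset = HEADER.size
    HEADER.pack_into(shm, 0, len(procs), offset if procs else 0)
    for i, (name, pid, msgq_id, values) in enumerate(procs):
        entry_size = ENTRY.size + VAR.size * len(values)
        nxt = offset + entry_size if i + 1 < len(procs) else 0
        # status, last command and last command status all start at 0
        ENTRY.pack_into(shm, offset, name.encode(), pid, msgq_id,
                        0, 0, 0, len(values), nxt)
        for j, value in enumerate(values):
            VAR.pack_into(shm, offset + ENTRY.size + j * VAR.size, value)
        offset += entry_size
    return shm


def find_process(shm, name):
    """Walk the entry list; return (offset, pid, msgq_id), or None if absent."""
    _, offset = HEADER.unpack_from(shm, 0)
    while offset:
        fields = ENTRY.unpack_from(shm, offset)
        if fields[0].rstrip(b"\0").decode() == name:
            return offset, fields[1], fields[2]
        offset = fields[7]            # follow the "pointer to next process"
    return None


def read_variable(shm, name, index):
    """Read one variable of a process directly from the table."""
    found = find_process(shm, name)
    if found is None:
        raise KeyError(name)
    offset = found[0]
    nvars = ENTRY.unpack_from(shm, offset)[6]
    if not 0 <= index < nvars:
        raise IndexError(index)
    return VAR.unpack_from(shm, offset + ENTRY.size + index * VAR.size)[0]
```

With this layout, reading a variable needs no message at all: a client locates the entry and reads the value in place, which is what makes local variable access cheap. Sending a command, by contrast, goes through the process's message queue.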
Local communication
• Getting a variable:
  – Locate the process
  – Locate the variable
• Sending a command:
  – The sender:
    • Locates the process
    • Gets its id and message Q
    • Puts the command in the Q
    • Sends an interrupt (signal)
    • Polls on the command status
  – The receiver:
    • Reads the Q
    • Writes the command and the "Executing" status, then executes it
    • Writes the command status (acknowledgement)
[Figure: entry of "My process" in the shared-memory table during a "Stop" command: last command = Stop, last command status = Executing, then Success; "My variable = value" is read directly from the table.]

Managing the DAQ network
• SNMP (Simple Network Management Protocol)
• Widely used to manage network devices
• Defined as a standard by the IETF (Internet Engineering Task Force)
• Implemented using a reliable UDP protocol
• Used to retrieve and/or set information about:
  – network configuration
  – traffic
  – faults
  – accounting
• Managed objects are defined in a Management Information Base (MIB) defined by the IETF
• Private extensions of the standard MIB are allowed
• Public-domain software allows the implementation of:
  – dedicated agents
  – utilities for remote access

SNMP client-server policy
• MIB
  – Variables organized as a tree
• Primitives:
  – get, get-next, set
• Each device runs a daemon able to:
  – Understand MIB requests
  – Obtain the required information
  – Execute the required actions
• Trap mechanism
• KLOE uses SNMP to:
  – Control DAQ devices and network
  – Implement message distribution
  – Implement process control
  – Implement Data Flow Control (DFC)

The command server and the KLOE MIB sub-tree
iso.org.dod.internet.mgmt.mib-2
  system(1): sysDescr(1), sysObjectID(2), sysUpTime(3), sysContact(4), sysName(5), sysLocation(6), sysServices(7)
  KLOE(13)
    kprocesses(1)
      kprocNumber(1)
      kprocTable(2)
        kprocEntry(1): kprocIndex(1), kprocName(2), kprocId(3), kprocMsgQId(4), kprocStatus(5), kprocLastCommand(6), kprocLastCommandStatus(7), kprocVarNumber(8)
      kprocVarTable(3)
        kprocVarEntry(1): kprocVarProcIndex(n,1), kProcVarIndex(n,2), kprocVarName(n,3), kprocVarSize(n,4), kprocVarType(n,5), kprocVarValue(n,6)

Message system implementation
[Figure: Run Control on node A talks via SNMP to the Command Server on node B, which reaches the DAQ process through its Msg Q and shared memory.]
• Run Control (node A) locates the process and sends the command via SNMP; the Command Server (node B) acknowledges it
• The Command Server puts the command in the Msg Q and sends an interrupt (INT)
• The DAQ process gets the command, writes the last command and the status "executing" in shared memory, executes the command and writes the command status (success, fault)
• On a second acknowledgement request, the Command Server gets the process variables from shared memory and returns the second ack

Remarks and performance
• Command server
  – DAQ process: receives commands and shares variables
  – Command distributor: run and process control tools
  – tcl/tk commands implemented: get variable, send message
  – Fortran interface for old-fashioned software
  – Portable: AIX, OSF1, HP-UX, Solaris, Linux, LynxOS supported
  – Optimized library: parallel message distribution implemented
• Performance
  – Local command: ~1.2 ms
  – Remote variable reading: ~1.2 ms
  – Remote command completion: ~4 ms

Production process control
[Figure: production process control. Control node: OffCtl. Production node: cmdsrv, locpc, pcd, Proc_1, Proc_2, Shmem (variables). Interactions: command, command + start, trap, signal, check.]
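The local command handshake (queue the command, interrupt the process, poll the last-command status until the receiver acknowledges) can be sketched in a single Python program with threads. This is a minimal model under stated assumptions: `queue.Queue` stands in for the SysV message queue, `threading.Event` for the UNIX signal, and a Python object guarded by a lock for the shared-memory entry. `ProcessEntry`, `HANDLERS`, `receiver_loop`, `send_command` and the sequence numbers are hypothetical names, not the KLOE library.

```python
import itertools
import queue
import threading
import time

_seq = itertools.count(1)  # hypothetical command sequence numbers


class ProcessEntry:
    """Stand-in for one process entry of the shared-memory table (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.msgq = queue.Queue()            # stands in for the SysV message queue
        self.interrupt = threading.Event()   # stands in for the UNIX signal
        self.lock = threading.Lock()         # guards the "shared memory" fields
        self.status = "Running"
        self.last_command = None             # (sequence number, command name)
        self.last_command_status = None


HANDLERS = {  # hypothetical command set
    "stop": lambda entry: setattr(entry, "status", "Stopped"),
    "start": lambda entry: setattr(entry, "status", "Running"),
}


def receiver_loop(entry, handlers, stop):
    """DAQ-process side: on interrupt, drain the queue, ack, execute, report."""
    while not stop.is_set():
        if not entry.interrupt.wait(timeout=0.05):
            continue                         # idle time
        entry.interrupt.clear()
        while True:
            try:
                seq, cmd = entry.msgq.get_nowait()
            except queue.Empty:
                break
            with entry.lock:                 # write last command + "Executing"
                entry.last_command = (seq, cmd)
                entry.last_command_status = "Executing"
            try:
                handlers[cmd](entry)         # unknown commands raise KeyError
                result = "Success"
            except Exception:
                result = "Fault"
            with entry.lock:                 # write the final status (the ack)
                entry.last_command_status = result


def send_command(table, name, cmd, timeout=1.0, poll=0.001):
    """Sender side: locate the process, queue the command, interrupt, poll."""
    entry = table[name]                      # locate the process
    seq = next(_seq)
    entry.msgq.put((seq, cmd))               # put the command in its queue
    entry.interrupt.set()                    # "send an interrupt"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:       # poll on the command status
        with entry.lock:
            if (entry.last_command == (seq, cmd)
                    and entry.last_command_status in ("Success", "Fault")):
                return entry.last_command_status
        time.sleep(poll)
    raise TimeoutError(f"{name}: no status for {cmd!r}")
```

Tagging each command with a sequence number lets the sender tell its own acknowledgement apart from an earlier command with the same name; the remote path in the slides adds an SNMP hop through the command server in front of the same local exchange.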
The DFC system
• Changes the packet distribution sequence
  – Avoids slow-down in data transmission and blocking timeouts
• Keeps latency under control
[Figure: DFC system. Labels: DFCd, DFC, DFC status, flow table, flow table data, TS, VIC bus, shmem, network and trigger statistics, performance statistics, statistics, commands, traps, latmon, Collector, Receiver, RunCtl.]

Receiver protocol
• Receives event sub-packets through the GigaSwitch
• Puts packets into multiple circular buffers
• Implements the DFC and LatMon farm interface
• Dynamic thresholds
[Figure: sub-event packets over TCP/IP on FDDI (0.5 MB/s per link); event builders EVB(1) … EVB(n).]
  – Select and copy sub-event packets
  – If the last sub-packet of event # has arrived: send LatMon trap (#) to LatMon
  – Get the maximum buffer occupancy
  – If "full": send trap "full" to the DFC system
  – If "empty" after "full": send trap "empty"

DFC protocol
• Initialization:
  – Sends auto-test traps
  – Builds the network map
  – Builds the DFC map (ordered list of RECV IP addresses)
  – Creates the first table with infinite trigger-number validity
• Main loop:
  – Wait for a trap
  – On trap (full/empty):
    • Reads the last trigger number from the Trigger Supervisor
    • Creates the next table
    • Modifies the validity of the previous table
• DFC data in VME shared memory:
  – DFC map: number of RECV nodes, IP addresses
  – Flow tables (up to a maximum number of tables), each with a trigger-number validity and receiver flags, e.g. 111111…1111, 111101…1111

DFC algorithm and performance
• Validity:
  – v = t_0 + (t_tr + (t_dfc + k·s_dfc))·(n + k·s_n) + t
  – k = 5
  – autotest
• DFCd reaction time (trap): 1.2 ms
• DFC reaction time:
  – t_local ~ 1.2 ms
  – trigger interaction ~ 6-7 ms
  – t_dfc ~ O(10^-2) ms
  – total: 10 ms
• DFC-L2 interaction rate: ~1 table / 50 ms (sustained)
• DFC "dead time" implemented

The DFC status monitor
[Figure: DFC status monitor display.]

Packet latency
• Latency measurements:
  – SNMP traps sent to LatMon:
    • Collector trap when packet # is released to the sender
    • Receiver trap when all the sub-packets of # have arrived
  – Test for the receivers' buffers

Summary
• A fast and reliable message system has been implemented using standard UNIX mechanisms and the SNMP protocol
• Very simple to use
  – process template + command definition
  – Fortran and tcl/tk interfaces
• Allows full process control
• A Data Flow Control system has been developed using the message system and SNMP traps
• It redirects network traffic taking into account the dynamics of the whole system
• Dynamic redefinition of thresholds
• It ran successfully during KLOE data acquisition
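The receiver's occupancy check behaves like a hysteresis switch: a "full" trap when the fullest circular buffer crosses the upper threshold, and an "empty" trap only after a "full" one, once it falls below the lower threshold. The Python sketch below models that logic under stated assumptions: thresholds are fractions of a per-buffer capacity in bytes, traps are appended to a list instead of being sent over SNMP, and `ReceiverMonitor`, `on_subpacket`, `on_drain` and `set_thresholds` are hypothetical names.

```python
class ReceiverMonitor:
    """Sketch of the receiver's full/empty trap logic (hypothetical API)."""

    def __init__(self, n_buffers, capacity, full_frac=0.8, empty_frac=0.2):
        self.capacity = capacity                 # bytes per circular buffer
        self.occupancy = [0] * n_buffers         # bytes currently held
        self.pending = {}                        # event # -> sub-packets missing
        self.is_full = False                     # True after a "full" trap
        self.traps = []                          # stands in for SNMP traps sent
        self.set_thresholds(full_frac, empty_frac)

    def set_thresholds(self, full_frac, empty_frac):
        """Thresholds can be redefined at run time ("dynamic thresholds")."""
        if not 0 <= empty_frac < full_frac <= 1:
            raise ValueError("need 0 <= empty_frac < full_frac <= 1")
        self.full_level = full_frac * self.capacity
        self.empty_level = empty_frac * self.capacity

    def on_subpacket(self, event, buffer_id, size, n_subpackets):
        """Copy one sub-event packet; trap LatMon when the event is complete."""
        self.occupancy[buffer_id] += size
        missing = self.pending.get(event, n_subpackets) - 1
        if missing == 0:
            self.pending.pop(event, None)
            self.traps.append(("LatMon", event))
        else:
            self.pending[event] = missing
        self._check_occupancy()

    def on_drain(self, buffer_id, size):
        """Data consumed downstream frees space in one buffer."""
        self.occupancy[buffer_id] = max(0, self.occupancy[buffer_id] - size)
        self._check_occupancy()

    def _check_occupancy(self):
        peak = max(self.occupancy)               # "get max occupancy"
        if not self.is_full and peak >= self.full_level:
            self.is_full = True
            self.traps.append(("DFC", "full"))
        elif self.is_full and peak <= self.empty_level:
            self.is_full = False                 # "empty" only after "full"
            self.traps.append(("DFC", "empty"))
```

Keeping the two thresholds apart stops a receiver hovering near a single limit from flooding the DFC daemon with alternating traps, each of which would trigger a new flow table.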
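The DFC main loop keeps a short history of flow tables: the newest is open-ended, and each full/empty trap closes it at a trigger number and opens a new one with updated receiver flags. The Python sketch below models that bookkeeping under several assumptions: `margin` is a trigger-number safety margin standing in for the validity v (the slides do not say how v is converted into trigger numbers), a receiver is flagged 0 after "full" and 1 after "empty", and the receiver for a trigger is picked round-robin among the enabled ones. `FlowTable`, `DFCTables`, `on_trap` and `destination` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class FlowTable:
    valid_until: float   # last trigger number served (math.inf = open-ended)
    flags: list          # one flag per receiver in the DFC map (1 = usable)


class DFCTables:
    """Sketch of DFC flow-table bookkeeping (hypothetical API)."""

    def __init__(self, receivers, max_tables=8):
        self.receivers = list(receivers)     # DFC map: ordered RECV addresses
        self.max_tables = max_tables
        # First table: all receivers enabled, infinite validity.
        self.tables = [FlowTable(math.inf, [1] * len(self.receivers))]

    def on_trap(self, last_trigger, margin, receiver, kind):
        """Handle a full/empty trap from one receiver."""
        current = self.tables[-1]
        flags = list(current.flags)
        flags[self.receivers.index(receiver)] = 0 if kind == "full" else 1
        # Close the previous table a safety margin after the last trigger.
        current.valid_until = last_trigger + margin
        self.tables.append(FlowTable(math.inf, flags))
        if len(self.tables) > self.max_tables:
            self.tables.pop(0)               # drop the oldest table

    def destination(self, trigger):
        """Return the receiver for a trigger number."""
        for table in self.tables:            # oldest first
            if trigger <= table.valid_until:
                usable = [r for r, f in zip(self.receivers, table.flags) if f]
                if not usable:
                    raise RuntimeError("no receiver available")
                return usable[trigger % len(usable)]  # hypothetical round-robin
        raise LookupError(trigger)           # unreachable: newest table never expires
```

Because every table carries an explicit trigger-number validity, the Level-2 side can keep routing events already in flight with the old table while new triggers pick up the new one, which is why the previous table's validity is modified rather than the table being replaced outright.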