BTeV-RTES Project
Very Lightweight Agents: VLAs
Daniel Mossé, Jae Oh, Madhura Tamhankar, John Gross
Computer Science Department, University of Pittsburgh
Mossé, Pitt. BTeV Workshop, Nashville, Nov 15, 2002

Shameless plug
LARTES: IEEE Workshop on Large-Scale Real-Time and Embedded Systems
In conjunction with the IEEE Real-Time Systems Symposium (RTSS 2002 is on Dec 3-5, 2002)
December 2, 2002, Austin, TX, USA
http://www.rtss.org/LARTES.html

BTeV Test Station
Collider detectors are about the size of a small apartment building. Fermilab's two detectors, CDF and DZero, are about four stories high, weighing some 5,000 tons (10 million pounds) each. Particle collisions occur in the middle of the detectors, which are crammed with electronic instrumentation. Each detector has about 800,000 individual pathways for recording electronic data generated by the particle collisions. Signals are carried over nearly a thousand miles of wire and cable.
(Information from the Fermi National Accelerator Laboratory)

L1/L2/L3 Trigger Overview
[Diagram: trigger architecture; information from the Fermi National Accelerator Laboratory]

System Characteristics: Software Perspective
• Reconfigurable node allocation
• L1 runs one physics application, severely time-constrained
• L2/L3 run several physics applications, with few time constraints
• Multiple operating systems and differing processors: TI DSP BIOS, Linux, Windows?
• Communication among system sections via a fast network
• Fault tolerance is essentially absent in embedded and RT systems

L1/L2/L3 Trigger Hierarchy
[Diagram: trigger hierarchy, levels connected by Gigabit Ethernet]
• Global Manager: TimeSys RT Linux, Global Manager VLA
• Regional L1 Manager (1): TimeSys RT Linux, Regional Manager VLA
• Regional L2/L3 Manager (1): TimeSys RT Linux, Regional Manager VLA
• Farmlet Managers (16): TimeSys RT Linux, Farmlet Manager VLA
• DSPs (8): TI DSP BIOS, Low-Level VLA
• Crate Managers (20): TimeSys RT Linux, Crate Manager VLA
• Section Managers (8): RH 8.x Linux, Section Manager VLA
• Linux Nodes (320): RH 8.x Linux, Low-Level VLA
• Data Archive, External Level

Very Lightweight Agents (VLAs)
Proposed solution: the Very Lightweight Agent
• Minimal footprint
• Platform independence
• Monitors hardware
• Monitors software
• Comprehensible source code
• Communication with a high-level software entity
• Error prediction
• Error logging and messaging
• Scheduling and priorities of test events

VLAs on L1 and L2/3 Nodes
[Diagram: Level 1 farm nodes run a physics application and a VLA on the OS kernel (DSP BIOS); Level 2/3 farm nodes run several physics applications and a VLA on the OS kernel (Linux); both reach the L1 and L2/L3 manager nodes through a network API]

VLA Error Reporting
[Diagram: on the Level 1/2/3 manager nodes, the DSP ARMOR, the VLA, and the manager application sit above the Linux kernel and hardware, reporting through the network API to the network]

VLA Error Prediction
Buffer overflow:
1. VLA message or application data input buffers may overflow
2. Messages or data are lost in either case
3. Detected by monitoring the fill rate and the overflow condition
4. A high fill rate indicates
   • a high error rate producing messages, or
   • undersized data buffers
Throttled CPU:
1. Throttling from high temperature
2. Throttling by an erroneous power-saving feature
3. Causes missed deadlines due to the lowered CPU speed
4. A potentially critical failure if L1 data is not processed fast enough
Note that the CPU may be throttled on purpose.

VLA Error Logging
The communication API packages:
1. Message time
2. Operational data
3. Environmental data
4. Sensor values
5. App & OS error codes
6. Beam crossing ID ("15")
[Diagram: hardware and software failures feed the VLA message buffers; messages travel over TCP/IP Ethernet through the communication API]
The ARMOR:
1. Reads the messages
2. Stores and uses them for error prediction
3. Appends appropriate info
4. Filters and sends them to the data archive

VLA Scheduling Issues
• The L1 trigger application has the highest priority
• The VLA must run often enough to ensure the efficacy of its purpose
• The VLA must internally prioritize its error tests
• The VLA must preempt the L1 trigger app on critical errors
• Task priorities must be alterable at run time

VLA Scheduling Issues
[Diagram: three scheduling schemes]
• Normal scheduling: the kernel, the physics application, and the VLA share the CPU; when the physics app is unexpectedly ended, more VLAs can be scheduled
• Adaptive resource scheduling: the VLA has the ability to control its own priority and that of other apps, based on internal decision making
• Alternative scheduling concept: an external message source (FPGA) acts as a VLA inhibitor, so that no VLA runs alongside the physics application

VLA Status
• Current status
  – VLA skeleton and timing implemented at Syracuse (poster)
  – Hardware platform from Vandy
  – Software (the muon application) from Fermi and UIUC
  – Linux drivers to use GME and the Vandy devkit
• Near term
  – Run the muon application on the DSP board
  – Time the muon application
  – Instantiate VLAs with the Vandy hardware
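The buffer-overflow prediction described on the VLA Error Prediction slide amounts to watching a fill rate and an overflow flag. A minimal sketch of that logic, in Python for readability (the `MessageBuffer` class, the 0.8 warning threshold, and the status strings are illustrative assumptions, not the actual VLA implementation):

```python
class MessageBuffer:
    """Fixed-capacity message buffer with overflow tracking (illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []
        self.overflowed = False

    def put(self, msg):
        if len(self.items) >= self.capacity:
            self.overflowed = True  # messages or data are lost on overflow
            return False
        self.items.append(msg)
        return True

    def fill_rate(self):
        return len(self.items) / self.capacity


def predict_buffer_trouble(buf, warn_threshold=0.8):
    """A high fill rate indicates a high error rate or an undersized buffer."""
    if buf.overflowed:
        return "overflow"
    if buf.fill_rate() >= warn_threshold:
        return "warning"
    return "ok"
```

A warning can then be reported to the manager VLA before any message is actually dropped, which is the point of prediction as opposed to mere detection.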
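The six items the communication API packages on the VLA Error Logging slide can be pictured as a single record per message. The field names and the JSON encoding below are assumptions made for illustration; the actual on-wire format of the VLA messages is not specified in these slides:

```python
import json
import time


def package_vla_message(operational, environmental, sensors, error_codes, crossing_id):
    """Bundle the six items from the VLA error-logging slide into one message."""
    msg = {
        "time": time.time(),            # 1. message time
        "operational": operational,     # 2. operational data
        "environmental": environmental, # 3. environmental data
        "sensors": sensors,             # 4. sensor values
        "error_codes": error_codes,     # 5. app & OS error codes
        "crossing_id": crossing_id,     # 6. beam crossing ID
    }
    return json.dumps(msg)
```

The ARMOR on the receiving side would parse such records, keep them for error prediction, append its own information, and forward the filtered result to the data archive.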
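The adaptive-scheduling idea from the scheduling slides, where the VLA normally yields to the L1 trigger application but preempts it on a critical error, and claims more CPU when the physics app has ended, reduces to a small priority-decision rule. This sketch uses POSIX-style numeric priorities (lower number = higher priority); the severity levels and the specific priority values are assumptions for illustration:

```python
# Illustrative priority values (lower = higher priority, as with nice levels).
TRIGGER_PRIORITY = 0   # the L1 trigger app normally has the highest priority
VLA_BACKGROUND = 10    # the VLA normally runs behind the trigger app
VLA_CRITICAL = -5      # on a critical error the VLA preempts the trigger app


def vla_priority(severity, trigger_running=True):
    """Choose the VLA's run-time priority from the detected error severity."""
    if severity == "critical":
        return VLA_CRITICAL      # preempt the L1 trigger application
    if not trigger_running:
        return TRIGGER_PRIORITY  # physics app ended: schedule more VLA tests
    return VLA_BACKGROUND        # stay out of the trigger app's way
```

Because the decision is recomputed at run time, the same rule covers both the "normal" and the "adaptive resource" scheduling pictures; the FPGA-inhibitor concept would instead suppress the VLA entirely from outside.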
and the muon application

VLA and Network Usage
• Network usage influences the amount of data dropped by the triggers and other filters
• Network usage is typically not considered in load-balancing algorithms (the network is assumed to be fast enough)
• VLAs monitor and report network usage
• Agents use this information to redistribute loads
• Network architecture to control flows on a per-process basis (http://www.netnice.org)
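On the RH 8.x Linux nodes, the per-interface byte counters a VLA could sample for this are exposed in `/proc/net/dev`. A minimal sketch of reading them and turning two samples into a link-utilization figure (the sample text in the test and the 1 Gb/s capacity are illustrative; field positions follow the standard `/proc/net/dev` layout, with received bytes in column 1 and transmitted bytes in column 9):

```python
def parse_proc_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev contents."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        name, data = line.split(":", 1)
        fields = data.split()
        stats[name.strip()] = (int(fields[0]), int(fields[8]))
    return stats


def utilization(before, after, interval_s, capacity_bps):
    """Fraction of link capacity used between two (rx, tx) samples."""
    rx = after[0] - before[0]
    tx = after[1] - before[1]
    return 8 * (rx + tx) / (interval_s * capacity_bps)
```

Periodic samples of this kind are what the VLAs would report upward, letting the load-balancing agents account for network load rather than assuming the network is fast enough.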