BTeV-RTES Project
Very Lightweight Agents: VLAs
Daniel Mossé, Jae Oh,
Madhura Tamhankar, John Gross
Computer Science Department
University of Pittsburgh
Mossé, Pitt
BTeV Workshop
Nashville, Nov 15, 2002
Shameless plug
LARTES
IEEE Workshop on Large Scale Real-Time and Embedded Systems
In conjunction with
IEEE Real-Time Systems Symposium (RTSS 2002 is on Dec 3-5, 2002)
December 2, 2002
Austin, TX, USA
http://www.rtss.org/LARTES.html
BTeV Test Station
Collider detectors are about the size of a small apartment building. Fermilab's two detectors, CDF and DZero, are about four stories high, weighing some 5,000 tons (10 million pounds) each. Particle collisions occur in the middle of the detectors, which are crammed with electronic instrumentation. Each detector has about 800,000 individual pathways for recording electronic data generated by the particle collisions. Signals are carried over nearly a thousand miles of wire and cable.
Information from Fermi National Accelerator Laboratory
L1/L2/L3 Trigger Overview
Information from Fermi National Accelerator Laboratory
System Characteristics
Software Perspective
Reconfigurable node allocation
L1 runs one physics application, severely time constrained
L2/L3 run several physics applications, with loose time constraints
Multiple operating systems and differing processors
TI DSP BIOS, Linux, Windows?
Communication among system sections via fast network
Fault tolerance is essentially absent in embedded and RT systems
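As an illustration of how a single VLA code base might span these platforms, a thin compile-time portability layer could hide the OS differences; the macro and symbol names below are illustrative, not something the talk specifies:

```c
/* Sketch of a compile-time portability layer so the same VLA source
 * builds for TI DSP/BIOS and for Linux (names are illustrative). */
#ifdef VLA_TARGET_DSPBIOS
#include <std.h>                 /* DSP/BIOS base header          */
#include <tsk.h>                 /* DSP/BIOS task module          */
#define vla_yield() TSK_yield()  /* give the CPU back to the app  */
#else
#include <sched.h>               /* POSIX scheduling (Linux)      */
#define vla_yield() sched_yield()
#endif
```

Portable VLA code would call vla_yield() (and similar wrappers) without caring which operating system runs underneath.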
L1/L2/L3 Trigger Hierarchy
[Diagram: the trigger management hierarchy, with each node type's OS and VLA:
– Global Manager: TimeSys RT Linux, Global Manager VLA
– Regional L1 Manager (1): TimeSys RT Linux, Regional Manager VLA
– Regional L2/L3 Manager (1): TimeSys RT Linux, Regional Manager VLA
– Crate Managers (20): TimeSys RT Linux, Crate Manager VLA
– Farmlet Managers (16): TimeSys RT Linux, Farmlet Manager VLA
– DSPs (8): TI DSP BIOS, Low-Level VLA
– Section Managers (8): RH 8.x Linux, Section Manager VLA
– Linux Nodes (320): RH 8.x Linux, Low-Level VLA
Levels are connected by Gigabit Ethernet; data flows to the Data Archive at the external level.]
Very Lightweight Agents (VLAs)
Proposed Solution: Very Lightweight Agent
Minimize footprint
Platform independence
Monitor hardware
Monitor software
Comprehensible source code
Communication with high-level software entity
Error prediction
Error logging and messaging
Scheduling and prioritization of test events
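As a rough sketch of what such an agent's core loop might look like in C (all function names below are hypothetical; the talk does not define the VLA's API):

```c
/* Minimal sketch of a VLA main loop (hypothetical probe and report
 * functions; the real VLA interfaces are not defined in this talk). */
#include <stdio.h>
#include <time.h>

/* A real VLA would read hardware sensors, OS counters, and
 * application status here. */
static int check_hardware(void) { return 0; /* 0 = OK */ }
static int check_software(void) { return 0; }

static void report(const char *msg)
{
    /* Forward to the higher-level software entity (e.g. an ARMOR)
     * over the network; here we only log locally. */
    fprintf(stderr, "VLA: %s\n", msg);
}

int main(void)
{
    const struct timespec period = { 1, 0 };   /* run once per second */

    for (;;) {
        if (check_hardware() != 0)
            report("hardware fault detected");
        if (check_software() != 0)
            report("software fault detected");
        nanosleep(&period, NULL);              /* keep the footprint small */
    }
}
```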
VLAs on L1 and L2/3 nodes
[Diagram: software stacks on the farm nodes. Level 1 farm nodes: hardware, an OS kernel (DSP BIOS), one physics application, a VLA, and a network API, reporting to the L1 manager nodes. Level 2/3 farm nodes: hardware, an OS kernel (Linux), several physics applications, a VLA, and a network API, reporting to the L2/L3 manager nodes.]
VLA Error Reporting
[Diagram: error reporting on a Level 1/2/3 manager node. The node hardware (including a DSP) and Linux kernel host an ARMOR, VLAs, and the manager application; error reports pass through the network API out to the network.]
VLA Error Prediction
Buffer overflow:
1. VLA message or application data input buffers may overflow
2. Messages or data lost in each case
3. Detection through monitoring fill rate and overflow condition
4. High fill rate indicative of
* high error rate, producing messages
* undersized data buffers
Throttled CPU:
1. Throttled due to high temperature
2. Throttled by an erroneous power-saving feature
3. Causes missed deadlines due to reduced CPU speed
4. Potentially critical failure if L1 data is not processed fast enough
Note that the CPU may be throttled on purpose
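As a rough illustration of the fill-rate test for buffer overflow, in C (the threshold and field names are assumptions, not from the talk):

```c
/* Sketch of the buffer-overflow prediction test (assumed fields and
 * threshold; the actual limits would be tuned per node). */
#include <stdbool.h>

struct buffer_stats {
    unsigned long fill;      /* entries currently queued        */
    unsigned long capacity;  /* total entries                   */
    unsigned long dropped;   /* entries lost since last check   */
};

/* Returns true when the buffer looks likely to overflow: either it
 * has already dropped data, or its fill level exceeds a threshold. */
bool predict_overflow(const struct buffer_stats *b,
                      unsigned long warn_percent)
{
    if (b->dropped > 0)
        return true;                               /* overflow occurred */
    return b->fill * 100 >= b->capacity * warn_percent;
}
```

A VLA could run this check each monitoring period and raise a predicted-overflow message before data is actually lost.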
VLA Error Logging
The VLA packages the following information into each message:
1. Message time
2. Operational data
3. Environmental data
4. Sensor values
5. Application & OS error codes
6. Beam crossing ID
Hardware and software failures feed the VLA, which queues messages in its message buffer and sends them through the communication API (TCP/IP over Ethernet) to the ARMOR. The ARMOR:
1. Reads the messages
2. Stores/uses them for error prediction
3. Appends appropriate info
4. Sends them, after filtering, to the data archive
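A possible layout for one such packaged message, assuming C on the Linux and DSP nodes (field names and types are illustrative; the slide only lists the categories of information):

```c
/* Sketch of the per-message record a VLA might package before shipping
 * it over TCP/IP to the ARMOR (field names are illustrative). */
#include <stdint.h>
#include <time.h>

struct vla_message {
    time_t   timestamp;        /* 1. message time            */
    uint32_t operational;      /* 2. operational data        */
    uint32_t environmental;    /* 3. environmental data      */
    float    sensor_value;     /* 4. sensor values           */
    int32_t  app_error;        /* 5. application error code  */
    int32_t  os_error;         /* 5. OS error code           */
    uint64_t beam_crossing_id; /* 6. beam crossing ID        */
};
```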
VLA Scheduling Issues
L1 trigger application has highest priority
VLA must run often enough to remain effective
VLA must internally prioritize error tests
VLA must preempt the L1 trigger app on critical errors
Task priorities must be alterable at run time (sketched below)
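A minimal sketch of such run-time priority control on a Linux node, using POSIX real-time scheduling (DSP BIOS nodes would use their own task-priority calls; the function name here is illustrative):

```c
/* Sketch of run-time priority control under Linux/POSIX. */
#include <sched.h>
#include <stdio.h>

/* On a critical error, raise this VLA above the L1 trigger
 * application so its report cannot be starved. */
int vla_escalate_priority(int trigger_app_priority)
{
    struct sched_param sp = { .sched_priority = trigger_app_priority + 1 };

    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {  /* 0 = this process */
        perror("sched_setscheduler");
        return -1;
    }
    return 0;
}
```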
VLA Scheduling Issues
[Diagram: scheduling timelines on a node, each showing the kernel interleaving the physics application with VLA slots.
– Normal scheduling: the kernel alternates the physics application with short VLA slots.
– Adaptive resource scheduling: when the physics app is unexpectedly ended, more VLAs can be scheduled in the freed time; the VLA has the ability to control its own priority and that of other apps, based on internal decision making.
– Alternative scheduling concept: a different interleaving of kernel, physics application, and VLA slots.]
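One possible way to detect the "physics app unexpectedly ended" case on a Linux node, assuming the VLA (or a small supervisor) started the application as a child process; the helper names are illustrative:

```c
/* Sketch: notice that the physics application ended unexpectedly and
 * use the freed CPU time for extra VLA checks. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>

static void run_extra_vla_checks(void) { /* placeholder for more tests */ }

static void supervise(pid_t physics_pid)
{
    int status;

    if (waitpid(physics_pid, &status, 0) == physics_pid &&
        (!WIFEXITED(status) || WEXITSTATUS(status) != 0)) {
        fprintf(stderr, "VLA: physics application ended unexpectedly\n");
        run_extra_vla_checks();        /* more VLAs can now be scheduled */
    }
}
```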
VLA Scheduling Issues
[Diagram: an external message source (an FPGA) feeds a VLA inhibitor in the kernel, which decides whether the VLA or the physics application gets the CPU.]
VLA Status
• Current Status
– VLA skeleton and timing implemented in Syracuse (poster)
– Hardware platform from Vandy
– Software (muon application) from Fermi and UIUC
– Linux drivers to use GME and Vandy devkit
• Near term
– Muon application to run on the DSP board
– Muon application timing
– Instantiate VLAs with Vandy hardware and Muon application
VLA and Network Usage
• Network usage influences amount of data dropped by Triggers and other Filters
• Network usage typically not considered in load balancing algorithms (assume the network is fast enough)
• VLAs monitor and report network usage
• Agents use this information to re-distribute loads
• Network architecture to control flows on a per-process basis (http://www.netnice.org)
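As an illustration of how a VLA might sample network usage on a Linux node, the sketch below reads the per-interface byte counters in /proc/net/dev; note this is per-interface accounting, while per-process flow control is what netnice adds on top:

```c
/* Sketch of a network-usage probe: sample the receive-byte counter
 * that Linux exposes in /proc/net/dev for one interface. */
#include <stdio.h>
#include <string.h>

/* Return the received-bytes counter for the named interface,
 * or -1 on error; sampling it periodically gives a usage rate. */
long long rx_bytes(const char *ifname)
{
    char line[512];
    FILE *f = fopen("/proc/net/dev", "r");
    if (!f)
        return -1;

    while (fgets(line, sizeof line, f)) {
        char name[64];
        long long rx;
        /* Each data line looks like: "  eth0: rx_bytes rx_packets ..." */
        if (sscanf(line, " %63[^:]: %lld", name, &rx) == 2 &&
            strcmp(name, ifname) == 0) {
            fclose(f);
            return rx;
        }
    }
    fclose(f);
    return -1;
}
```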