Event Filter Farms
Javier Jaen Martinez, CERN IT/PDP
LHC - 28 September 1999
CERN - European Laboratory for Particle Physics

Table of Contents
• Motivation & Goals
• Types of Farms
• Core Issues
• Examples
• JMX: A Management Technology
• Summary

Study Goals
• How are farms evolving in non-HEP environments?
• Do generic PC farms and filter farms share requirements for system/application monitoring, control and management?
• Will we benefit from future developments in other domains?
• Which are the emerging technologies for farm computing?

Introduction
According to Pfister there are three ways to improve performance: work harder, work smarter, get help. In terms of computing technologies:
• work harder ~ using faster hardware
• work smarter ~ using more efficient algorithms and techniques
• getting help ~ depending on how processors, memory and interconnect are laid out: MPP, SMP, distributed systems and farms

Motivation
• IT/PDP is already using commodity farms
• All 4 LHC experiments will use Event Filter Farms
• Commodity farms are also becoming very popular for non-HEP applications
• Thousands of tasks and thousands of nodes have to be controlled, monitored and managed (a system and application management challenge)

Types of Farms (in our domain)
• Event Filter Farms
  – Filter data acquired in previous levels of a DAQ
  – Reduce the aggregated throughput by rejecting uninteresting events or by compressing them (a toy filtering loop is sketched at the end of this section)
  [Figure: event-building network feeding sub-farm inputs (SFI), each serving event filter units (EFU) made up of processing elements (PE)]
• Batch Data Processing
  – A job reads data from tape, processes the information and writes data back
  – Each job runs on a separate node
  – Job management is performed by a batch scheduler
  – Nodes with good CPU performance and large disks
  – Good connectivity to mass storage
  – Inter-node communication is not critical (independent jobs)
• Interactive Data Analysis
  – Analysis and data mining
  – Traverse large databases as fast as possible
  – Programs may run in parallel
  – Nodes with great CPU performance and large disks
  – High-performance inter-process communication
• Monte Carlo Simulation
  – Used to simulate detectors
  – Simulation jobs run independently on each node
  – Similar to a batch data processing system (maybe with lower disk requirements)
• Others
  – Workgroup services
  – Central Data Recording farms
  – Disk server farms, ...
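The filtering step above can be made concrete with a short, purely illustrative sketch. The Event type, the energy field and the 50.0 cut are invented for illustration and do not come from the slides; the point is only that a processing element reduces aggregate throughput by rejecting events that fail a trigger predicate.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical event type; in a real farm this would be the data delivered by the event builder.
class Event {
    final long id; final double energy;
    Event(long id, double energy) { this.id = id; this.energy = energy; }
}

public class ToyEventFilter {
    public static void main(String[] args) {
        // Stand-in for events arriving at one processing element.
        List<Event> input = List.of(new Event(1, 12.5), new Event(2, 95.0), new Event(3, 40.1));

        // A selection cut: only events above an (arbitrary) energy threshold are kept.
        Predicate<Event> trigger = e -> e.energy > 50.0;

        // The filtering loop: uninteresting events are rejected, reducing the output rate.
        long accepted = input.stream().filter(trigger).count();
        System.out.println("accepted " + accepted + " of " + input.size() + " events");
    }
}
```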
Types of Farms (in non-HEP environments)
• High Performance Farms (parallel)
  – A collection of interconnected stand-alone computers cooperatively working together as a single, integrated computing resource
  – The farm is seen as a computer architecture for parallel computation
• High Availability Farms
  – Mission-critical applications
  – Hot standby
  – Failover and failback

Key Issues in Farm Computing
• Size scalability (physical & application)
• Enhanced availability (failure management)
• Single system image (look-and-feel of one system)
• Fast communication (networks & protocols)
• Load balancing (CPU, network, memory, disk)
• Security and encryption (farm of farms)
• Distributed environment (social issues)
• Manageability (administration and control)
• Programmability (offered API)
• Applicability (farm-aware and non-aware applications)

Core Issues (Maturity)
[Figure: core issues placed along a maturity axis running from "mature" through development to future challenge — fast communication, load balancing, failure management, SSI and manageability — with monitoring cutting across all of them]

Monitoring… why?
• Performance tuning
  – The environment changes dynamically due to the variable load on the system and the network
  – The goal is to improve or maintain the quality of the services according to those changes
  – Reactive control monitoring acts on farm parameters to obtain the desired performance
• Fault recovery
  – To know the source of any failure in order to improve robustness and reliability
  – An automatic fault recovery service is needed in farms with hundreds of nodes (migration, ...)
• Security
  – To detect and report security violation events
• Performance evaluation
  – To evaluate application/system performance at run time
  – Evaluation is performed off-line with data monitored on-line
• Testing
  – To check the correctness of new applications running in a farm by detecting erroneous or incorrect operations, obtaining activity reports of certain functions of the farm, and obtaining a complete history of the farm in a given period of time

Monitoring Types (a minimal push-collection sketch follows the tool list below)
• Generation: instrumentation, trace generation, collection format
• Collection: pull/push, distributed/centralized, time/event driven
• Processing: trace merging, database updating, correlation, filtering, online/offline, on demand/automatic, storage format
• Dissemination: dissemination format, access type, access control, on demand/automatic
• Presentation: presentation format, users (managers, control systems)

Monitoring Tools
How many monitoring tools are available?
• Maple, Cheops, NetLogger, Ganymede, SAS, NextPoint, MTR, MeasureNet, Network Health, ResponseNetworks
• http://www.slac.stanford.edu/~cottrell/tcom/nmtf.html
• There are no integrated tools for services, applications, devices and network monitoring
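As flagged under "Monitoring Types", here is a minimal sketch of the push-style, time-driven collection path: producers periodically push samples to a central collector, which filters them online. The NodeSample type, the 0.8 alert threshold and the in-process queue standing in for a network channel are assumptions for illustration, not part of any tool listed above.

```java
import java.util.concurrent.*;

// Hypothetical sample type: node name, metric name, value and timestamp.
class NodeSample {
    final String node, metric; final double value; final long time;
    NodeSample(String node, String metric, double value, long time) {
        this.node = node; this.metric = metric; this.value = value; this.time = time;
    }
}

public class PushCollectorDemo {
    public static void main(String[] args) throws InterruptedException {
        // Central collection point; a real farm would put a socket or message bus here.
        BlockingQueue<NodeSample> channel = new LinkedBlockingQueue<>();

        // Generation + push, time-driven collection: each node publishes its load periodically.
        ScheduledExecutorService nodes = Executors.newScheduledThreadPool(2);
        for (String node : new String[] {"node01", "node02"}) {
            nodes.scheduleAtFixedRate(() -> channel.offer(
                    new NodeSample(node, "cpu.load", Math.random(), System.currentTimeMillis())),
                0, 500, TimeUnit.MILLISECONDS);
        }

        // Online processing + presentation: filter the stream and report only anomalous samples.
        long deadline = System.currentTimeMillis() + 3000;
        while (System.currentTimeMillis() < deadline) {
            NodeSample s = channel.poll(200, TimeUnit.MILLISECONDS);
            if (s != null && s.value > 0.8) {            // trivial filtering rule as a stand-in
                System.out.printf("ALERT %s %s=%.2f%n", s.node, s.metric, s.value);
            }
        }
        nodes.shutdownNow();
    }
}
```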
Monitoring… Strategies?
• Define common strategies:
  – What is to be monitored?
  – Collection strategies
  – Processing alternatives
  – Displaying techniques
• Obtain modular implementations (a good example is the ATLAS Back-End software)
• The IT Division has started a monitoring project: integrated monitoring, service oriented

Fast Communication
[Figure: a "killer platform" of nodes, each with communication software and network interface hardware, connected by a "killer switch"; the time scales involved range from ns to µs to ms]
Fast processors and fast networks: the time is spent in crossing between them.

Fast Communication
• Remove the kernel from the critical path
• Offer user applications fully protected, virtual, direct (zero-copy sends), user-level access to the network interface
• This idea has been specified in VIA (the Virtual Interface Architecture)
[Figure: VIA software stack — application, high-level communication library (MPI, ShMem Put/Get, PVM), Send/Recv/RDMA with buffer management and synchronisation, VI kernel agent, VI network adapter]

Fast Communication
• VIA's predecessors:
  – Active Messages (Berkeley NOW project, Fast Sockets)
  – Fast Messages (UCSD MPI, Shmem Put/Get, Global Arrays)
• Applications using sockets, MPI, ShMem, ... can benefit from these fast communication layers
• Several farms (HPVM (FM), NERSC PC cluster (M-VIA), ...) already benefit from this technology

Fast Communication (Fast Messages)
[Figure: FM bandwidth and latency versus message size (4 bytes to 64 KB) — roughly 77.1 MB/s peak bandwidth and 11.1 µs latency]

Fast Communication
[Figure: bandwidth (MB/s) and one-way latency (µs) compared across HPVM, Power Challenge, SP-2, T3E, Origin 2000 and Beowulf]

Single System Image
• A single system image is the illusion, created by software or hardware, that presents a collection of resources as one, more powerful resource.
• Strong SSI results in farms appearing like a single machine to the user, to applications, and to the network.
• The SSI level is a good measure of the degree of coupling of the nodes in a farm.
• Every farm has a certain degree of SSI (a farm with no SSI at all is not a farm).

Benefits of Single System Image
• Transparent usage of system resources
• Transparent process migration and load balancing across nodes
• Improved reliability and higher availability
• Improved system response time and performance
• Simplified system management
• Reduction in the risk of operator errors
• Users need not be aware of the underlying system architecture to use these machines effectively
(C) from Jain

SSI Services
• Single entry point
• Single file hierarchy: xFS, AFS, ...
• Single control point: management from a single GUI
• Single memory space
• Single job management: GLUnix, Codine, LSF
• Single user interface: like a workstation/PC windowing environment
• Single I/O space (SIO): any node can access any peripheral or disk device without knowledge of its physical location
SSI Services (cont.)
• Single process space (SPS): any process on any node can create processes with cluster-wide process identifiers, and processes communicate through signals, pipes, etc., as if they were on a single node
• Every SSI has a boundary
• Single system support can exist at different levels:
  – OS level: MOSIX
  – Middleware: Codine, PVM
  – Application level: monitoring applications, Back-End software

Scheduling Software
Goal: to enable the scheduling of system activities and the execution of applications while transparently offering high availability services. Scheduling software usually works completely outside the kernel, on top of the machines' existing operating systems.
Advantages:
• Load balancing
• Use of spare CPU cycles
• Fault tolerance
• In practice, increased and reliable throughput of user applications

SS: Generalities
The workings of a typical scheduling system (a toy allocation sketch follows the package overview below):
• Create a job description file: job name, resources, desired platform, ...
• The job description file is sent by the client software to a master scheduler
• The master scheduler has an overall view: the queues that have been configured plus the computational load of the nodes in the farm
• The master ensures that the resources being used are load balanced and that jobs complete successfully

SS: Main Features
• Application support
  – Are batch, interactive and parallel jobs supported?
  – Multiple configurable queues?
• Job scheduling and allocation
  – Allocation policy: taking into account system load, CPU type, computational load, memory, disk space, ...
  – Checkpointing: save state at regular intervals during job execution; a job can be restarted from its last checkpoint
  – Migration: move a job to another node in the farm to achieve dynamic load balancing or to perform a sequence of activities on different specialized nodes
  – Monitoring / suspension / resumption
• Dynamics of resources
  – Can resources, queues and nodes be reconfigured dynamically?
  – Existence of single points of failure
  – Fault tolerance: re-run a job if the system crashes and check for the needed resources

SS: Packages
• Research: CCS, Condor, Dynamic Network Queueing System (DNQS), Distributed Queueing System (DQS), Generic NQS, Portable Batch System (PBS), Prospero Resource Manager, MOSIX, Far, Dynamite
• Commercial: Codine (Genias), LoadBalancer (Tivoli), LSF (Platform), Network Queueing Environment (NQE, SGI), TaskBroker (HP)
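The allocation behaviour described under "SS: Generalities" can be sketched in a few lines. This toy master scheduler keeps a view of free CPUs per node and places each job on the least-loaded node that can still hold it; the names ToyScheduler and JobDescription are invented for illustration and do not correspond to the Codine or LSF APIs.

```java
import java.util.*;

// Hypothetical job description: what the client sends to the master scheduler.
class JobDescription {
    final String name; final int cpusNeeded;
    JobDescription(String name, int cpusNeeded) { this.name = name; this.cpusNeeded = cpusNeeded; }
}

// Toy master scheduler: an overall view of node load plus a crude load-balancing allocation policy.
public class ToyScheduler {
    private final Map<String, Integer> freeCpus = new HashMap<>();

    ToyScheduler(Map<String, Integer> capacity) { freeCpus.putAll(capacity); }

    // Pick the node with the most free CPUs that can still accept the job; empty means queue/reject.
    synchronized Optional<String> submit(JobDescription job) {
        return freeCpus.entrySet().stream()
                .filter(e -> e.getValue() >= job.cpusNeeded)
                .max(Map.Entry.comparingByValue())
                .map(e -> { freeCpus.merge(e.getKey(), -job.cpusNeeded, Integer::sum); return e.getKey(); });
    }

    public static void main(String[] args) {
        ToyScheduler master = new ToyScheduler(Map.of("node01", 4, "node02", 2));
        System.out.println(master.submit(new JobDescription("reco-run", 3)));  // -> node01
        System.out.println(master.submit(new JobDescription("sim-batch", 2))); // -> node02
        System.out.println(master.submit(new JobDescription("huge-job", 8)));  // -> empty
    }
}
```

Real schedulers add queues, priorities, checkpointing and fault tolerance on top of this basic placement decision.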
SS: Some Examples — CODINE & LSF
• To be used in large heterogeneous networked environments
• Dynamic and static load balancing
• Batch, interactive and parallel jobs
• Checkpointing & migration
• Offer an API for new distributed applications
• No single point of failure
• Job accounting data and analysis tools
• Modification of resource reservations for started jobs and specification of releasable shared resources (LSF)
• MPI (LSF) vs MPI, PVM, Express, Linda (Codine)
• Reporting tools (LSF)
• C API (LSF), ?? (Codine)
• No checkpointing of forked or signaled jobs

Failure Management
• Traditionally associated with scheduling software and oriented to long-running (CPU-intensive) processes
• If a CPU-intensive process crashes --> wasted CPU
• Solution:
  – Save the state of the process periodically
  – In case of failure the process is restarted from the last checkpoint
• Strategies:
  – Store checkpoints in files using a distributed file system (slows down computation; NFS performs poorly; AFS caching of checkpoints may flush other useful data)
  – Checkpoint servers (a dedicated node with disk storage and management functions for checkpointing)
• Levels:
  – Transparent checkpointing: a checkpointing library linked against the executable binary checkpoints the process transparently (Condor, libckpt, Hector)
  – User-directed checkpointing: directives included in the application's code perform specific checkpoints of particular memory segments (a minimal sketch appears after the HPVM example below)
• Future challenges:
  – Decoupling failure management from scheduling
  – Defining strategies for system failure recovery (at kernel level?)
  – Defining strategies for task failure recovery

Examples: MOSIX Farms
• MOSIX = Multicomputer OS for UNIX
• An OS module (layer) that provides applications with the illusion of working on a single system
• Remote operations are performed like local operations
• Strong SSI at the kernel level
• Preemptive process migration: can migrate any process, anywhere, anytime
• Supervised by distributed algorithms that respond on-line and transparently to global resource availability
• Load balancing: migrate processes from overloaded to under-loaded nodes
• Memory ushering: migrate processes from a node that has exhausted its memory, to prevent paging/swapping
• A scalable cluster configuration:
  – 50 Pentium-II 300 MHz
  – 38 Pentium-Pro 200 MHz (some are SMPs)
  – 16 Pentium-II 400 MHz (some are SMPs)
  – Over 12 GB of cluster-wide RAM
  – Connected by the 2.56 Gb/s Myrinet LAN
  – Runs Red Hat 6.0, based on kernel 2.2.7
• Download MOSIX: http://www.mosix.cs.huji.ac.il/

Example: HPVM Farms
• Goal: obtain supercomputing performance from a pile of PCs
• Scalability: 256 processors demonstrated
• Networking over a Myrinet interconnect
• OS: Linux and NT (going NT)
• Interfaces layered on Illinois Fast Messages (FM): Winsock 2, MPI, HPF, CORBA, SHMEM, Global Arrays (some available now, some under development)
• SSI at the middleware level: MPI and LSF
• Fast communication: Fast Messages
• Monitoring: none yet
• Manageability (still poor):
  – HPVM front-end (Java applet + LSF features)
  – Symera (under development at NCSA): a DCOM-based management tool (NT only); add/remove nodes, logical cluster definition, control and monitoring of distributed processes
• Others: NERSC PC Cluster and Beowulf
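To illustrate the user-directed checkpointing level mentioned under "Failure Management" — this is a hand-rolled sketch, not the Condor or libckpt interface — the job below explicitly persists the only state it needs, an event counter, at a fixed interval and resumes from the last checkpoint file after a crash. The file name and interval are arbitrary.

```java
import java.io.*;

// User-directed checkpointing: the application itself decides what state to save and when.
public class CheckpointedJob {
    private static final File CKPT = new File("job.ckpt");   // hypothetical checkpoint file

    public static void main(String[] args) throws Exception {
        long start = restore();                               // resume from the last checkpoint, if any
        for (long event = start; event < 1_000_000; event++) {
            process(event);
            if (event % 100_000 == 0) checkpoint(event);      // user-chosen checkpoint interval
        }
        CKPT.delete();                                        // job finished; checkpoint no longer needed
    }

    static void process(long event) { /* CPU-intensive per-event work would go here */ }

    static void checkpoint(long nextEvent) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(CKPT))) {
            out.writeLong(nextEvent);                         // persist only the state chosen by the user
        }
    }

    static long restore() throws IOException {
        if (!CKPT.exists()) return 0;
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(CKPT))) {
            return in.readLong();
        }
    }
}
```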
Example: Disk Server Farms
Purpose: to transfer data sets between disk and applications.
IT/PDP:
• RFIO package (optimized for large sequential data transfers)
• Each disk server system runs one master RFIO daemon in the background, and new requests lead to the spawning of further RFIO daemons
• Memory space is used for caching
• SSI: weak
  – Load balancing of RFIO daemons across the different nodes of the farm
  – A single memory space plus a single I/O space could be useful in a disk server farm with heterogeneous machines
• Monitoring: RFIO daemon status, load of farm nodes, memory usage, caching hit rates, ...
• Fast messaging: RFIO techniques using TCP sockets
• Manageability: storage, daemon and caching management
• The performance of Linux-based disk servers is now comparable to that of UNIX disk servers (benchmarking study by Bernd Panzer, IT/PDP)
DPSS (Distributed Parallel Storage Server):
• A collection of disk servers which operate in parallel over a wide area network to provide logical block-level access to large data sets
• SSI: applications are not aware of the declustered data; load balancing if data is replicated
• Monitoring: Java agents for monitoring and management
• Fast messaging: dynamic TCP buffer size adjustment

JMX: A Management Technology
JMX: Java Management Extensions (basics):
• Defines a management architecture, APIs and management services, all under a single specification
• Resources can be made manageable without regard to how their manager is implemented (SNMP, CORBA, Java manager)
• Based on dynamic agents
• Platform and protocol independent
• JDMK 3.2
[Figure: three-level JMX architecture — a management application at the manager level (JMX manager), an agent level (JMX agent) and an instrumentation level (JMX resource) wrapping the managed resource]
(A minimal MBean sketch, in the spirit of the instrumentation level, appears after the summary.)

JMX: Applications
• Implement distributed SNMP monitoring infrastructures
• Management of heterogeneous farms (NT + Linux)
• Environments where management "intelligence" or requirements change over time
• Environments where management clients may be implemented using different technologies

Summary
• Farm scale and intended use will grow over the coming years
• We presented a set of factors for comparing different farm computing approaches
• Developments from non-HEP domains can be used in HEP farms: fast networking, monitoring, system management
• However, application and task management is very dependent on the particular domain
• The EFF community should:
  – Share common experiences (specific subfields in future meetings)
  – Define common monitoring requirements and mechanisms, SSI requirements and management procedures (filtering, reconstruction, compression, ...)
  – Follow developments in the management of high-performance computing farms (the same challenge of managing thousands of processes/threads)
  – Obtain, if possible, modular implementations of these requirements to constitute an EFF management approach
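Finally, the JMX instrumentation/agent/manager split can be shown with a minimal standard MBean written against the javax.management API defined by the JMX specification; the FarmNode resource, its attributes and the object name are invented for illustration.

```java
// --- FarmNodeMBean.java ----------------------------------------------------
// Instrumentation level: a standard MBean interface lists the attributes and
// operations that make a farm node manageable.
public interface FarmNodeMBean {
    int getLoad();       // readable attribute "Load"
    String getState();   // readable attribute "State"
    void drain();        // management operation
}

// --- FarmNode.java ---------------------------------------------------------
// The managed resource; by JMX convention its class name is the interface name minus "MBean".
public class FarmNode implements FarmNodeMBean {
    private volatile String state = "RUNNING";
    public int getLoad() { return Runtime.getRuntime().availableProcessors(); } // placeholder metric
    public String getState() { return state; }
    public void drain() { state = "DRAINING"; }
}

// --- AgentDemo.java --------------------------------------------------------
import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;

public class AgentDemo {
    public static void main(String[] args) throws Exception {
        // Agent level: an MBean server on each node hosts the MBeans.
        MBeanServer server = MBeanServerFactory.createMBeanServer("farm");
        ObjectName name = new ObjectName("farm:type=Node,host=node01");
        server.registerMBean(new FarmNode(), name);

        // Manager level: any connector or protocol adaptor (SNMP, HTTP, RMI, ...)
        // can query attributes or invoke operations without knowing the implementation.
        System.out.println(server.getAttribute(name, "State"));     // -> RUNNING
        server.invoke(name, "drain", new Object[0], new String[0]);
        System.out.println(server.getAttribute(name, "State"));     // -> DRAINING
    }
}
```

Any manager-level client, whether an SNMP adaptor, an HTTP connector or a plain Java manager, can read the Load attribute or invoke drain() without knowing how FarmNode is implemented, which is the protocol independence claimed above.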