Event Filter Farms
Javier Jaen Martinez
CERN IT/PDP
CERN - European Laboratory for Particle Physics
LHC - 28 September 1999

Table of Contents
• Motivation & Goals
• Types of Farms
• Core Issues
• Examples
• JMX: A Management Technology
• Summary

Study Goals
• How are farms evolving in non-HEP environments?
• Do generic PC farms and filter farms have shared requirements for system/application monitoring, control and management?
• Will we benefit from future developments in other domains?
• What are the emerging technologies for farm computing?

Introduction
According to Pfister, there are three ways to improve performance:
• Work harder
• Work smarter
• Get help
In terms of computing technologies:
• work harder ~ using faster hardware
• work smarter ~ using more efficient algorithms and techniques
• getting help ~ depending on how processors, memory and interconnect are laid out: MPP, SMP, distributed systems and farms

Motivation
• IT/PDP is already using commodity farms
• All 4 LHC experiments will use Event Filter Farms
• Commodity farms are also becoming very popular for non-HEP applications

Motivation
Thousands of tasks and thousands of nodes to be controlled, monitored and managed (a system and application management challenge).

Types of Farms
In our domain:
• Event Filter Farms
  – To filter data acquired in previous levels of a DAQ
  – Reduce the aggregate throughput by rejecting uninteresting events or by compressing them
[Diagram: data flow from Event Building into several SFIs; each SFI feeds an EFU and its set of PEs.]

Types of Farms
• Batch Data Processing
  – A job reads data from tape, processes the information and writes back data
  – Each job runs on a separate node
  – Job management performed by a batch scheduler
  – Nodes with good CPU performance and large disks
  – Good connectivity to mass storage
  – Inter-node communication not critical (independent jobs)
• Interactive Data Analysis
  – Analysis and data mining
  – Traverse large databases as fast as possible
  – Programs may run in parallel
  – Nodes with great CPU performance and large disks
  – High-performance inter-process communication

Types of Farms
• Monte Carlo Simulation
  – Used to simulate detectors
  – Simulation jobs run independently on each node
  – Similar to a batch data processing system (maybe with lower disk requirements)
• Others
  – Workgroup services
  – Central data recording farms
  – Disk server farms
  – ...

Types of Farms
In non-HEP environments:
• High Performance Farms (parallel)
  – A collection of interconnected stand-alone computers cooperatively working together as a single, integrated computing resource
  – The farm is seen as a computer architecture for parallel computation
• High Availability Farms
  – Mission-critical applications
  – Hot standby
  – Failover and failback

Key Issues in Farm Computing
• Size scalability (physical & application)
• Enhanced availability (failure management)
• Single System Image (look-and-feel of one system)
• Fast communication (networks & protocols)
• Load balancing (CPU, network, memory, disk)
• Security and encryption (farm of farms)
• Distributed environment (social issues)
• Manageability (administration and control)
• Programmability (offered API)
• Applicability (farm-aware and non-aware applications)

Core Issues (Maturity)
[Diagram: core issues (Load Balancing, Fast Communication, Failure Management, SSI, Manageability, Monitoring) placed on a maturity scale running from "mature" through "development" to "future challenge".]

Monitoring… why?
Performance Tuning:
• The environment changes dynamically due to the variable load on the system and the network.
• Improving or maintaining the quality of the services according to those changes.
• Reactive control monitoring acts on farm parameters to obtain the desired performance.
Fault Recovery:
• To know the source of any failure in order to improve robustness and reliability.
• An automatic fault recovery service is needed in farms with hundreds of nodes (migration, …).
Security:
• To detect and report security violation events.

Monitoring… why?
Performance Evaluation:
• To evaluate application/system performance at run-time.
• Evaluation is performed off-line with data monitored on-line.
Testing:
• To check the correctness of new applications running in a farm by
  – detecting erroneous or incorrect operations
  – obtaining activity reports of certain functions of the farm
  – obtaining a complete history of the farm in a given period of time

Monitoring Types
[Diagram: monitoring pipeline — Generation (instrumentation, trace generation); Collection (pull/push, distributed/centralized, time/event-driven, collection format); Processing (trace merging, database updating, correlation, filtering; on-line/off-line, on demand/automatic, storage format); Dissemination (to users, managers, control systems; dissemination format, access type, access control); Presentation (on demand/automatic, presentation format).]
How many monitoring tools are available?

Monitoring Tools
• Maple
• Cheops
• NetLogger
• Ganymede
• SAS
• NextPoint
• MTR
• MeasureNet
• Network Health
• ResponseNetworks
http://www.slac.stanford.edu/~cottrell/tcom/nmtf.html
No integrated tools for services, applications, devices and network monitoring.

Monitoring… Strategies?
Define common strategies:
• What is to be monitored?
• Collection strategies
• Processing alternatives
• Display techniques
Obtain modular implementations:
• A good example is the ATLAS Back End Software
IT Division has started a monitoring project:
• Integrated monitoring
• Service oriented

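To make the collection-strategy choices concrete, here is a minimal Java sketch (not from any CERN tool) of a time-driven, push-model agent: each node periodically samples its own load and pushes the value to a central collector. The agent and collector names, the UDP transport, the port and the message format are all assumptions for illustration.

    // MetricAgent.java -- hypothetical sketch of a time-driven, push-model monitoring agent.
    // Assumes a Linux node (reads /proc/loadavg) and a central collector listening on UDP.
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetAddress;

    public class MetricAgent {
        public static void main(String[] args) throws Exception {
            String collectorHost = args.length > 0 ? args[0] : "collector.example.org"; // assumed host
            int collectorPort = 9999;                                                   // assumed port
            InetAddress collector = InetAddress.getByName(collectorHost);
            DatagramSocket socket = new DatagramSocket();
            String node = InetAddress.getLocalHost().getHostName();

            while (true) {
                // Generation: instrument the node by sampling its 1-minute load average.
                BufferedReader in = new BufferedReader(new FileReader("/proc/loadavg"));
                String load = in.readLine().split(" ")[0];
                in.close();

                // Collection: push the sample to the central collector (fixed period).
                byte[] msg = (node + " loadavg1=" + load).getBytes();
                socket.send(new DatagramPacket(msg, msg.length, collector, collectorPort));

                Thread.sleep(10000);   // 10 s sampling period
            }
        }
    }

A pull-model variant would instead have the collector poll each node, moving the scheduling decision to the collector side.
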
Fast Communication
[Diagram: several "killer platform" nodes, each running communication software over network interface hardware, connected by a "killer switch"; time-scale labels ns, µs and ms mark where communication time is spent.]
• Fast processors and fast networks
• The time is spent in crossing between them

Fast Communication
• Remove the kernel from the critical path
• Offer user applications fully protected, virtual, direct (zero-copy send), user-level access to the network interface
• This idea has been specified in VIA (Virtual Interface Architecture)
[Diagram: VIA stack — Application; high-level communication library (MPI, ShM Put/Get, PVM); Send/Recv/RDMA with buffer management/synchronization; VI Kernel Agent; VI Network Adapter.]

Fast Communication
VIA's predecessors:
• Active Messages (Berkeley NOW project, Fast Sockets)
• Fast Messages (UCSD MPI, Shmem Put/Get, Global Arrays)
Applications using sockets, MPI, ShMem, … can benefit from these fast communication layers.
Several farms (HPVM (FM), NERSC PC cluster (M-VIA), …) already benefit from this technology.

Fast Communication (Fast Messages)
[Plot: FM bandwidth (MB/s) and latency (µs) versus message size from 1 byte to 64 KB; peak bandwidth ~77.1 MB/s, minimum latency ~11.1 µs, with the FM packet size marked on the message-size axis.]

Fast Communication
[Chart: bandwidth (MB/s, 0-300, higher is better) and one-way latency (µs, 0-250, lower is better) compared for HPVM, Power Challenge, SP-2, T3E, Origin 2K and Beowulf.]

Single System Image
• A single system image is the illusion, created by software or hardware, that presents a collection of resources as one, more powerful resource.
• Strong SSI results in farms appearing like a single machine to the user, to applications, and to the network.
• The SSI level is a good measure of the coupling degree of the nodes in a farm.
• Every farm has a certain degree of SSI (a farm with no SSI at all is not a farm).

Benefits of Single System Image
• Transparent usage of system resources
• Transparent process migration and load balancing across nodes
• Improved reliability and higher availability
• Improved system response time and performance
• Simplified system management
• Reduction in the risk of operator errors
• Users need not be aware of the underlying system architecture to use these machines effectively
(C) from Jain

SSI Services
• Single Entry Point
• Single File Hierarchy: xFS, AFS, ...
• Single Control Point: management from a single GUI
• Single Memory Space
• Single Job Management: GLUnix, Codine, LSF
• Single User Interface: like a workstation/PC windowing environment
• Single I/O Space (SIO): any node can access any peripheral or disk device without knowledge of its physical location

SSI Services
• Single Process Space (SPS): any process on any node can create processes with cluster-wide process identifiers, and they communicate through signals, pipes, etc., as if they were on a single node
• Every SSI has a boundary
• Single system support can exist at different levels:
  – OS level: MOSIX
  – Middleware: Codine, PVM
  – Application level: monitoring applications, Back-End software

Scheduling Software
Goal:
• Enables the scheduling of system activities and the execution of applications while transparently offering high-availability services
Usually works completely outside the kernel, on top of the machines' existing operating system.
Advantages:
• Load balancing
• Use of spare CPU cycles
• Fault tolerance
• In practice, increased and reliable throughput of user applications

SS: Generalities
The workings of a typical SS:
• Create a job description file: job name, resources, desired platform, …
• The job description file is sent by the client software to a master scheduler
• The master scheduler has an overall view: the queues that have been configured plus the computational load of the nodes in the farm
• The master ensures that the resources being used are load balanced and that jobs complete successfully (a toy sketch of the allocation step follows below)

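To make the allocation step concrete, the toy Java sketch below picks the least-loaded node that satisfies a job's disk request. The Node and JobDescription types, the load figures and the selection policy are invented for illustration; real packages such as LSF or CODINE apply much richer policies.

    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.List;

    // Toy sketch of a master scheduler's allocation policy (illustration only).
    public class ToyScheduler {
        record Node(String name, double cpuLoad, long freeDiskMB) {}      // hypothetical node state
        record JobDescription(String jobName, long minDiskMB) {}          // hypothetical job description

        // Pick the least-loaded node that satisfies the job's resource request.
        static Node allocate(JobDescription job, List<Node> farm) {
            return farm.stream()
                       .filter(n -> n.freeDiskMB() >= job.minDiskMB())
                       .min(Comparator.comparingDouble(Node::cpuLoad))
                       .orElseThrow(() -> new IllegalStateException("no node satisfies " + job.jobName()));
        }

        public static void main(String[] args) {
            List<Node> farm = Arrays.asList(
                new Node("node01", 0.9, 4000),
                new Node("node02", 0.2, 8000),
                new Node("node03", 0.4, 500));
            JobDescription job = new JobDescription("reco-run42", 2000);
            System.out.println(job.jobName() + " -> " + allocate(job, farm).name());
        }
    }
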
SS: Main features
Application support:
• Are batch, interactive and parallel jobs supported?
• Multiple configurable queues?
Job scheduling and allocation:
• Allocation policy: taking into account system load, CPU type, computational load, memory, disk space, …
• Checkpointing: save state at regular intervals during job execution; the job can be restarted from the last checkpoint
• Migration: move a job to another node in the farm to achieve dynamic load balancing or to perform a sequence of activities on different specialized nodes
• Monitoring / suspension / resumption

SS: Main features
Dynamics of resources:
• Can resources, queues, and nodes be reconfigured dynamically?
• Existence of single points of failure
• Fault tolerance: re-run a job if the system crashes and check for the needed resources

SS: Packages
Research:
• CCS
• Condor
• Dynamic Network Queueing System (DNQS)
• Distributed Queueing System (DQS)
• Generic NQS
• Portable Batch System (PBS)
• Prospero Resource Manager
• MOSIX
• Far
• Dynamite
Commercial:
• Codine (Genias)
• LoadBalancer (Tivoli)
• LSF (Platform)
• Network Queueing Environment (SGI)
• TaskBroker (HP)
[Diagram: lineage of the packages — NQS, Condor, NQE, PBS, DNQS, Utopia, DQS, Codine, LSF.]

SS: Some examples
CODINE & LSF:
• To be used in large heterogeneous networked environments
• Dynamic and static load balancing
• Batch, interactive, parallel jobs
• Checkpointing & migration
• Offer an API for new distributed applications
• No single point of failure
• Job accounting data and analysis tools
• Modification of resource reservation for started jobs and specification of releasable shared resources (LSF)
• MPI (LSF) vs MPI, PVM, Express, Linda (Codine)
• Reporting tools (LSF)
• C API (LSF), ?? (Codine)
• No checkpointing of forked jobs or signaled jobs

Failure Management
Traditionally associated with scheduling software and oriented to long-running (CPU-intensive) processes.
If a CPU-intensive process crashes --> wasted CPU.
Solution:
• Save the state of the process periodically
• In case of failure, the process is restarted from the last checkpoint
Strategies:
• Store checkpoints in files using a distributed file system (slows down computation; NFS performance is poor; AFS caching of checkpoints may flush other useful data)
• Checkpoint servers (a dedicated node with disk storage and management functions for checkpointing)

Failure Management
Levels:
• Transparent checkpointing: a checkpointing library is linked against the executable binary and checkpoints the process transparently (Condor, libckpt, Hector)
• User-directed checkpointing: directives included in the application's code to perform specific checkpoints of particular memory segments (a toy sketch follows below)
Future challenges:
• Decoupling failure management and scheduling
• Define strategies for system failure recovery (at kernel level?)
• Define strategies for task failure recovery

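A minimal Java sketch of the user-directed level, assuming the application itself chooses what to save and when: the job serializes its state to a checkpoint file at a fixed interval and, after a restart, resumes from the last checkpoint if one exists. The file name, state layout and interval are invented for the example; transparent libraries such as Condor's achieve the same effect without source changes.

    import java.io.*;

    // Hypothetical sketch of user-directed checkpointing: the application decides
    // what state to save and when, and restores it explicitly after a restart.
    public class CheckpointedJob {
        static class State implements Serializable {
            long nextEvent = 0;                            // e.g. index of the next event to process
        }

        static final File CKPT = new File("job.ckpt");     // assumed checkpoint file

        static State restoreOrNew() {
            if (!CKPT.exists()) return new State();
            try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(CKPT))) {
                return (State) in.readObject();            // resume from the last checkpoint
            } catch (Exception e) {
                return new State();                        // unreadable checkpoint: start over
            }
        }

        static void checkpoint(State s) throws IOException {
            File tmp = new File(CKPT.getPath() + ".tmp");
            try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(tmp))) {
                out.writeObject(s);
            }
            tmp.renameTo(CKPT);                            // replace the previous checkpoint
        }

        public static void main(String[] args) throws Exception {
            State s = restoreOrNew();
            for (; s.nextEvent < 1000000; s.nextEvent++) {
                // ... process event s.nextEvent ...
                if (s.nextEvent % 10000 == 0) checkpoint(s);   // user-chosen checkpoint interval
            }
        }
    }
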
Examples: MOSIX Farms
• MOSIX = Multicomputer OS for UNIX
• An OS module (layer) that provides applications with the illusion of working on a single system
• Remote operations are performed like local operations
• Strong SSI at kernel level

Example: MOSIX Farms
• Preemptive process migration that can migrate ---> any process, anywhere, anytime
• Supervised by distributed algorithms that respond on-line to global resource availability, transparently
• Load balancing: migrate processes from over-loaded to under-loaded nodes
• Memory ushering: migrate processes from a node that has exhausted its memory, to prevent paging/swapping

Example: MOSIX Farms
A scalable cluster configuration:
• 50 Pentium-II 300 MHz
• 38 Pentium-Pro 200 MHz (some are SMPs)
• 16 Pentium-II 400 MHz (some are SMPs)
Over 12 GB cluster-wide RAM.
Connected by the 2.56 Gb/s Myrinet LAN.
Runs Red Hat 6.0, based on kernel 2.2.7.
Download MOSIX: http://www.mosix.cs.huji.ac.il/

Example: HPVM Farms
• GOAL: obtain supercomputing performance from a pile of PCs
• Scalability: 256 processors demonstrated
• Networking over Myrinet interconnect
• OS: Linux and NT (going NT)
[Diagram: HPVM software stack — Winsock 2, MPI, HPF, CORBA, SHMEM and Global Arrays interfaces layered over Illinois Fast Messages (FM); some available now, others under development.]

Example: HPVM Farms
SSI at middleware level:
• MPI and LSF
Fast communication: Fast Messages
Monitoring: none yet
Manageability (still poor):
• HPVM front-end (Java applet + LSF features)
• Symera (under development at NCSA)
  – DCOM-based management tool (NT only)
  – Add/remove nodes from the cluster
  – Logical cluster definition
  – Distributed process control + monitoring
Other: NERSC PC Cluster and Beowulf

Example: Disk server Farms
To transfer data sets between disk and applications.
IT/PDP:
• RFIO package (optimized for large sequential data transfers)
• Each disk server system runs one master RFIO daemon in the background, and new requests lead to the spawning of further RFIO daemons
• Memory space is used for caching
• SSI: weak
  – Load balancing of RFIO daemons on different nodes of the farm
  – Single memory space + I/O space could be useful in a disk server farm with heterogeneous machines

Example: Disk server Farms
• Monitoring: RFIO daemon status, load of farm nodes, memory usage, caching hit rates, ...
• Fast messaging: RFIO techniques using TCP sockets
• Manageability: storage, daemon and caching management
• Linux-based disk server performance is now comparable to UNIX disk servers (benchmarking study by Bernd Panzer, IT/PDP)!
DPSS (Distributed Parallel Storage Server):
• A collection of disk servers which operate in parallel over a wide area network to provide logical block-level access to large data sets
• SSI: applications are not aware of the declustered data; load balancing if data is replicated
• Monitoring: Java agents for monitoring and management
• Fast messaging: dynamic TCP buffer size adjustment (see the sketch below)

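The TCP buffer point can be illustrated with standard socket options: the Java sketch below asks for send/receive buffers sized to an assumed bandwidth-delay product before a bulk transfer. The host name, port and the 2 MB figure are examples only, not DPSS parameters.

    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    // Sketch: request TCP socket buffers sized to the path's bandwidth-delay product
    // (e.g. ~100 Mbit/s * 160 ms ~= 2 MB) before a bulk transfer over a WAN.
    public class BigBufferClient {
        public static void main(String[] args) throws Exception {
            int bdpBytes = 2 * 1024 * 1024;                  // assumed bandwidth-delay product
            Socket s = new Socket();
            s.setReceiveBufferSize(bdpBytes);                // a hint; the OS may cap the value
            s.setSendBufferSize(bdpBytes);
            s.connect(new InetSocketAddress("dpss.example.org", 5000));  // hypothetical server
            try (OutputStream out = s.getOutputStream()) {
                out.write(new byte[64 * 1024]);              // one 64 KB logical block, for example
            }
            s.close();
        }
    }

Setting the buffer sizes before connecting matters, since the advertised TCP window is negotiated at connection time.
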
JMX: A Management Technology
JMX: Java Management Extensions (basics):
• Defines a management architecture, APIs, and management services, all under a single specification
• Resources can be made manageable without regard to how their manager is implemented (SNMP, CORBA, Java manager)
• Based on dynamic agents
• Platform and protocol independent
• JDMK 3.2
[Diagram: three-level JMX architecture — a management application at the manager level (JMX Manager), the agent level (JMX Agent) and the instrumentation level (JMX Resource), sitting on top of the managed resource.]

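As a flavour of the instrumentation and agent levels, here is a minimal sketch using the standard javax.management API (the later standardization of the JDMK-era interfaces): a farm node is exposed as a standard MBean and registered with an MBean server, after which any JMX-capable manager can read its attributes. The NodeMonitor name and its single attribute are invented for the example.

    // NodeMonitorMBean.java -- instrumentation level: the management interface (standard MBean).
    public interface NodeMonitorMBean {
        double getCpuLoad();
    }

    // NodeMonitor.java -- the managed resource implementing its MBean interface.
    public class NodeMonitor implements NodeMonitorMBean {
        public double getCpuLoad() {
            return 0.42;   // placeholder; a real agent would sample the node here
        }
    }

    // FarmAgent.java -- agent level: register the resource with an MBean server.
    import javax.management.MBeanServer;
    import javax.management.ObjectName;
    import java.lang.management.ManagementFactory;

    public class FarmAgent {
        public static void main(String[] args) throws Exception {
            MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("farm:type=Node,name=node01");  // hypothetical name
            mbs.registerMBean(new NodeMonitor(), name);
            System.out.println("NodeMonitor registered as " + name);
            Thread.sleep(Long.MAX_VALUE);   // keep the agent alive so managers can connect
        }
    }

Once registered, the same MBean could be exposed through an SNMP or HTML adaptor without changing the instrumented code, which is the protocol-independence point made above.
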
JMX: Components
[Diagram: JMX components.]

JMX: Applications
• Implement distributed SNMP monitoring infrastructures
• Heterogeneous farm (NT + Linux) management
• Environments where management "intelligence" or requirements change over time
• Environments where management clients may be implemented using different technologies

Summary
Farms' scale and intended use will grow in the coming years.
We presented a set of factors to compare different farm computing approaches.
Developments from non-HEP domains can be used in HEP farms:
• Fast networking
• Monitoring
• System management
However, application and task management is very dependent on the particular domain.

Summary
The EFF community should:
• Share common experiences (specific subfields in future meetings)
• Define common monitoring requirements and mechanisms, SSI requirements, and management procedures (filtering, reconstruction, compression, …)
• Follow developments in the management of high-performance computing farms (the same challenge of managing thousands of processes/threads)
• Obtain, if possible, modular implementations of these requirements that constitute an EFF management approach