Thoughts on a Java Reference Implementation for MPJ
Mark Baker (University of Portsmouth), Bryan Carpenter (Florida State University)
IPDPS, Cancun, Mexico – 5th May 2000
http://www.dcs.port.ac.uk/~mab/Talks/
24 May 2017
[email protected]
Contents

- Introduction
- Some design decisions
- An overview of the architecture
  - Process creation and monitoring
  - The MPJ daemon
  - Handling aborts and failures
  - MPJ device
- Conclusions and future work
Introduction

- The Message-Passing Working Group of the Java Grande Forum was formed in late 1998 in response to the appearance of several prototype Java bindings for MPI-like libraries.
- An initial draft of a common API specification was distributed at Supercomputing '98.
- Since then the working group has met in San Francisco and Syracuse.
- The present API is now called MPJ.
Introduction

- There is as yet no complete implementation of the draft specification.
- mpiJava is moving towards the “standard”:
  - the current version (1.2) of the software supports direct communication of objects via object serialization;
  - version 1.3 of mpiJava will implement the new API.
- The mpiJava wrappers rely on the availability of a platform-dependent native MPI implementation for the target computer.
Introduction

While this is a reasonable basis in many cases, the approach has some disadvantages:

- The two-stage installation procedure – get and build a native MPI, then install and match the Java wrappers – is tedious and off-putting to new users.
- On several occasions we saw conflicts between the JVM environment and the native MPI runtime behaviour. The situation has improved, and mpiJava now runs on various combinations of JVM and MPI implementation.
- The strategy simply conflicts with the ethos of Java – write-once-run-anywhere software is the order of the day.
MPJ – the Next Generation of Message Passing in Java

An MPJ reference implementation could be implemented as:

- Java wrappers to a native MPI implementation;
- pure Java;
- principally Java, with a few simple native methods to optimize operations (like marshalling arrays of primitive elements) that are difficult to do efficiently in Java.

We are aiming at pure Java, to provide an implementation of MPJ that is maximally portable and that hopefully requires the minimum amount of support effort.
Benefits of a pure Java implementation of MPJ

- Highly portable: assumes only a Java development environment.
- Performance: moderate. May need JNI inserts for marshalling arrays; network speed limited by Java sockets.
- Good for education/evaluation.
- Vendors provide wrappers to native MPI for ultimate performance?
Design Criteria for the MPJ Environment

Need an infrastructure to support groups of distributed processes:

- resource discovery,
- communications,
- handling failure,
- spawning processes on hosts.
Resource discovery

- Technically, Jini discovery and lookup seems an obvious choice.
- Daemons register with lookup services.
- A “hosts file” may still guide the search for hosts, if preferred.
Communication base

- Maybe, some day, Java VIA?
- For now, sockets are the only portable option.
- RMI is surely too slow.
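Since sockets are the communication base, a device layer must frame discrete messages on top of TCP's byte stream. The sketch below shows one way to do length-prefixed message exchange over a loopback socket; the class and method names are illustrative, not part of any MPJ API.

```java
import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;

// Sketch of framed message exchange over plain Java sockets - the kind of
// primitive an MPJ device layer could build on. Names are illustrative.
public class SocketSketch {

    // Send one message: a 4-byte length prefix followed by the payload.
    static void send(Socket s, byte[] data) throws IOException {
        DataOutputStream out = new DataOutputStream(s.getOutputStream());
        out.writeInt(data.length);
        out.write(data);
        out.flush();
    }

    // Receive one length-prefixed message.
    static byte[] recv(Socket s) throws IOException {
        DataInputStream in = new DataInputStream(s.getInputStream());
        byte[] data = new byte[in.readInt()];
        in.readFully(data);
        return data;
    }

    // Loopback demo: an echo "task" on one thread, a client on another.
    static String roundTrip(String msg) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread peer = new Thread(() -> {
                try (Socket s = server.accept()) {
                    send(s, recv(s));  // echo the message back
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            peer.start();
            String reply;
            try (Socket s = new Socket("localhost", server.getLocalPort())) {
                send(s, msg.getBytes(StandardCharsets.UTF_8));
                reply = new String(recv(s), StandardCharsets.UTF_8);
            }
            peer.join();
            return reply;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("ping"));  // prints "ping"
    }
}
```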
Handling “Partial Failures”

Need to overcome unexpected termination of any particular MPJ job, whether because:

- a network connection breaks,
- the host system goes down,
- the JVM running the remote MPJ task halts for some other reason (e.g., occurrence of a Java exception), or
- the program that initiated the MPJ job is killed.

Concurrent tasks associated with other MPJ jobs should be unaffected, even if they were initiated by the same daemon. All processes associated with the particular job must shut down cleanly within some (preferably short) interval of time.
Handling “Partial Failures”

- A usable MPJ implementation must deal with unexpected process termination or network failure without leaving orphan processes or leaking other resources.
- We could reinvent protocols to deal with these situations, but Jini provides a ready-made framework (or, at least, a set of concepts).
Handling failures with Jini

- If any slave dies, the client generates a Jini distributed event, MPIAbort – all slaves are notified and all processes are killed.
- In case of other failures (network failure, death of client, death of controlling daemon, …) the client's leases on slaves expire after a fixed time, and the processes are killed.
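The expiry behaviour described above can be illustrated in plain Java. This is only a sketch of the leasing idea, not the Jini lease API; the class and method names here are invented.

```java
// Toy lease: the client must renew before expiry; if renewals stop (client
// death, network failure), the slave notices the expired lease and can kill
// the job's processes. Sketch only - not the real Jini Lease interface.
public class LeaseSketch {
    private long expiresAt;  // absolute expiry time in milliseconds

    LeaseSketch(long durationMillis) {
        this.expiresAt = System.currentTimeMillis() + durationMillis;
    }

    // Called periodically by a live client to keep the lease current.
    synchronized void renew(long durationMillis) {
        expiresAt = System.currentTimeMillis() + durationMillis;
    }

    // Polled by the slave; true means the holder is presumed dead.
    synchronized boolean expired() {
        return System.currentTimeMillis() > expiresAt;
    }

    public static void main(String[] args) throws InterruptedException {
        LeaseSketch lease = new LeaseSketch(50);  // 50 ms lease
        System.out.println(lease.expired());      // false: freshly granted
        Thread.sleep(100);                        // client "dies": no renewals
        System.out.println(lease.expired());      // true: slave may clean up
    }
}
```

The important property is that cleanup requires no message from the failed party: silence alone is enough to trigger it.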
Integration of Jini and MPI

Provides a natural Java framework for parallel computing, combining the powerful fault-tolerance and dynamic characteristics of Jini with the proven parallel-computing functionality and performance of MPI.
MPJ - Implementation

- In the initial reference implementation we will use Jini technology to facilitate location of remote MPJ daemons and to provide a framework for the required fault-tolerance.
- This choice rests on our guess that in the medium-to-long term Jini will be a ubiquitous component in Java installations. Hence using the Jini paradigms from the start should eventually help inter-working and compatibility between our software and other systems.
Acquiring compute slaves through Jini
MPJ

- We envisage that a user will download a jar-file of MPJ library classes onto machines that may host parallel jobs, and install a daemon on those machines – technically by registering an activatable object with an rmid daemon.
- Parallel Java codes are compiled on one host.
- An mpjrun program invoked on that host transparently loads the user's class files into JVMs created on remote hosts by the MPJ daemons, and the parallel job starts.
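The loading step might look roughly like the sketch below: the daemon points a URLClassLoader at a codebase and reflectively invokes the job's main method. The codebase URL and class names are hypothetical placeholders; in a real deployment the codebase would be an HTTP server on the initiating host.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;

// Sketch of a daemon-side loader: fetch user classes from a codebase URL and
// start the job by invoking its main method. All names are illustrative.
public class LoaderSketch {

    // Load `className` from `codebase` and run its main(String[]) method.
    static void launch(URL codebase, String className, String[] args) throws Exception {
        ClassLoader loader = new URLClassLoader(new URL[] { codebase },
                                                LoaderSketch.class.getClassLoader());
        Class<?> job = Class.forName(className, true, loader);
        Method main = job.getMethod("main", String[].class);
        main.invoke(null, (Object) args);  // static method, so null receiver
    }

    // Stand-in for a user's parallel program; a real job's classes would come
    // from the initiating host's HTTP server rather than the local classpath.
    public static class Hello {
        static volatile boolean ran = false;
        public static void main(String[] args) { ran = true; }
    }

    public static void main(String[] args) throws Exception {
        launch(new URL("file:./"), "LoaderSketch$Hello", new String[0]);
        System.out.println(Hello.ran);  // prints "true"
    }
}
```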
MPJ - Implementation

- In the short-to-medium term – before Jini software is widely installed – we might have to provide a “lite” version of MPJ that is unbundled from Jini.
- Designing for Jini protocols should, nevertheless, have a beneficial influence on overall robustness and maintainability.
- Use of Jini implies use of RMI for various management functions.
[Diagram: on the host, “Mpjrun myproggy –np 4” contacts the MPJ daemon (registered with rmid, with classes served by an http server), which starts Slave 1 – Slave 4.]
MPJ – Implementation

Some assumptions that have a bearing on the organization of the MPJ daemon:

- stdout (and stderr) streams from all tasks in an MPJ job are merged non-deterministically and copied to the stdout of the process that initiates the job.
- No guarantees are made about other IO operations – these are system-dependent.
- Rudimentary support for global checkpointing and restarting of interrupted jobs may be quite useful, although checkpointing would not happen without explicit invocation in the user-level code, nor would restarting happen automatically.
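The stdout-merging assumption above could be realized with one handler thread per task, each copying its task's output to the initiator's stdout, so lines from different tasks interleave non-deterministically. A sketch with invented names:

```java
import java.io.*;

// One handler thread per task copies that task's output, line by line and
// tagged with the task's rank, to a merged stream. Interleaving between
// tasks is non-deterministic; each line stays intact. Names are illustrative.
public class OutputMerger {

    static Thread handler(int rank, BufferedReader taskOut, PrintStream merged) {
        Thread t = new Thread(() -> {
            try {
                String line;
                while ((line = taskOut.readLine()) != null) {
                    merged.println("[" + rank + "] " + line);  // one whole line at a time
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for two remote tasks' stdout streams.
        Thread h0 = handler(0, new BufferedReader(new StringReader("a\nb")), System.out);
        Thread h1 = handler(1, new BufferedReader(new StringReader("c")), System.out);
        h0.join();
        h1.join();
        // Emits "[0] a", "[0] b", "[1] c" in some interleaved order.
    }
}
```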
MPJ – Implementation

- The role of the MPJ daemons and their associated infrastructure is to provide an environment consisting of a group of processes with the user code loaded and running in a reliable way.
- The process group is reliable in the sense that no partial failures should be visible to higher levels of the MPJ implementation or the user code.
- We will use Jini leasing to provide fault tolerance – clearly no software technology can guarantee the absence of total failures, where the whole MPJ job dies at essentially the same time.
MPJ - Implementation

- Once a reliable cocoon of user processes has been created through negotiation with the daemons, we have to establish connectivity.
- In the reference implementation this will be based on Java sockets.
- Recently there has been interest in producing Java bindings to VIA – eventually this may provide a better platform on which to implement MPI, but for now sockets are the only realistic, portable option.
MPJ – Implementation

- Between the socket API and the MPJ API there will be an intermediate “MPJ device” level, modelled on the Abstract Device Interface (ADI) of MPICH.
- Although the role is slightly different here – we do not really anticipate a need for multiple platform-specific implementations – this still seems like a good layer of abstraction to have in our design.
- The API is not actually modelled in detail on the MPICH device, but the level of operations is similar (based on isend/irecv/waitany calls).
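As a rough illustration of the isend/irecv/waitany level of operations – not the actual MPJ device API – here is a toy in-memory version in which requests are Java Futures and "communication" is a queue inside one JVM:

```java
import java.util.List;
import java.util.concurrent.*;
import java.nio.charset.StandardCharsets;

// Toy device level: non-blocking isend/irecv return request handles, and
// waitany completes when any outstanding request does. A real device would
// run over sockets between JVMs; all names here are illustrative.
public class DeviceSketch {
    private final BlockingQueue<byte[]> channel = new LinkedBlockingQueue<>();
    final ExecutorService pool = Executors.newCachedThreadPool();

    // Non-blocking send: the handle completes once the data is queued.
    Future<byte[]> isend(byte[] data) {
        return pool.submit(() -> { channel.put(data); return data; });
    }

    // Non-blocking receive: the handle completes when a message arrives.
    Future<byte[]> irecv() {
        return pool.submit(channel::take);
    }

    // Return the result of any one completed request (simple polling loop).
    static byte[] waitany(List<Future<byte[]>> requests) throws Exception {
        while (true) {
            for (Future<byte[]> r : requests) {
                if (r.isDone()) return r.get();
            }
            Thread.sleep(1);  // a real device would block instead of polling
        }
    }

    public static void main(String[] args) throws Exception {
        DeviceSketch dev = new DeviceSketch();
        Future<byte[]> recv = dev.irecv();                    // post receive first
        dev.isend("hello".getBytes(StandardCharsets.UTF_8));  // then send
        byte[] msg = waitany(List.of(recv));
        System.out.println(new String(msg, StandardCharsets.UTF_8));  // prints "hello"
        dev.pool.shutdown();
    }
}
```

Posting the receive before the send, as above, mirrors the non-blocking style the device level is built around.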
Layers of an MPJ Reference Implementation

- High-level MPI: collective operations, process topologies.
- Base-level MPI: all pt-to-pt modes, groups, communicators, datatypes.
- MPJ device level: isend, irecv, waitany, …; physical PIDs; contexts & tags; byte vector data.
- Java socket and thread API: all-to-all TCP connect, input handler threads, synchronised methods, wait, notify, …
- Process creation and monitoring: exec java MPJLoader, serializable objects, MPJ daemon, lookup, leasing (Jini).
MPJ - Conclusions

- On-going effort (NSF proposal + volunteer help).
- Collaboration with other Java MP system developers to define the exact MPJ interface.
- Work at the moment is based around development of the low-level MPJ device and exploring the functionality of Jini.