Thoughts on a Java Reference Implementation for MPJ
Mark Baker (University of Portsmouth), Bryan Carpenter (Florida State University)
IPDPS, Cancun, Mexico – 5th May 2000
http://www.dcs.port.ac.uk/~mab/Talks/
24 May 2017 [email protected]

Contents
- Introduction
- Some design decisions
- An overview of the architecture
- Process creation and monitoring
- The MPJ daemon
- Handling aborts and failures
- The MPJ device
- Conclusions and future work

Introduction
- The Message-Passing Working Group of the Java Grande Forum was formed in late 1998 in response to the appearance of several prototype Java bindings for MPI-like libraries.
- An initial draft of a common API specification was distributed at Supercomputing '98. Since then the working group has met in San Francisco and Syracuse.
- The present API is now called MPJ.

Introduction
- There is as yet no complete implementation of the draft specification.
- mpiJava is moving towards the "standard". The new version (1.2) of the software supports direct communication of objects via object serialization; version 1.3 of mpiJava will implement the new API.
- The mpiJava wrappers rely on the availability of a platform-dependent native MPI implementation for the target computer.

Introduction
- While this is a reasonable basis in many cases, the approach has some disadvantages.
- The two-stage installation procedure (get and build a native MPI, then install and match the Java wrappers) is tedious and off-putting to new users.
- On several occasions we have seen conflicts between the JVM environment and the native MPI runtime behaviour. The situation has improved, and mpiJava now runs on various combinations of JVM and MPI implementation.
- Fundamentally, this strategy conflicts with the ethos of Java: write-once-run-anywhere software is the order of the day.
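The "direct communication of objects via object serialization" mentioned for mpiJava 1.2 can be illustrated with the standard `java.io` classes: an object is flattened to a byte vector on the sending side and reconstituted on the receiving side. The `Marshal` helper below is an illustrative sketch, not part of the mpiJava or MPJ API.

```java
import java.io.*;

// Sketch of how a message-passing layer can ship arbitrary objects as
// bytes, in the spirit of mpiJava 1.2's serialization support. The
// names (Marshal, toBytes, fromBytes) are illustrative only.
public class Marshal {
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(obj);
        }
        return buf.toByteArray();          // wire format for the message body
    }

    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] payload = {1, 2, 3};
        byte[] wire = toBytes(payload);    // what would cross the socket
        int[] copy = (int[]) fromBytes(wire);
        System.out.println(copy[0] + "," + copy[1] + "," + copy[2]);
    }
}
```

Serializing primitive arrays this way is exactly the operation the talk later flags as a candidate for optional JNI optimization.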
MPJ – the Next Generation of Message Passing in Java
An MPJ reference implementation could be built as:
- Java wrappers to a native MPI implementation;
- pure Java; or
- principally Java, with a few simple native methods to optimize operations (such as marshalling arrays of primitive elements) that are difficult to do efficiently in Java.
We are aiming at pure Java, to provide an implementation of MPJ that is maximally portable and that hopefully requires the minimum of support effort.

Benefits of a Pure-Java Implementation of MPJ
- Highly portable: assumes only a Java development environment.
- Performance: moderate. May need JNI inserts for marshalling arrays; network speed is limited by Java sockets.
- Good for education and evaluation.
- Vendors might provide wrappers to native MPI for ultimate performance.

Design Criteria for the MPJ Environment
We need an infrastructure to support groups of distributed processes:
- resource discovery,
- communications,
- handling failure,
- spawning processes on hosts.

Resource Discovery
- Technically, Jini discovery and lookup seem an obvious choice.
- Daemons register with lookup services.
- A "hosts file" may still guide the search for hosts, if preferred.

Communication Base
- Maybe, some day, Java VIA? For now sockets are the only portable option; RMI is surely too slow.

Handling "Partial Failures"
We need to cope with the cases where:
- a network connection breaks,
- the host system goes down,
- the JVM running a remote MPJ task halts for some other reason (e.g., occurrence of a Java exception), or
- the program that initiated the MPJ job is killed.
On unexpected termination of any particular MPJ job:
- concurrent tasks associated with other MPJ jobs should be unaffected, even if they were initiated by the same daemon;
- all processes associated with the failed job must shut down cleanly within some (preferably short) interval of time.

Handling "Partial Failures"
- A usable MPJ implementation must deal with unexpected process termination or network failure without leaving orphan processes or leaking other resources.
- We could reinvent protocols to deal with these situations, but Jini provides a ready-made framework (or, at least, a set of concepts).

Handling Failures with Jini
- If any slave dies, the client generates a Jini distributed event, MPIAbort: all slaves are notified and all processes are killed.
- In case of other failures (network failure, death of the client, death of the controlling daemon, ...), the client's leases on the slaves expire after a fixed time and the processes are killed.

Integration of Jini and MPI
- Provides a natural Java framework for parallel computing, combining the powerful fault-tolerance and dynamic characteristics of Jini with the proven parallel-computing functionality and performance of MPI.

MPJ – Implementation
- In the initial reference implementation we will use Jini technology to locate remote MPJ daemons and to provide a framework for the required fault tolerance.
- This choice rests on our guess that in the medium-to-long term Jini will be a ubiquitous component of Java installations; using the Jini paradigms from the start should eventually help inter-working and compatibility between our software and other systems.

Acquiring Compute Slaves through Jini
(diagram)

MPJ
- We envisage that a user will download a jar file of MPJ library classes onto machines that may host parallel jobs, and install a daemon on those machines (technically, by registering an activatable object with an rmid daemon).
- Parallel Java codes are compiled on one host.
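The lease-expiry behaviour described under "Handling Failures with Jini" can be mimicked in plain Java: a slave holds a lease that the client must keep renewing, and if renewals stop (client death, network partition), the lease expires and the slave kills its processes. This is a minimal sketch of the idea only; the class and method names are hypothetical and do not reflect the real `net.jini.core.lease` API.

```java
// Plain-Java analogue of Jini leasing for reclaiming orphaned slaves.
// A real slave would poll expired() periodically and kill its job's
// processes when the lease lapses; here we just observe the flag.
public class SlaveLease {
    private volatile long expiresAt;
    private final long durationMs;

    SlaveLease(long durationMs) {
        this.durationMs = durationMs;
        renew();
    }

    // Called by the client while it is alive and connected.
    void renew() { expiresAt = System.currentTimeMillis() + durationMs; }

    // Checked by the slave; true means the client has gone silent.
    boolean expired() { return System.currentTimeMillis() > expiresAt; }

    public static void main(String[] args) throws Exception {
        SlaveLease lease = new SlaveLease(100);       // 100 ms lease term
        System.out.println("expired immediately: " + lease.expired());
        Thread.sleep(250);                            // client stops renewing
        System.out.println("expired after silence: " + lease.expired());
    }
}
```

The point of leasing over explicit shutdown messages is that it needs no message to be delivered at failure time: silence itself triggers cleanup.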
- An mpjrun program invoked on that host transparently loads the user's class files into JVMs created on remote hosts by the MPJ daemons, and the parallel job starts.

MPJ – Implementation
- In the short-to-medium term (before Jini software is widely installed) we might have to provide a "lite" version of MPJ that is unbundled from Jini.
- Designing for the Jini protocols should, nevertheless, have a beneficial influence on overall robustness and maintainability.
- Use of Jini implies use of RMI for various management functions.

(Diagram: a host running "mpjrun myproggy -np 4", together with rmid and an http server, contacts the MPJ daemons, which spawn Slaves 1–4.)

MPJ – Implementation
Some assumptions that have a bearing on the organization of the MPJ daemon:
- The stdout (and stderr) streams from all tasks in an MPJ job are merged non-deterministically and copied to the stdout of the process that initiates the job.
- No guarantees are made about other I/O operations; these are system-dependent.
- Rudimentary support for global checkpointing and restarting of interrupted jobs may be quite useful, although checkpointing would not happen without explicit invocation in the user-level code, nor would restarting happen automatically.

MPJ – Implementation
- The role of the MPJ daemons and their associated infrastructure is to provide an environment consisting of a group of processes with the user code loaded and running in a reliable way.
- The process group is reliable in the sense that no partial failures should be visible to higher levels of the MPJ implementation or to the user code.
- We will use Jini leasing to provide fault tolerance. Clearly, no software technology can guarantee the absence of total failures, where the whole MPJ job dies at essentially the same time.
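The daemon assumption that task stdout streams are "merged non-deterministically" can be sketched with one copier thread per task draining into a single shared sink. Here task output is simulated with in-memory string arrays; in the real system each source would be a socket or pipe from a remote JVM, and the sink would be the initiating process's stdout. All names are illustrative.

```java
import java.util.*;

// Sketch of non-deterministic stdout merging: interleaving across tasks
// is arbitrary, but each task's own lines stay in order and none are lost.
public class OutputMerger {
    public static void main(String[] args) throws Exception {
        List<String> sink = Collections.synchronizedList(new ArrayList<>());
        String[][] tasks = { {"rank0: a", "rank0: b"}, {"rank1: x", "rank1: y"} };

        List<Thread> copiers = new ArrayList<>();
        for (String[] lines : tasks) {
            Thread t = new Thread(() -> {
                for (String line : lines) sink.add(line);  // copy to merged stdout
            });
            copiers.add(t);
            t.start();
        }
        for (Thread t : copiers) t.join();

        System.out.println(sink.size());   // all 4 lines arrive, in some order
    }
}
```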
MPJ – Implementation
- Once a reliable cocoon of user processes has been created through negotiation with the daemons, we have to establish connectivity. In the reference implementation this will be based on Java sockets.
- Recently there has been interest in producing Java bindings to VIA; eventually this may provide a better platform on which to implement MPI, but for now sockets are the only realistic, portable option.

MPJ – Implementation
- Between the socket API and the MPJ API there will be an intermediate "MPJ device" level, modelled on the Abstract Device Interface (ADI) of MPICH.
- Although the role is slightly different here (we do not really anticipate a need for multiple platform-specific implementations), this still seems like a good layer of abstraction to have in our design.
- The API is not actually modelled in detail on the MPICH device, but the level of operations is similar (based on isend/irecv/waitany calls).

Layers of an MPJ Reference Implementation
- High-level MPI: collective operations (e.g., all-to-all), process topologies, all point-to-point modes, groups, communicators, datatypes.
- Base-level MPI: isend, irecv, waitany, ...
- MPJ device level: physical PIDs, contexts and tags, byte-vector data, TCP connect, input-handler threads.
- Process creation and monitoring: the MPJ daemon, Jini lookup and leasing, exec java MPJLoader, serializable objects.
- Java socket and thread API: synchronized methods, wait, notify, ...

MPJ – Conclusions
- On-going effort (NSF proposal plus volunteer help).
- Collaboration with other Java message-passing system developers to define the exact MPJ interface.
- Work at the moment is based around development of the low-level MPJ device and exploring the functionality of Jini.
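As a closing illustration, the isend/irecv style of the MPJ device level described above can be sketched over plain Java sockets: non-blocking send and receive calls return request handles that are completed by worker threads, with simple length-prefixed framing for the byte-vector message body. The `MiniDevice` and `Request` names are hypothetical; this is a sketch of the operation level, not the MPJ device API.

```java
import java.io.*;
import java.net.*;
import java.util.concurrent.*;

// Minimal sketch of device-level non-blocking point-to-point operations
// over Java sockets, in the isend/irecv/wait style named in the talk.
public class MiniDevice {
    // A request handle: a Future over the raw byte-vector message body.
    static class Request {
        final Future<byte[]> done;
        Request(Future<byte[]> done) { this.done = done; }
        byte[] await() throws Exception { return done.get(); }  // like a wait call
    }

    static final ExecutorService pool = Executors.newCachedThreadPool();

    static Request isend(Socket s, byte[] data) {
        return new Request(pool.submit(() -> {
            DataOutputStream out = new DataOutputStream(s.getOutputStream());
            out.writeInt(data.length);          // length-prefixed framing
            out.write(data);
            out.flush();
            return data;
        }));
    }

    static Request irecv(Socket s) {
        return new Request(pool.submit(() -> {
            DataInputStream in = new DataInputStream(s.getInputStream());
            byte[] buf = new byte[in.readInt()];
            in.readFully(buf);
            return buf;
        }));
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {  // ephemeral port
            Socket a = new Socket("localhost", server.getLocalPort());
            Socket b = server.accept();

            Request send = isend(a, "hello".getBytes("UTF-8"));
            Request recv = irecv(b);
            System.out.println(new String(recv.await(), "UTF-8"));

            send.await();
            a.close();
            b.close();
        }
        pool.shutdown();
    }
}
```

A real device layer would add contexts and tags to the frame header and a waitany over a set of pending requests; this sketch shows only the handle-based completion model.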