Parallel computing and
message-passing in Java
Bryan Carpenter
NPAC at Syracuse University
Syracuse, NY 13244
[email protected]
Goals of this lecture



Survey approaches to parallel
computing in Java.
Describe a Java binding of MPI
developed in the HPJava project at
Syracuse.
Discuss ongoing activities related to
message-passing in the Java Grande
Forum—MPJ.
Contents of Lecture


Survey of parallel computing in Java
Overview of mpiJava




API and Implementation
Benchmarks and demos
Object Serialization in mpiJava
Message-passing activities in Java Grande

Thoughts on a Java Reference Implementation for
MPJ
Survey of Parallel Computing
in Java
Sung Hoon Ko
NPAC at Syracuse University
Syracuse, NY 13244
[email protected]
Java for High-Performance Computing


Java is potentially an excellent platform for
developing large-scale science and engineering
applications
Java has advantages.





Java is a descendant of C++.
Java omits various features of C and C++ that are
considered difficult - e.g. pointers.
Java comes with built-in multithreading.
Java is portable.
Java has advantages in visualisation and user interfaces.
The Java Grande Forum



Java has some problems that hinder its use for Grande
applications.
Java Grande Forum created to make Java a better platform
for Grande applications.
Currently two working groups exist.

Numeric Working Group


complex and floating-point arithmetic, multidimensional arrays,
operator overloading, etc.
Concurrency/Applications Working Group

performance of RMI and object serialization, benchmarking, computing
portals, etc.
Approaches to Parallelism in Java



Automatic parallelization of sequential
code.
A JVM for an SMP can schedule the
threads of a multi-threaded Java code.
Language extensions or directives akin
to HPF, or provision of libraries.
Message Passing with Java

Java sockets
  - unattractive for scientific parallel programming
Java RMI
  - restrictive, and its overhead is high
  - (un)marshalling of data is more costly than with sockets
Message-passing libraries in Java
  - Java as wrapper for existing libraries
  - use only pure Java libraries
Java Based Frameworks

Use Java as wrapper for existing frameworks.
  (mpiJava, Java/DSM, JavaPVM)
Use pure Java libraries.
  (MPJ, DOGMA, JPVM, JavaNOW)
Extend Java language with new keywords.
  Use a preprocessor or own compiler to create Java (byte) code.
  (HPJava, Manta, JavaParty, Titanium)
Web oriented: use Java applets to execute parallel tasks.
  (WebFlow, IceT, Javelin)
Use Java as wrapper for
existing frameworks. (I)

JavaMPI : U. of Westminster




Java wrapper to MPI.
Wrappers are automatically generated from
the C MPI header using the Java-to-C interface
generator (JCI).
Close to the C binding; not object-oriented.
JavaPVM (jPVM) : Georgia Tech.

Java wrapper to PVM.
Use Java as wrapper for
existing frameworks. (II)

Java/DSM : Rice U.





Heterogeneous computing system.
Implements a JVM on top of the TreadMarks
Distributed Shared Memory (DSM) system.
One JVM on each machine. All objects are
allocated in the shared memory region.
Provides transparency: the Java/DSM combination
hides the hardware differences from the
programmer.
Since communication is handled by the underlying
DSM, no explicit communication is necessary.
Use pure Java libraries(I)

JPVM : U. of Virginia




A pure Java implementation of PVM.
Based on communication over TCP sockets.
Performance is very poor compared to JavaPVM.
jmpi : Baskent U.


A pure Java implementation of MPI built on top of
JPVM.
Due to the additional wrapper layer over JPVM routines,
its performance is poor compared to JPVM.
(JavaPVM < JPVM < jmpi)
Use pure Java libraries(II)

MPIJ : Brigham Young U.

A pure Java-based subset of MPI developed as
part of the Distributed Object Group Metacomputing Architecture (DOGMA).
Hard to use.
JMPI : MPI Software Technology




Develops a commercial message-passing
framework and parallel support environment for
Java.
Aims to build a pure Java version of the MPI-2
standard, specialized for commercial applications.
Use pure Java libraries(III)

JavaNOW : Illinois Institute Tech.



Shared-memory-based system and experimental
message-passing framework.
Creates a virtual parallel machine, like PVM.
Provides




implicit multi-threading
implicit synchronization
distributed associative shared memory similar to Linda.
Currently available as standalone software and
must be used with a remote (or secure) shell tool
in order to run on a network of workstations.
Extend Java Language(I)



Use a pre-processor to create Java code, or
a dedicated compiler to create Java byte code or
executable code, which loses the portability of Java.
Manta : Vrije University

Compiler-based high-performance Java system.
Uses a native compiler for aggressive optimisations.
Has an optimised RMI protocol (Manta RMI).
Extend Java Language(II)

Titanium : UC Berkeley



Java-based language for high-performance parallel
scientific computing.
The Titanium compiler translates Titanium into C.
Extends Java with additional features like
  - immutable classes, which behave like existing Java
    primitive types or C structs
  - multidimensional arrays
  - an explicitly parallel SPMD model of computation with a
    global address space
  - a mechanism for the programmer to control memory
    management.
Extend Java Language(III)

JavaParty : University of Karlsruhe



Provides a mechanism for parallel
programming on distributed memory
machines.
Compiler generates the appropriate Java
code plus RMI hooks.
The remote keyword is used to identify
which objects can be called remotely.
Web oriented

IceT : Emory University




Enables users to share JVMs across a network.
A user can upload a class to another virtual machine using a
PVM-like interface.
By explicitly calling send and receive statements, work can
be distributed among multiple JVMs.
Javelin : UC Santa Barbara


Internet-based parallel computing using Java by running
Java applets in web browsers.
Communication latencies are high since web browsers use
RMI over TCP/IP, typically over slow Ethernet.
Object Serialization and RMI

Object Serialization



Provides a program with the ability to read or write a whole object to
and from a raw byte stream.
An essential feature needed by the RMI implementation when
method arguments are passed by copy.
RMI



Provides easy access to objects existing on remote virtual
machines.
Designed for Client-Server applications over unstable and slow
networks.
Fast remote method invocations with low latency and high
bandwidth are required for high performance computing.
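For concreteness, here is a minimal sketch (not from the original slides) of the round trip just described: an object graph is written to a raw byte stream and reconstructed from it, which is what an RMI implementation does when method arguments are passed by copy. The Particle class is a hypothetical example type.

import java.io.* ;

class Particle implements Serializable {
  double x, y ;
  Particle(double x, double y) { this.x = x ; this.y = y ; }
}

class SerializeDemo {
  public static void main(String[] args) throws Exception {
    // Write a whole object to a raw byte stream...
    ByteArrayOutputStream bytes = new ByteArrayOutputStream() ;
    ObjectOutputStream out = new ObjectOutputStream(bytes) ;
    out.writeObject(new Particle(1.0, 2.0)) ;
    out.flush() ;

    // ...and reconstruct it from that stream.
    ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray())) ;
    Particle p = (Particle) in.readObject() ;
    System.out.println("x = " + p.x + ", y = " + p.y) ;
  }
}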
Performance Problems of
Object Serialization

Does not handle float and double types efficiently.
  - The type cast, which is implemented in the JNI, requires
    various time-consuming operations for check-pointing and
    state recovery.
  - Serializing a float array invokes the above-mentioned JNI routine for
    every single array element.
Costly encoding of type information.
  - For every type of serialized object, all fields of the type are
    described verbosely.
Object creation takes too long.
  - Object output and input should be overlapped to reduce
    latency.
Efficient Object Serialization(I)

UKA-serialization (as part of JavaParty)

Slim encoding of type information
  - Approach: when objects are being communicated, it can
    be assumed that all JVMs that collaborate on a parallel
    application use the same file system (NFS).
  - It is then much shorter to send just the textual name of the class,
    including its package prefix.
Uses explicit (un)marshalling instead of reflection
(by writeObject)
  - With regular use of object serialization, programmers do
    not implement the (un)marshalling themselves; instead they rely on
    Java's reflection.
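The standard JDK offers a comparable hook for explicit marshalling through the Externalizable interface. The sketch below is an illustration using plain JDK classes only, not UKA-serialization's actual API; the Point type is hypothetical. The programmer writes and reads the fields directly instead of relying on reflection.

import java.io.* ;

class Point implements Externalizable {
  double x, y ;

  public Point() { }                       // public no-arg constructor required
  public Point(double x, double y) { this.x = x ; this.y = y ; }

  // Explicit marshalling: write the fields directly.
  public void writeExternal(ObjectOutput out) throws IOException {
    out.writeDouble(x) ;
    out.writeDouble(y) ;
  }

  // Explicit unmarshalling: read the fields back in the same order.
  public void readExternal(ObjectInput in) throws IOException {
    x = in.readDouble() ;
    y = in.readDouble() ;
  }
}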
Efficient Object Serialization(II)

UKA-serialization (as part of JavaParty)(cont.)

Better buffer handling and less copying to achieve better
performance.
  - JDK external buffering problems:
    - On the recipient side, JDK serialization uses a buffered stream
      implementation that does not know the byte representation of objects.
    - Users cannot write directly into the external buffer; instead they must
      use special write routines.
  - UKA-serialization handles the buffering internally and makes the buffer public.
    - By making the buffer public, explicit marshalling routines can write their
      data immediately into the buffer.
With Manta: the serialization code is generated by the compiler.
  - This makes it possible to avoid the overhead of dynamic inspection
    of the object structure.
mpiJava: A Java Interface to
MPI
Mark Baker, Bryan Carpenter, Geoffrey Fox,
Guansong Zhang.
www.npac.syr.edu/projects/pcrc/HPJava/mpiJava.html
The mpiJava wrapper



Implements a Java API for MPI
suggested in late ‘97.
Builds on work on Java wrappers for
MPI started at NPAC about a year earlier.
People: Bryan Carpenter, Yuh-Jye
Chang, Xinying Li, Sung Hoon Ko,
Guansong Zhang, Mark Baker, Sang Lim.
mpiJava features.





Fully featured Java interface to MPI 1.1
Object-oriented API based on MPI 2
standard C++ interface
Initial implementation through JNI to
native MPI
Comprehensive test suite translated
from IBM MPI suite
Available for Solaris, Windows NT and
other platforms
Class hierarchy (package mpi)

  MPI
  Group
  Comm
    Intracomm
      Cartcomm
      Graphcomm
    Intercomm
  Datatype
  Status
  Request
    Prequest
Minimal mpiJava program
import mpi.* ;

class Hello {
  static public void main(String[] args) {
    MPI.Init(args) ;
    int myrank = MPI.COMM_WORLD.Rank() ;
    if(myrank == 0) {
      char[] message = "Hello, there".toCharArray() ;
      MPI.COMM_WORLD.Send(message, 0, message.length, MPI.CHAR, 1, 99) ;
    }
    else {
      char[] message = new char [20] ;
      MPI.COMM_WORLD.Recv(message, 0, 20, MPI.CHAR, 0, 99) ;
      System.out.println("received:" + new String(message) + ":") ;
    }
    MPI.Finalize() ;
  }
}
MPI datatypes

Send and receive members of Comm:
void send(Object buf, int offset, int count,
          Datatype type, int dst, int tag) ;
Status recv(Object buf, int offset, int count,
            Datatype type, int src, int tag) ;

buf must be an array. offset is the index of the
element where the message starts. The Datatype
class describes the type of the elements.
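The following fragment is an illustrative sketch only, using the capitalized Send/Recv methods of the mpiJava binding shown in the Hello example above; the tag and ranks are arbitrary. It shows how offset and count select a section of the buffer array.

// On rank 0: send elements 20..29 of a larger array.
double[] work = new double [100] ;
MPI.COMM_WORLD.Send(work, 20, 10, MPI.DOUBLE, 1, 77) ;

// On rank 1: receive those 10 elements into the start of a smaller array.
double[] section = new double [10] ;
MPI.COMM_WORLD.Recv(section, 0, 10, MPI.DOUBLE, 0, 77) ;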
Basic Datatypes
MPI Datatype     Java Datatype
MPI.BYTE         byte
MPI.CHAR         char
MPI.SHORT        short
MPI.BOOLEAN      boolean
MPI.INT          int
MPI.LONG         long
MPI.FLOAT        float
MPI.DOUBLE       double
MPI.OBJECT       Object
mpiJava implementation
issues


mpiJava is currently implemented as a
Java interface to an underlying MPI
implementation - such as MPICH or
some other native MPI implementation.
The interface between mpiJava and
the underlying MPI implementation is
via the Java Native Interface (JNI).
mpiJava - Software Layers
MPIprog.java (import mpi.*;)
JNI C interface
Native library (MPI)
mpiJava implementation
issues




Interfacing Java to MPI is not always trivial -
e.g., see low-level conflicts between the
Java runtime and interrupts in MPI.
Situation improving as the JDK matures (1.2).
Now reliable on Solaris MPI (SunHPC,
MPICH), shared memory, NT (WMPI).
Linux - Blackdown JDK 1.2 beta just out
and seems OK; other ports in progress.
mpiJava - Test Machines
Processor                  Memory          OS           Interconnect
Dual PII 200 MHz           128 MB          NT 4 (SP3)   10 Mbps Ethernet
Dual UltraSparc 200 MHz    256 MB          Solaris 2.5  10 Mbps Ethernet
450 MHz PII & 100 MHz P5   256 MB & 64 MB  Linux 2.X    100 Mbps Ethernet
mpiJava performance
         Wsock      WMPI-C     WMPI-J     MPICH-C    MPICH-J    Linux-C   Linux-J
SM (μs)  144.8      67.2       161.4      148.7      374.6      -         -
DM (μs)  244.9      623.3      689.7      679.1      961.2      -         -
mpiJava performance
1. Shared memory mode
mpiJava performance
2. Distributed memory
mpiJava demos
1. CFD: inviscid flow
mpiJava demos
2. Q-state Potts model
Object Serialization in mpiJava
Bryan Carpenter, Geoffrey Fox, Sung-Hoon Ko,
and Sang Lim
www.npac.syr.edu/projects/pcrc/HPJava/mpiJava.html
Some issues in design of a
Java API for MPI



Class hierarchy. MPI is already
object-based. “Standard” class
hierarchy exists for C++.
Detailed argument lists for
methods. Properties of Java language
imply various superficial changes from
C/C++.
Mechanisms for representing
message buffers.
Representing Message Buffers
Two natural options:
  - Follow the MPI standard route: derived
    datatypes describe buffers consisting of
    mixed primitive fields scattered in local
    memory.
  - Follow the Java standard route: automatic
    marshalling of complex structures through
    object serialization.
Overview of this part of
lecture





Discuss incorporation of derived datatypes
in the Java API, and limitations.
Adding object serialization at the API level.
Describe implementation using JDK
serialization.
Benchmarks for naïve implementation.
Optimizing serialization.
Basic Datatypes
MPI datatype     Java datatype
MPI.BYTE         byte
MPI.CHAR         char
MPI.SHORT        short
MPI.BOOLEAN      boolean
MPI.INT          int
MPI.LONG         long
MPI.FLOAT        float
MPI.DOUBLE       double
MPI.OBJECT       Object
Derived datatypes
MPI derived datatypes have two roles:
  - Non-contiguous data can be transmitted
    in one message.
  - MPI_TYPE_STRUCT allows mixed
    primitive types in one message.
The Java binding doesn't support the second role.
All data comes from a homogeneous
array of elements (no MPI_Address).
Restricted model
A derived datatype consists of
  - A base type. One of the 9 basic types.
  - A displacement sequence. A
    relocatable pattern of integer
    displacements in the buffer array:
      {disp_0, disp_1, . . . , disp_{n-1}}
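A sketch of what this looks like in code is given below. It assumes a static factory method Datatype.Indexed(blockLengths, displacements, oldtype) and a Commit() method, mirroring MPI_TYPE_INDEXED; the exact names and signatures in any particular mpiJava release may differ, so treat this purely as an illustration of the base-type-plus-displacements model.

double[] buf = new double [100] ;

int[] blockLengths  = { 1, 1, 1 } ;
int[] displacements = { 3, 17, 42 } ;   // the displacement sequence {disp_0, disp_1, disp_2}

// Hypothetical construction of a derived datatype with base type MPI.DOUBLE.
Datatype pattern = Datatype.Indexed(blockLengths, displacements, MPI.DOUBLE) ;
pattern.Commit() ;

// Sends buf[3], buf[17] and buf[42] in a single message.
MPI.COMM_WORLD.Send(buf, 0, 1, pattern, 1, 0) ;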
Limitations


Can’t mix primitive types or fields from
different objects.
Displacements only operate within 1d
arrays. Can’t use
MPI_TYPE_VECTOR to describe
sections of multidimensional arrays.
Object datatypes



If the type argument is MPI.OBJECT, buf
should be an array of objects.
Allows one to send fields of mixed primitive
types, and fields from different objects,
in one message.
Allows one to send multidimensional arrays,
because they are arrays of arrays (and
arrays are effectively objects).
Automatic serialization



Send buf should be an array of objects
implementing Serializable.
Receive buf should be an array of
compatible reference types (may be
null).
Java serialization paradigm applied:

Output objects (and objects referenced
through them) converted to a byte stream.
Object graph reconstructed at the receiving
end.
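A small sketch of this object-serialization path follows. It is illustrative only: the Particle class, ranks and tag are made up, and the capitalized Send/Recv methods of the mpiJava binding from the earlier Hello example are assumed.

import mpi.* ;
import java.io.Serializable ;

class Particle implements Serializable {
  double x, y ;
  int id ;
}

class ObjectSendDemo {
  static public void main(String[] args) {
    MPI.Init(args) ;
    int myrank = MPI.COMM_WORLD.Rank() ;
    if(myrank == 0) {
      Particle[] buf = new Particle [8] ;
      for(int i = 0 ; i < buf.length ; i++) buf[i] = new Particle() ;
      MPI.COMM_WORLD.Send(buf, 0, buf.length, MPI.OBJECT, 1, 5) ;
    }
    else if(myrank == 1) {
      // Elements may be null; serialization reconstructs the objects on receipt.
      Particle[] buf = new Particle [8] ;
      MPI.COMM_WORLD.Recv(buf, 0, buf.length, MPI.OBJECT, 0, 5) ;
    }
    MPI.Finalize() ;
  }
}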
Implementation issues for
Object datatypes



Initial implementation in mpiJava used
ObjectOutputStream and
ObjectInputStream classes from JDK.
Data serialized and sent as a byte vector,
using MPI.
Length of byte data not known in
advance. Encoded in a separate header
so space can be allocated dynamically in
receiver.
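The shape of this protocol can be sketched as below. This is a conceptual illustration of the header-plus-data scheme, not the actual mpiJava internals; the helper names are made up and the tag handling is simplified.

import java.io.* ;
import mpi.* ;

class TwoPhaseSketch {
  static void sendObject(Object obj, int dst, int tag) throws IOException {
    // Serialize to a byte vector of initially unknown length.
    ByteArrayOutputStream bytes = new ByteArrayOutputStream() ;
    ObjectOutputStream out = new ObjectOutputStream(bytes) ;
    out.writeObject(obj) ;
    out.flush() ;
    byte[] data = bytes.toByteArray() ;

    // Phase 1: send the length in a small header.
    int[] header = { data.length } ;
    MPI.COMM_WORLD.Send(header, 0, 1, MPI.INT, dst, tag) ;

    // Phase 2: send the serialized data itself.
    MPI.COMM_WORLD.Send(data, 0, data.length, MPI.BYTE, dst, tag) ;
  }

  static Object recvObject(int src, int tag) throws IOException, ClassNotFoundException {
    // Phase 1: learn the length, so the buffer can be allocated dynamically.
    int[] header = new int [1] ;
    MPI.COMM_WORLD.Recv(header, 0, 1, MPI.INT, src, tag) ;

    // Phase 2: receive the bytes and deserialize.
    byte[] data = new byte [header[0]] ;
    MPI.COMM_WORLD.Recv(data, 0, data.length, MPI.BYTE, src, tag) ;
    ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data)) ;
    return in.readObject() ;
  }
}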
Modifications to mpiJava




All mpiJava communications, including
non-blocking modes and collective
operations, now allow objects as base
types.
Header + data decomposition
complicates, e.g., the wait and test family.
Derived datatypes become more complicated.
Collective comms involve two phases if the
base type is OBJECT.
Benchmarking mpiJava with
naive serialization


Assume that in "Grande" applications, the critical
case is arrays of primitive elements.
Consider N x N arrays:
float [] [] buf = new float [N] [N] ;
MPI.COMM_WORLD.send(buf, 0, N, MPI.OBJECT, dst, tag) ;

float [] [] buf = new float [N] [] ;
MPI.COMM_WORLD.recv(buf, 0, N, MPI.OBJECT, src, tag) ;
Platform





Cluster of 2-processor, 200 MHz
UltraSparc nodes
SunATM-155/MMF network
Sun MPI 3.0
“non-shared memory” = inter-node
comms
“shared memory” = intra-node comms
Non-shared memory: byte
Non-shared memory: float
Shared memory: byte
Shared memory: float
Parameters in timing model
(microseconds)
t_ser^byte = 0.043        t_unser^byte = 0.027
t_com^byte = 0.062 (non-shared)        t_com^byte = 0.008 (shared)

t_ser^float = 2.1         t_unser^float = 1.4
t_com^float = 0.25 (non-shared)        t_com^float = 0.038 (shared)
Benchmark lessons


Cost of serializing and unserializing an
individual float one to two orders of
magnitude greater than communication!
Serializing subarrays also expensive:
t_ser^vec = 100        t_unser^vec = 53
Improving serialization


Sources of ObjectOutputStream,
ObjectInputStream are available,
and format of serialized stream is
documented.
By overriding performance-critical
methods in classes, and modifying
critical aspects of the stream format,
can hope to solve immediate problems.
Eliminating overheads of
element serialization




Customized ObjectOutputStream replaces
primitive arrays with a short ArrayProxy
object. A separate Vector holding the Java
arrays is produced.
"Data-less" byte stream sent as header.
New ObjectInputStream yields a Vector of
allocated arrays, without filling in the elements.
Elements are then sent in one communication, using an
MPI_TYPE_STRUCT built from the vector info.
Improved protocol
Customized output stream
class



In the experimental implementation, we use
inheritance from the standard stream class,
ObjectOutputStream.
Class ArrayOutputStream extends
ObjectOutputStream, and defines the method
replaceObject.
This method tests whether its argument is a primitive
array. If it is, a reference to the array is stored
in the dataVector, and a small proxy object is
placed in the output stream.
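A skeleton of such a class, assuming a hypothetical ArrayProxy record and showing only the float[] case, might look as follows. It is a sketch of the design described above, not the actual mpiJava source.

import java.io.* ;
import java.util.Vector ;

class ArrayProxy implements Serializable {
  String elementType ;
  int length ;
  ArrayProxy(String elementType, int length) {
    this.elementType = elementType ;
    this.length = length ;
  }
}

class ArrayOutputStream extends ObjectOutputStream {
  Vector dataVector = new Vector() ;    // keeps references to the real arrays

  ArrayOutputStream(OutputStream out) throws IOException {
    super(out) ;
    enableReplaceObject(true) ;         // allow replaceObject to be invoked
  }

  protected Object replaceObject(Object obj) throws IOException {
    if(obj instanceof float[]) {
      dataVector.addElement(obj) ;      // keep the array aside for the element message
      return new ArrayProxy("float", ((float[]) obj).length) ;
    }
    // ...other primitive array types handled similarly...
    return obj ;
  }
}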
Customized input stream class


Similarly, class ArrayInputStream extends
ObjectInputStream, and defines the method
resolveObject.
This method tests whether its argument is an array
proxy. If it is, a primitive array of the
appropriate size and type is created and
stored in the dataVector.
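A matching skeleton for the input side, again only a sketch and reusing the hypothetical ArrayProxy class from the previous fragment:

import java.io.* ;
import java.util.Vector ;

class ArrayInputStream extends ObjectInputStream {
  Vector dataVector = new Vector() ;    // collects the newly allocated arrays

  ArrayInputStream(InputStream in) throws IOException {
    super(in) ;
    enableResolveObject(true) ;         // allow resolveObject to be invoked
  }

  protected Object resolveObject(Object obj) throws IOException {
    if(obj instanceof ArrayProxy) {
      ArrayProxy proxy = (ArrayProxy) obj ;
      if(proxy.elementType.equals("float")) {
        float[] array = new float [proxy.length] ;  // elements filled in later
        dataVector.addElement(array) ;
        return array ;
      }
      // ...other element types handled similarly...
    }
    return obj ;
  }
}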
Non-shared memory: float
(optimized in red)
Non-shared memory: byte
(optimized in red)
Shared memory: float
(optimized in red)
Shared memory: byte
(optimized in red)
Comments



Relatively easy to get dramatic
improvements.
Have only truly optimized one-dimensional
arrays embedded in the stream.
Later work looked at direct
optimizations for rectangular multidimensional
arrays, replacing them wholesale in the stream.
Conclusions on object
serialization




Derived datatypes workable for Java, but
slightly limited.
Object basic types attractive on grounds
of simplicity and generality.
Naïve implementation too slow for bulk
data transfer.
Optimizations should bring asymptotic
performance in line with C/Fortran MPI.
Message-passing in Java
Grande.
http://www.javagrande.org
Projects related to MPI and
Java





mpiJava (Syracuse)
JavaMPI (Getov et al, Westminster)
JMPI (MPI Software Technology)
MPIJ (Judd et al, Brigham Young)
jmpi (Dincer et al)
1. DOGMA MPIJ




Completely Java-based
implementation of a large subset of MPI.
Part of Distributed Object Group
Metacomputing Architecture.
Uses native marshalling of primitive
Java types for performance.
Judd, Clement and Snell, 1998.
2. Automatic wrapper
generation



The JCI Java-to-C interface generator takes
an input C header and generates stub
functions for a JNI Java interface.
JavaMPI bindings generated in this
way resemble the C interface to MPI.
Getov and Mintchev, 1997.
3. JMPI™ environment


Commercial message-passing
environment for Java announced by
MPI Software Technology.
Crawford, Dandass and Skjellum, 1997
4. jmpi instrumented MPI




100% Java implementation of an MPI
subset.
Layered on JPVM.
Instrumented for performance analysis
and visualization.
Dincer and Kadriy, 1998.
Standardization?



Currently all implementations of MPI for
Java have different APIs.
An “official” Java binding for MPI
(complementing Fortran, C, C++
bindings) would help.
Position paper and draft API:
Carpenter, Getov, Judd, Skjellum and
Fox, 1998.
Java Grande Forum



Level of interest in message-passing for Java is
healthy, but not enough to expect the MPI Forum
to reconvene.
More promising to work within the Java
Grande Forum. A Message-Passing Working
Group was formed (as a subset of the existing
Concurrency and Applications working group).
To avoid conflicts with MPIF, the Java effort was
renamed MPJ.
MPJ



Group of enthusiasts, informally chaired
by Vladimir Getov.
Meetings in the last year in San Francisco
(Java '99), Syracuse, and Portland (SC
'99).
Regular attendance by members of
SunHPC group, amongst others.
Thoughts on a Java Reference
Implementation for MPJ
Mark Baker, Bryan Carpenter
Benefits of a pure Java
implementation of MPJ



Highly portable. Assumes only a Java
development environment.
Performance: moderate. May need JNI
inserts for marshalling arrays. Network
speed limited by Java sockets.
Good for education/evaluation. Vendors
provide wrappers to native MPI for
ultimate performance?
Resource discovery


Technically, Jini discovery and lookup
seem an obvious choice. Daemons
register with lookup services.
A "hosts file" may still guide the search
for hosts, if preferred.
Communication base

Maybe, some day, Java VIA?? For now
sockets are the only portable option.
RMI surely too slow.
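As an illustration of the kind of socket-level primitive a pure-Java transport could build on, here is a minimal sketch using only plain java.net/java.io classes (not part of any MPJ design): length-prefixed message exchange over TCP.

import java.io.* ;
import java.net.* ;

class SocketTransportSketch {
  static void sendMessage(Socket sock, byte[] data) throws IOException {
    DataOutputStream out = new DataOutputStream(sock.getOutputStream()) ;
    out.writeInt(data.length) ;    // length prefix
    out.write(data) ;
    out.flush() ;
  }

  static byte[] recvMessage(Socket sock) throws IOException {
    DataInputStream in = new DataInputStream(sock.getInputStream()) ;
    int length = in.readInt() ;
    byte[] data = new byte [length] ;
    in.readFully(data) ;
    return data ;
  }
}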
Handling “Partial Failures”


A usable MPI implementation must
deal with unexpected process
termination or network failure, without
leaving orphan processes, or leaking
other resources.
Could reinvent protocols to deal with
these situations, but Jini provides a
ready-made framework (or, at least, a
set of concepts).
Acquiring compute slaves
through Jini
Handling failures with Jini


If any slave dies, the client generates a Jini
distributed event, MPIAbort. All slaves
are notified and all processes killed.
In case of other failures (network failure,
death of client, death of controlling
daemon, …), client leases on slaves
expire in a fixed time, and processes
are killed.
Higher layers
Integration of Jini and MPI
Geoffrey C. Fox
NPAC at Syracuse University
Syracuse, NY 13244
[email protected]
Integration of Jini and
MPI

Provide a natural Java framework for
parallel computing, with the powerful
fault tolerance and dynamic
characteristics of Jini combined with the
proven parallel computing functionality
and performance of MPI.
JiniMPI Architecture
[Diagram] Four SPMD programs, one per node, communicate with each other over
the MPI transport layer. Each node also hosts a Jini PC Embryo that registers
with the Jini Lookup Service. A middle tier of PC Proxies links the nodes to
the PC Control and Services module; control traffic uses RMI.
(PC = Parallel Computing.)
Remarks on JiniMPI I

This architecture is more general than that needed to support MPI-like
parallel computing.
  - The diagram only shows the server (bottom) and service (top) layers.
    There is of course a client layer which communicates directly with the
    "Parallel Computing (PC) Control and Services" module.
We assume that each workstation has a "Jini client", called here a "Jini
Parallel Computing (PC) Embryo", which registers the availability of
that workstation to run either particular or generic applications.
  - It includes ideas present in systems like Condor and Javelin.
  - The Jini embryo can represent the machine (i.e. the ability to run general
    applications) or particular software.
The Gateway, or "Parallel Computing (PC) Control and Services" module,
queries the Jini lookup server to find appropriate service computers to run
a particular MPI job.
  - It could of course use this mechanism "just" to be able to run a single job,
    or to set up a farm of independent workers.
Remarks on JiniMPI II



The standard Jini mechanism is applied for each chosen embryo. This
effectively establishes an RMI link from the Gateway to the (SPMD) node, which
corresponds to creating a Java proxy (corresponding to an RMI stub) for
the node program, which can be in any language (Java, Fortran, C++, etc.).
This Gateway-Embryo exchange should also supply to the Gateway
any needed data (such as the specification of needed parameters and how
to input them) for the user client layer.
This strategy separates control and data transfer.
  - It supports Jini (registration, lookup and invocation) and advanced
    services such as load balancing and fault tolerance on the control layer,
    and MPI-style data messages on a fast transport layer.
  - The Jini embryo is only used to initiate the process. It is not involved in the
    actual "execution" phase.
One could build a JavaSpace at the control layer as the basis of a
powerful management environment.
  - This is very different from using Linda (JavaSpaces) in the execution layer, as
    in the control layer one represents each executing node program by a proxy,
    and the normal performance problems with Linda are irrelevant.