Communicating Threads and Processes in Java – an approach to reliable parallel programming
Manuel I. Capel-Tuñón
University of Granada, Spain
[email protected]

Overview
– Although not initially designed as a High Performance Parallel Programming language, Java is an attractive candidate for it.
– Would like to review a selection of programming models and systems to bring HP Computing to Java.
Motivation
Web-based global computing:
– High demand for geographically distributed resources
– Useful for “strategic” applications: financial modeling, computational genetics, weather forecasting, etc.
– Taking advantage of spare cycles in computers across the Internet is now technologically feasible
Programming features will also make Java the language of choice for numerical computing
Talk outline
– Overview
– Motivation
– The Java programming model
– Multithread programming in Java
– Data Parallelism in Java
– The “Grid” High Performance Computing
– Numerical Computing in Java
– Conclusion
The Java Parallel Programming Model
– Fits better under control parallelism
– Threads are created and managed explicitly as objects
– Weak consistency memory model (a memory model easily mapped onto shared-memory systems)
– Creation methods for groups of threads are not provided
– Remote Method Invocation, for distributed systems
Java’s software architecture
[Figure: Java’s software architecture. javac compiles source files into .class files; the JVM loads classes from the local disk or the network; each thread has its own stack and working memory, reconciled with main memory through assign/use operations; the garbage-collected heap, native libraries, and services/GUI sit on top of the operating system and hardware.]
Models for running Java byte-code
– Hardware
– Native code compiler
– Just-In-Time (JIT) compiler
– Java interpreter
Java’s software architecture
[Figure: Java’s software architecture with a JIT compiler. The class loader fetches verified .class files from the local disk or the network; the JIT compiles byte-code into a native executable in main memory; native libraries are reached through JNI; the garbage-collected heap and stack sit above the operating system, services/GUI and hardware.]
Java’s memory model
[Figure: the JVM memory model. Each thread has a stack and a working memory; assign/use act on the thread’s working copy, while load/store move values between working memory and the JVM’s main memory; operands and results pass through a central stack.]
– Weak consistency: shared-variable updates are only made visible to other threads in synchronised code blocks
– Uses a computation model based on a (global) stack
– Efficient code interpretation, but not execution, on register-based processors
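To make the weak-consistency rule concrete, here is a minimal sketch (all names hypothetical): the reader thread is only guaranteed to observe the writer’s update because both accesses synchronise on the same object.

// Hypothetical illustration of Java's weak consistency: the update to
// 'value' is only guaranteed visible to other threads because both
// accesses synchronise on the same lock (the Flag instance itself).
class Flag {
    private boolean value = false;

    public synchronized void set() { value = true; }    // flush to main memory
    public synchronized boolean get() { return value; } // reload from main memory

    public static void main(String[] args) throws InterruptedException {
        final Flag f = new Flag();
        Thread reader = new Thread(new Runnable() {
            public void run() {
                while (!f.get()) { /* spin until the update becomes visible */ }
                System.out.println("update observed");
            }
        });
        reader.start();
        Thread.sleep(100);
        f.set();
        reader.join();
    }
}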
Taking advantage of registers

int A, B, C, j;
A = 4;
B = 8;
for (j = 0; j < 10; j++) {
    C = A + B;
    ... // A and B are modified in the body of the loop
}

– A register-based architecture will load the variables once for the entire loop
– The JVM pushes A and B on the stack each time C is computed!
– A byte-code to native code compiler will eliminate this execution overhead
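For reference, the operand-stack traffic the slide describes is visible in the byte-code the JVM executes for C = A + B. This is a javap-style sketch, assuming A, B and C occupy local-variable slots 0 to 2:

iload_0   // push A onto the operand stack
iload_1   // push B onto the operand stack
iadd      // pop both, push A + B
istore_2  // pop the result into C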
Talk outline
– Overview
– Motivation
– The Java programming model
– Multithread programming in Java
– Data Parallelism in Java
– The “Grid” High Performance Computing
– Numerical Computing in Java
– Conclusion
Multithread programming in Java
– Java contains a series of constructs and specific classes for concurrent programming:
  – java.lang.Thread
  – java.lang.Object:
    » wait()
    » notify()
    » notifyAll()
  – synchronized, volatile
– Multithreading is a basic feature for the implementation of good applications in Java
[Class diagram: Thread extends Object and implements Runnable.]
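As a minimal sketch of these constructs (class names are hypothetical), a thread can be defined either by extending java.lang.Thread or by handing a Runnable to a Thread object:

// Two equivalent ways of creating a thread in Java.
class Worker extends Thread {
    public void run() { System.out.println("worker running"); }
}

public class ThreadDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Worker();                   // subclassing Thread
        Thread t2 = new Thread(new Runnable() {     // implementing Runnable
            public void run() { System.out.println("runnable running"); }
        });
        t1.start();
        t2.start();
        t1.join();   // wait for both threads to terminate
        t2.join();
    }
}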
Thread states and scheduling
[State diagram: new → runnable; runnable ↔ running (scheduling, yield()); running → blocked on I/O begins, sleep(), wait() or join(); blocked → runnable when I/O ends, sleep ends, notify()/notifyAll() is called, or join() ends; suspend() moves runnable/running threads to suspended and blocked threads to suspended-blocked, and resume() returns them.]
There is a set of methods which produce a change in the state of a given thread:
– new
– wait()
– notify()
– notifyAll()
– start()
– yield()
– suspend()
– resume()
– sleep(time)
– join()
– stop()
The ThreadGroup Class
– Java does not provide methods to create or start all thread members simultaneously
– Collective communication must also be explicitly programmed
– By adding mutual exclusion and auxiliary variables, a low-level programming style results!

barrier.counter = num_threads;

public void Barrier(Semaphore barrier) {
    barrier.counter -= 1;
    if (barrier.counter > 0)
        barrier.wait();
    else
        barrier.notifyAll();
}
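The fragment above is not legal Java as written (wait() requires holding the object’s lock). Filling it in, a self-contained barrier in this style might look as follows (class and method names are mine, not from a standard library); it is exactly the mutual-exclusion-plus-auxiliary-variable coding the bullet above criticises:

// Minimal single-use barrier built from synchronized/wait/notifyAll.
class SimpleBarrier {
    private int counter;

    public SimpleBarrier(int numThreads) { counter = numThreads; }

    public synchronized void await() throws InterruptedException {
        counter -= 1;
        if (counter == 0) {
            notifyAll();            // last thread releases everyone
        } else {
            while (counter > 0)     // guard against spurious wake-ups
                wait();
        }
    }
}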
Synchronization between asynchronous threads
[Diagram: threads queue on a synchronized method; a caller acquires the object’s lock, executes the synchronized block, and releases the lock; threads that call wait() sit in the anonymous condition queue until a notify, then rejoin the ready queue.]

public synchronized void deposit(double v) {
    while (count == slots)
        try { wait(); } catch (InterruptedException e) {}
    buffer[pIn] = v;
    ...
    count++;
    if (count == 1) notify();
}

public synchronized double fetch() {
    while (count == 0)
        try { wait(); } catch (InterruptedException e) {}
    double v = buffer[pOut];
    ...
    count--;
    if (count == slots - 1) notify();
    return v;
}

– synchronized methods give a simple way of exclusive access and avoid race hazards between threads
– Java thread synchronization is loosely based on the monitor construct, with signal-and-continue semantics
– Synchronization in the access to shared variables is problematic, since the semantics of notify() is error prone
– The logic within the monitor methods involved in wait-notify pairs has to be tightly coupled
– Monitors, as passive entities, cannot prevent their methods being called
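Completing the slide’s elided fragments, a full bounded buffer in this style could read as below (the names pIn, pOut, count and slots follow the slide; everything else is an assumption):

// Complete producer-consumer buffer in the monitor style of the slide.
class BoundedBuffer {
    private final double[] buffer;
    private final int slots;
    private int pIn = 0, pOut = 0, count = 0;

    public BoundedBuffer(int slots) {
        this.slots = slots;
        this.buffer = new double[slots];
    }

    public synchronized void deposit(double v) throws InterruptedException {
        while (count == slots) wait();  // buffer full: block the producer
        buffer[pIn] = v;
        pIn = (pIn + 1) % slots;
        count++;
        notifyAll();                    // safer than notify() (see above)
    }

    public synchronized double fetch() throws InterruptedException {
        while (count == 0) wait();      // buffer empty: block the consumer
        double v = buffer[pOut];
        pOut = (pOut + 1) % slots;
        count--;
        notifyAll();
        return v;
    }
}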
Producer-consumer data sharing

class Barber_shop {
    ... // monitor variables
    public synchronized void get_haircut()      // called by customers
            throws InterruptedException {
        while (barber == 0) wait();
        barber--; chair++; notify();
        ...
    }
    public synchronized void get_next_customer() // called by the barber
            throws InterruptedException {
        barber++; notify();
        while (chair == 0) wait();
        chair--;
    }
    ...
} // end class

Semantics of notifications: “I’m notifying: the next client can be served!”
Correct producer-consumer monitor

Monitor Barber_shop {
    ... // monitor variables
    procedure get_haircut; // called by customers
    {
        if (barber == 0) available.wait;
        barber--;
        chair++;
        if (NotOccupied.queue) NotOccupied.signal;
    }
    procedure get_next_customer; // called by the barber
    {
        barber++;
        if (available.queue) available.signal;
        if (chair == 0) NotOccupied.wait;
        chair--;
    }
} // end Monitor

Semantics of signals: “I’m signaling the next awaiting client!”
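For comparison, modern Java (java.util.concurrent.locks, since J2SE 5.0) can express the monitor’s per-condition queues directly; a sketch keeping the slide’s variable names:

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// The monitor rewritten with explicit condition queues; while-loops
// replace the if-tests because Java's signal-and-continue semantics
// give no guarantee about the state on wake-up.
class BarberShop {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition available = lock.newCondition();
    private final Condition notOccupied = lock.newCondition();
    private int barber = 0, chair = 0;

    public void get_haircut() throws InterruptedException { // called by customers
        lock.lock();
        try {
            while (barber == 0) available.await();
            barber--;
            chair++;
            notOccupied.signal();
        } finally {
            lock.unlock();
        }
    }

    public void get_next_customer() throws InterruptedException { // called by the barber
        lock.lock();
        try {
            barber++;
            available.signal();
            while (chair == 0) notOccupied.await();
            chair--;
        } finally {
            lock.unlock();
        }
    }
}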
Talk outline
– Overview
– Motivation
– The Java programming model
– Multithread programming in Java
– Data Parallelism in Java
– The “Grid” High Performance Computing
– Numerical Computing in Java
– Conclusion
Data Parallelism in Java
– Simultaneous operations on disjoint partitions of data by multiple processors
– Data give the parallel dimension of programs in this model
Data Parallelism in Java
[Figure: matrices A, B and C partitioned row-wise, with data dependencies and a data-to-processor mapping onto a cluster of processors.]
Data Parallelism in Java
– Homogeneous parallelism: lightweight processes, regularly spawned, ordered event sequences
– Idiomatic features:
  – Producer-consumer operation for synchronization and data communication between threads
  – Creating and starting multiple threads simultaneously
  – Collective communication/synchronization between sets of threads
– Java makes expressing data parallelism awkward (see the sketch below)
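As an illustration of the awkwardness (all names hypothetical): even a simple row-partitioned reduction must spell out thread creation, the data-to-thread mapping, and the final join-based barrier by hand.

// Row-wise data parallelism over a matrix, coded by hand with plain threads.
class RowSum {
    public static void main(String[] args) throws InterruptedException {
        final double[][] a = new double[1000][1000];   // some data
        final double[] rowSums = new double[a.length];
        int numThreads = 4;
        Thread[] workers = new Thread[numThreads];
        int block = (a.length + numThreads - 1) / numThreads;

        for (int t = 0; t < numThreads; t++) {
            final int lo = t * block;
            final int hi = Math.min(lo + block, a.length);
            workers[t] = new Thread(new Runnable() {
                public void run() {        // disjoint row range: no locking needed
                    for (int i = lo; i < hi; i++)
                        for (int j = 0; j < a[i].length; j++)
                            rowSums[i] += a[i][j];
                }
            });
            workers[t].start();
        }
        for (int t = 0; t < numThreads; t++)
            workers[t].join();             // barrier at the end
    }
}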
SPMD Java Library Interface
[Figure: TSP example. Cities Köln, Koblenz, Dortmund, Kassel and Frankfurt connected by distances D1-D7; a global work queue and the best solution found so far (BS) are shared data structures.]
– Localization of shared data-structures as objects of a SPMD-Java library
– Now an explicit communication between processes is needed
– Communication links among threads will be encapsulated in shared objects
– Elements of dynamic objects may be migrated during program execution
– Thread groups that share an object define their connection topology
The proposed solution
[Figure: a distributed branch-and-bound (BB) application for the TSP. A master assigns subproblems to slave processes; each slave loads and updates a local copy (bs) of the best solution; the best-solution and work-queue DOFs span the processor limits.]
– Distributed-BB application construction for the TSP
– Specification of a logical topology for a given distribution of global data, so that access from remote servers to locally assigned data is made easier
– Methodological software construction based on distributed active objects (DOFs), which encapsulate global data structures and topologies
– Processes and DOFs are distributed to the servers by the programmer, according to each parallel distributed application
– The approach is aimed at hiding low-level communication, and at giving global data distribution transparency and access locality to any object in the program
The programming environment
– Network communications between the APs are hidden behind the interface of a class of objects.
– A class of active objects, called DOFs (Distributed Object Fragments), establish communication links between them according to a logical connection topology.
– Virtual methods, such as send(..), externalComm(..) and others, must be explicitly programmed.
– Proxies encapsulate data and provide communication facilities to the application processes.
[Figure: a Distributed Object Fragment. An objectProxy with input and output handlers and an application interface (externalComm(...), send(...)) connects the application process to the run time + JVM, the socket library, the transport layer and the communication layer.]
Talk outline
– Overview
– Motivation
– The Java programming model
– Multithread programming in Java
– Data Parallelism in Java
– The “Grid” High Performance Computing
– Numerical Computing in Java
– Conclusion
A step ahead: the “Grid” computing
[Figure: “Spawn this here”. A computation is spawned across heterogeneous resources: Linux, SGI, NT and Solaris clusters, idle processor machines and data repositories.]
Collect spare cycles, and computational power will be enhanced!
Wide-scale HP programming paradigm
[Figure: several users, each with data, processes and computational resources, cooperating over the network.]
Concurrent Computing:
– merging and splitting of multiple virtual machines
– platform and performance portability, safety, reusability
– multilanguage support to include Java, C, C++, and Fortran
– message passing as the supported paradigm for parallel and distributed computing
Remote Method Invocation
[Figure: a client program invokes a remote method on a server program across the Internet. Parameters are packed, carried by the transport layer using Sun’s RMI protocol plus serialisation, and unpacked at the server; the results travel back the same way.]
– RMI supports polymorphism at object’s method invocation
– Java passes objects by reference
– References as parameters of an RMI call have to be passed in a network-wide representation
– The current RMI implementation rules out transparent remote invocation of methods
Programming with Java’s RMI

The Interface

public interface Hello extends Remote {
    public String sayHello() throws java.rmi.RemoteException;
}

The Server side

public class HelloImpl extends UnicastRemoteObject
        implements Hello { // previously declared interface
    public HelloImpl() throws RemoteException {
        super();
    }
    public String sayHello() throws RemoteException {
        return "Hello World!";
    }
    public static void main(String args[]) {
        try {
            HelloImpl h = new HelloImpl();
            Naming.rebind("hello", h);
        }
        catch (RemoteException re) {
            ...
        }
    }
}

The Client side

public static void main(String args[]) {
    System.setSecurityManager(new RMISecurityManager());
    try {
        Hello h = (Hello) Naming.lookup("rmi://ockham.ugr.es/hello");
        String message = h.sayHello();
        System.out.println("HelloClient: " + message);
    }
    catch (RemoteException re) {
        ...
    }
}
Implementation of Java’s RMI
– There is no access transparency at all in Java’s RMI
– The large difference in performance between the Java interpreter and JIT compilers indicates that RMI involves an amount of inefficient Java code
[Figure: the HelloClient looks up the server in the Registry and calls through a generated stub (HelloImpl_Stub.class); the skeleton (HelloImpl_Skel.class) dispatches to the implementation, which returns “Hello World!”.]
Programming with sockets
– The socket version of a program is faster than the RMI one, but results in increased program size and thus reduces productivity and maintainability
– The implementation of programs becomes more difficult and inefficient:
  – The programmer has to write a communication protocol (illustrated below)
  – Communication is handled by the operating system
– Neither the socket nor the RMI version can take advantage of locality in distributed applications
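To make the hand-written-protocol point concrete, a minimal socket counterpart of the RMI client from the previous slides might look like this (the port number and the SAY_HELLO request tag are invented for the example):

import java.io.*;
import java.net.*;

// Client side of a hand-coded request/reply protocol: the programmer
// must invent the wire format that RMI generates automatically.
class HelloSocketClient {
    public static void main(String[] args) throws IOException {
        Socket s = new Socket("ockham.ugr.es", 4000);
        PrintWriter out = new PrintWriter(s.getOutputStream(), true);
        BufferedReader in = new BufferedReader(
                new InputStreamReader(s.getInputStream()));
        out.println("SAY_HELLO");                    // hand-made "method dispatch"
        System.out.println("HelloClient: " + in.readLine());
        s.close();
    }
}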
Getting Explicit Parallelism

Optimized Standard JVM Implementations:
– Substituting RMI and sockets with more efficient implementations, while preserving JVM byte-codes
– Advantages:
  – truly object oriented, supports all data types of Java programming, is garbage collected
– Drawbacks:
  – RMI is too slow for HPP programming

Native code JVM Implementations:
– Through a common Message Passing Java (MPJ) API
– Advantages:
  – full performance of native code
  – extensive code optimizations
  – a basis for conversion between C, C++, Fortran and Java
– Drawbacks:
  – programming features of Java could be compromised
JavaParty: an “optimized implementation” of RMI
[Figure: JavaParty software architecture. A Run Time Manager creates class objects, migrates and resets them, and answers calls for the current state; local JP runtime environments give access to the distributed environment.]
– Objects can migrate between nodes transparently to the programmer
– JP extends Java’s RMI with a pre-processor and a runtime
– The runtime system is used to access the static entities of each class
– For each class that is loaded dynamically, a single object is created remotely
– JP improves locality and reduces communication time
Manta system: an efficient implementation of RMI
– The objective is to push the runtime overhead to compile time, while supporting polymorphic remote method invocation and allowing interoperability with other JVMs.
– By using a native Java compiler, it is possible for the performance of RMI to equal that of other parallel languages
– Manta supports dynamic class loading by compiling methods and creating serializations at run time
– To support interoperability with other JVMs, Manta has a byte-code-to-native compiler startable at run time
Manta/JVM interoperability
[Figure: a Manta process talks to another Manta process through the Manta RMI protocol with generated serializers, and to a standard JVM through the Sun RMI protocol with generic serialisation. Class files are fetched from an HTTP server, compiled by Manta’s byte-code compiler or loaded by the JVM’s byte-code loader, with serializers generated at run time.]
Talk outline
– Overview
– Motivation
– The Java programming model
– Multithread programming in Java
– Data Parallelism in Java
– The “Grid” High Performance Computing
– Numerical Computing in Java
– Conclusion
Numerical Computing in Java

Complex x = new Complex(5, 2);
Complex y = new Complex(2, -3);
Complex z = a.times(x).plus(y);

[Figure: a two-dimensional array laid out as an array of arrays, indexed by X and Y from 0 to 4.]

– Wide-scale adoption of Java as a language for numerical computing
– Current difficulties to overcome:
  – inefficient support for complex numbers
  – lack of multidimensional arrays
  – over-restrictive floating-point semantics
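The snippet above presupposes a hand-written class along the following lines (a minimal sketch; the variable a is assumed to be another Complex). Every operation allocates a fresh object, which is the inefficiency the slide points to:

// Minimal immutable complex-number class; each operation allocates a
// new object, so a.times(x).plus(y) creates two temporaries.
final class Complex {
    private final double re, im;

    Complex(double re, double im) { this.re = re; this.im = im; }

    Complex plus(Complex o) { return new Complex(re + o.re, im + o.im); }

    Complex times(Complex o) {
        return new Complex(re * o.re - im * o.im,
                           re * o.im + im * o.re);
    }
}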
Computational Performance
– Limited access to the IEEE floating-point standard: extended-precision hardware (such as IEEE 754 double extended precision, or long double) is inaccessible to Java programs!
– Java buys computational reproducibility at the cost of speed and precision.
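This trade-off surfaces in the language as the strictfp modifier (available since Java 1.2); a small sketch:

class Kernels {
    // strictfp pins this method to strict IEEE 754 double arithmetic:
    // bit-for-bit identical results on every JVM, possibly slower on
    // hardware with extended-precision registers.
    static strictfp double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++)
            s += a[i] * b[i];
        return s;
    }
}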
Numerical Libraries in Java

Java programmed libraries:
– Freely distributed
– Strong dependency on future Java specifications, JIT optimizations, etc.
– Promising results for currently ported libraries:
  – JNL (Visual Numerics)
  – JAMA (MathWorks)
  – NIST

Native code libraries:
– Advantages:
  – porting easiness through JNI, legacy code reuse
  – adherence to standards (MPI, LAPACK, etc.)
– Compromised:
  – Robustness
  – Reproducibility
  – Portability
  – Performance
Talk outline
– Overview
– Motivation
– The Java programming model
– Multithread programming in Java
– Data Parallelism in Java
– The “Grid” High Performance Computing
– Numerical Computing in Java
– Conclusion
Performance of RMI protocols
[Chart: round-trip latency in microseconds. Conventional RPC: 1630; Sun JDK 1.1.4: 1311; Sun JIT 1.2: 720; JavaParty + KaRMI + optimizations: 228.]
– Reasons for RMI overhead:
  – JDK serialization mechanism
  – stream management and data copying to external buffers
  – method dispatch and low-level network communications
– The overhead is currently in the range of 0.4 to 2.2 ms.
– Serialization can take up to 65% of the time in slower JDKs
Java performance
[Chart: SciMark benchmark on a 500 MHz Pentium III, in Mflops, using full optimization. Approximate scores: C Borland 5.5: 58; MS VC++ 5.0: 45; Java Sun 1.2: 42; Java MS 1.1.4: 40.]
– Performance varies greatly across computing platforms: highest scores on Intel and AMD Athlon; lowest ones on Ultra SPARC, SGI MIPS and Alpha EV6
– Competitive with C compilers
– Mainly depends on the implementation technology of the JVM
Synoptic comparison of proposals

L         | E          | C   | M        | S   | L            | P   | S                | N
RMI       | Low        | No  | No       | Yes | TCP/IP       | Yes | Sun              | USA
JavaParty | Medium     | No  | Yes      | Yes | TCP/IP       | Yes | Sun              | D
SPMD      | Medium-low | No  | No       | Yes | TCP/IP       | Yes | Does not apply   | USA
JDSM      | High       | Yes | Possible | Yes | Several      | Yes | Socket interface | JP
Manta     | High       | Yes | No       | No  | Own protocol | Yes | Own protocol     | NL

Legend: L: Language (proposal); E: Efficiency; C: Code optimization; M: Object migration; S: Standardization; L: Low-level communications; P: Polymorphism; S: Serialisation protocol; N: Nationality of research.
Summary & Conclusions
– The standard constructs provided by Java (the monitor/thread model) for multithreading have been identified as a serious hindrance to developing reliable parallel distributed software
– Integration of communication frameworks (MPI, RMI, DSM, etc.) is necessary to cover the range of portability, uploading, soft install, migration, etc. of today’s applications
– Possible solutions take advantage of the existing Java framework for the development and integration of many “Grid” applications
– A variety of solutions aimed at the computational performance necessities of High Performance Computing (Data Parallelism, Grid applications and Numerical Computing) have been reviewed
References and further addresses
– V. Getov, G. Laszewski, M. Philippsen, I. Foster. “Multi-Paradigm Communications in Java for Grid Computing”. CACM, 44, 10, pp. 118-124, 2001.
– R. Boisvert, J. Moreira, M. Philippsen, R. Pozo. “Java and Numerical Computing”. IEEE Computing Science and Engineering, 3, 2, pp. 18-24, 2001.
– The Java Grande Community official page: http://www.javagrande.org
– Numeric class libraries: http://math.nist.gov/javanumerics
– Personal page of M. Philippsen at Karlsruhe: http://wwwipd.ira.uka.de
– The Manta project: http://www.cs.vu.nl