Communicating Threads and Processes in Java: an approach to reliable parallel programming
Manuel I. Capel-Tuñón, University of Granada, Spain. [email protected]

Overview
– Although not initially designed as a language for High Performance Parallel Programming, Java is an attractive candidate for it.
– This talk reviews a selection of programming models and systems that aim to bring HP Computing to Java.

Motivation
Web-based global computing:
– High demand for geographically distributed resources
– Useful for "strategic" applications: financial modeling, computational genetics, weather forecasting, etc.
– Taking advantage of spare cycles in computers across the Internet is now technologically feasible
Its programming features may also make Java the language of choice for numerical computing.

Talk outline
– Overview
– Motivation
– The Java programming model
– Multithread programming in Java
– Data Parallelism in Java
– The "Grid": High Performance Computing
– Numerical Computing in Java
– Conclusion

The Java Parallel Programming Model
– Fits control parallelism better than data parallelism
– Threads are created and managed explicitly, as objects
– Weak-consistency memory model, easily mapped onto shared-memory systems
– Creation methods for groups of threads are not provided
– Remote Method Invocation (RMI) for distributed systems

Java's software architecture
[Figure: Java's software architecture. The javac compiler turns source files into .class files; classes and native libraries are loaded from the local disk or the network into the JVM, where each thread has its own stack and working memory (assign/use) over the main memory, cache, central stack, heap and garbage collector, on top of operating-system services (files, network, GUI) and the hardware.]

Models for running Java byte-code
– Hardware
– Native code compiler
– Just-In-Time (JIT) compiler
– Java interpreter

Java's software architecture (with a JIT)
[Figure: the same architecture extended with a class loader, a byte-code verifier, JNI access to native libraries, and a JIT compiler producing a native executable.]

Java's memory model
[Figure: Java's memory model. The JVM main memory is shared by all threads; each thread has a private stack and working memory, and crosses the thread/main-memory boundary through load, store, assign and use operations; computation happens on an operand/results stack.]
– Weak consistency: updates to shared variables are only made visible to other threads inside synchronized code blocks
– Computation is based on a (global) stack model
– This allows efficient interpretation of Java code, but not efficient execution on register-based processors

Taking advantage of registers

    int A, B, C, j;
    A = 4; B = 8;
    for (j = 0; j < 10; j++) {
        C = A + B;
        ... // A and B are modified in the body of the loop
    }

A register-based architecture will load the variables once for the entire loop; the JVM pushes A and B onto the stack each time C is computed!
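To see the stack traffic concretely, here is roughly the byte-code that javac emits for the statement C = A + B inside the loop. This is a sketch, assuming A, B and C happen to occupy local-variable slots 1, 2 and 3 of the enclosing method; the exact slot numbers depend on the surrounding code:

    iload_1    // push the value of A onto the operand stack
    iload_2    // push the value of B onto the operand stack
    iadd       // pop both values, push their sum
    istore_3   // pop the sum and store it into C

All four instructions execute on every iteration, whereas native code produced by a register allocator can keep A and B in registers for the whole loop.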
A byte-code to native-code compiler will eliminate this execution overhead.

Multithread programming in Java
Java contains a series of constructs and specific classes for concurrent programming:
– java.lang.Thread
– java.lang.Object: wait(), notify(), notifyAll()
– the synchronized and volatile keywords
[Figure: class diagram. Thread extends Object and implements the Runnable interface.]
Multithreading is a basic feature for the implementation of good applications in Java.

Thread states and scheduling
[Figure: thread state diagram with the states new, runnable, running, blocked, suspended and suspended-blocked; transitions are triggered by start(), yield(), suspend(), resume(), sleep(), wait(), notify()/notifyAll(), join(), the start and end of I/O, and thread termination.]
There is a set of methods which produce a change in the state of a given thread:
– new, start()
– wait(), notify(), notifyAll()
– yield(), sleep(time), join()
– suspend(), resume(), stop()

The ThreadGroup Class
– Java does not provide methods to create or start all thread members of a group simultaneously
– Collective communication must also be explicitly programmed
– By adding mutual exclusion and auxiliary variables, a low-level programming style results!

    // The barrier sketch from the slide, made syntactically valid.
    // "Semaphore" is the slide's ad-hoc class with a public counter field.
    barrier.counter = num_threads;                    // initialisation

    public void Barrier(Semaphore barrier) throws InterruptedException {
        synchronized (barrier) {                      // wait/notifyAll need the lock
            barrier.counter -= 1;
            if (barrier.counter > 0)
                barrier.wait();                       // not the last thread: block
            else
                barrier.notifyAll();                  // last thread releases the group
        }
    }

Synchronization between asynchronous threads
[Figure: threads wait in a ready queue for a synchronized service method; a caller acquires the object's lock, executes the synchronized block, and releases the lock; threads that call wait() sit in a single anonymous condition queue until notified.]

    public synchronized void deposit(double v) {
        while (count == slots)
            try { wait(); } catch (InterruptedException e) {}
        buffer[pIn] = v;
        ...                                  // advance pIn (elided on the slide)
        count++;
        if (count == 1) notify();            // buffer was empty: wake a consumer
    }

    public synchronized double fetch() {
        while (count == 0)
            try { wait(); } catch (InterruptedException e) {}
        double v = buffer[pOut];
        ...                                  // advance pOut (elided on the slide)
        count--;
        if (count == slots - 1) notify();    // buffer was full: wake a producer
        return v;
    }

– synchronized methods give a simple way of ensuring exclusive access, and avoid race hazards between threads
– Java thread synchronization is loosely based on the monitor construct, with signal-and-continue semantics
– Synchronizing access to shared variables is problematic, since the semantics of notify() is error prone: each object has a single, anonymous condition queue
– The logic within the monitor methods involved in wait-notify pairs has to be tightly coupled
– Monitors, as passive entities, cannot prevent their methods from being called

Producer-consumer data sharing
Semantics of notifications: "I'm notifying: the next client can be served!"

    class Barber_shop {
        ...                                            // monitor variables

        public synchronized void get_haircut()         // called by customers
                throws InterruptedException {
            while (barber == 0) wait();
            barber--; chair++; notify();
            ...
        }

        public synchronized void get_next_customer()   // called by the barber
                throws InterruptedException {
            barber++; notify();
            while (chair == 0) wait();
            chair--;
        }
        ...
    } // end class

Correct producer-consumer monitor
Semantics of signals: "I'm signaling the next awaiting client!"

    Monitor Barber_shop {
        ...                                  // monitor variables

        procedure get_haircut;               // called by customers
            if (barber == 0) available.wait;
            barber--; chair++;
            if (NotOccupied.queue) NotOccupied.signal;

        procedure get_next_customer;         // called by the barber
            barber++;
            if (available.queue) available.signal;
            if (chair == 0) NotOccupied.wait;
            chair--;
    } // end Monitor

(A Java approximation of this monitor, using explicit condition objects, appears below.)
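The "correct" monitor above is written in classic monitor pseudocode. As a point of comparison, and not as code from the original talk, the same structure can be approximated in Java with the java.util.concurrent.locks package, which gives one Condition queue per condition instead of the single anonymous queue behind wait()/notify(). Condition still uses signal-and-continue rather than classic signal-and-wait semantics, so the waits are kept in while loops:

    import java.util.concurrent.locks.Condition;
    import java.util.concurrent.locks.ReentrantLock;

    class BarberShop {
        private final ReentrantLock lock = new ReentrantLock();
        private final Condition available   = lock.newCondition(); // barber free
        private final Condition notOccupied = lock.newCondition(); // chair free
        private int barber = 0, chair = 0;

        void getHaircut() throws InterruptedException {      // called by customers
            lock.lock();
            try {
                while (barber == 0) available.await();
                barber--; chair++;
                notOccupied.signal();
            } finally { lock.unlock(); }
        }

        void getNextCustomer() throws InterruptedException { // called by the barber
            lock.lock();
            try {
                barber++;
                available.signal();
                while (chair == 0) notOccupied.await();
                chair--;
            } finally { lock.unlock(); }
        }
    }

Unlike notify(), each signal() wakes a waiter on the matching condition only, which removes the main source of error the slides point out.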
Data Parallelism in Java
– Simultaneous operations on disjoint partitions of data by multiple processors
– In this model, the data give the parallel dimension of a program
[Figure: data-parallel matrix computation. The data dependencies between two input matrices and the result matrix determine the data-to-processor mapping onto a cluster of processors.]

Data Parallelism in Java
– Homogeneous parallelism: lightweight processes, regularly spawned, with ordered event sequences
– Idiomatic features:
  » Producer-consumer operation for synchronization and data communication between threads
  » Creating and starting multiple threads simultaneously
  » Collective communication/synchronization between sets of threads
– Java makes expressing data parallelism awkward (a small illustration appears at the end of this section)

SPMD Java Library Interface
[Figure: a Traveling Salesman Problem instance over German cities (Köln, Koblenz, Dortmund, Kassel, Frankfurt) with inter-city distances D1 to D7, a global work queue, and a shared best solution (BS) used for localization.]
– Shared data structures, such as the global work queue and the best solution, become objects of an SPMD-Java library
– Explicit communication between processes is now needed
– Communication links among threads are encapsulated in shared objects
– Elements of dynamic objects may be migrated during program execution
– Thread groups that share an object define their connection topology

The proposed solution
[Figure: a distributed branch-and-bound application for the TSP. A master assigns subproblems to slaves; each slave loads and updates the shared best solution (BS); the work queue and the best solution are distributed object fragments (DOFs), placed within processor limits.]
– Specification of a logical topology for a given distribution of global data, so that access from remote servers to locally assigned data is made easier
– Methodological software construction based on distributed active objects (DOFs) which encapsulate global data structures and topologies
– Processes and DOFs are distributed to the servers by the programmer, according to each parallel distributed application
– The approach aims at hiding low-level communication, at transparency of the global data distribution, and at locality of access to any object in the program

The programming environment
– Network communications between the application processes (APs) are hidden behind the interface of a class of objects
– A class of active objects, called DOFs, establishes communication links according to a logical connection topology
– Proxies encapsulate data and provide communication facilities to the application processes
– Virtual methods, such as send(..), externalComm(..) and others, must be explicitly programmed (a sketch of such an interface also appears at the end of this section)
[Figure: a Distributed Object Fragment. An object proxy with input and output handlers sits between the application interface (externalComm(...), send(...)) and the application process, layered over the runtime + JVM, the socket library, and the transport and communication layers.]

A step ahead: "Grid" computing
[Figure: spawning work across heterogeneous resources: Linux, SGI, NT and Solaris clusters.]
Collect spare cycles, and computational power will be enhanced!
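As promised in the data-parallelism slides, a small illustration of why expressing data parallelism with plain Java threads is awkward. Everything here (the array size, the partition count, the summing task) is invented for the example; note how thread creation, start-up and the final "collective" synchronization must all be spelled out one thread at a time:

    // Sum an array in parallel over disjoint partitions.
    public class PartialSums {
        public static void main(String[] args) throws InterruptedException {
            final double[] data = new double[1000000];
            java.util.Arrays.fill(data, 1.0);
            final int nThreads = 4;
            final double[] partial = new double[nThreads];
            Thread[] workers = new Thread[nThreads];
            int chunk = data.length / nThreads;
            for (int t = 0; t < nThreads; t++) {
                final int id = t;
                final int lo = t * chunk;
                final int hi = (t == nThreads - 1) ? data.length : lo + chunk;
                workers[t] = new Thread(new Runnable() {   // one thread per partition
                    public void run() {
                        double s = 0.0;
                        for (int i = lo; i < hi; i++) s += data[i];
                        partial[id] = s;
                    }
                });
                workers[t].start();              // threads must be started one by one
            }
            double total = 0.0;
            for (int t = 0; t < nThreads; t++) { // hand-coded "collective" barrier
                workers[t].join();
                total += partial[t];
            }
            System.out.println("sum = " + total);
        }
    }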
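And the promised sketch of a DOF-style interface. This is hypothetical: the slides name only send(..) and externalComm(..), so every other identifier below is invented to suggest the shape such a class might take; none of this is the actual SPMD-Java library API:

    // Hypothetical interface for a Distributed Object Fragment (DOF).
    public interface DistributedObjectFragment {
        // Named on the slide: push data to a neighbour in the logical topology.
        void send(int neighbour, Object data);

        // Named on the slide: handle communication arriving from a remote fragment.
        void externalComm(Object message);

        // Invented for illustration: topology and migration support.
        void connect(int[] neighbours);      // define the logical connection topology
        void migrate(String targetServer);   // move this fragment to another server
    }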
Wide-scale HP programming paradigm
[Figure: the Grid as a wide-scale programming paradigm. Users, concurrent processes, data and computational resources (idle processor machines, data repositories) cooperate across sites.]
– Merging and splitting of multiple virtual machines
– Platform and performance portability, safety, reusability
– Multilanguage support, to include Java, C, C++ and Fortran
– Message passing as the supported paradigm for parallel and distributed computing

Remote Method Invocation
[Figure: a client program invokes a remote procedure of a server program. Parameters are packed into a network message, carried by Sun's RMI protocol plus serialisation across the transport layer and the Internet, and unpacked at the server; results travel back the same way.]
– RMI supports polymorphism at method invocation
– Java passes objects by reference, but the parameters of an RMI call have to be passed in a network-wide representation
– The current RMI implementation rules out transparent remote invocation of methods

Programming with Java's RMI
The interface:

    public interface Hello extends Remote {
        public String sayHello() throws java.rmi.RemoteException;
    }

The server side:

    public class HelloImpl extends UnicastRemoteObject implements Hello {

        public HelloImpl() throws RemoteException { super(); }

        public String sayHello() throws RemoteException {
            return "Hello World!";
        }

        public static void main(String args[]) {
            try {
                HelloImpl h = new HelloImpl();
                Naming.rebind("hello", h);
            } catch (Exception e) {      // RemoteException, MalformedURLException
                ...
            }
        }
    }

The client side:

    public static void main(String args[]) {
        System.setSecurityManager(new RMISecurityManager());
        try {
            Hello h = (Hello) Naming.lookup("rmi://ockham.ugr.es/hello");
            String message = h.sayHello();
            System.out.println("HelloClient: " + message);
        } catch (Exception e) {          // RemoteException, NotBoundException, ...
            ...
        }
    }

Implementation of Java's RMI
– There is no access transparency at all in Java's RMI
– The large difference in performance between the Java interpreter and JIT compilers indicates that RMI involves a significant amount of inefficient Java code
[Figure: the pieces of the compiled example. The HelloClient looks the server up in the Registry; the rmic-generated HelloImpl_Stub.class and HelloImpl_Skel.class mediate the "Hello World!" call to the HelloImpl class.]
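One detail of the "network-wide representation" of parameters is worth making concrete: remote objects travel as references (stubs), while ordinary argument and result objects must implement java.io.Serializable so RMI can copy them over the wire. The class below is invented for the illustration and is not part of the talk's own code:

    import java.io.Serializable;

    // A plain data object passed in an RMI call. Because it is not a
    // Remote object, RMI serialises it, and the server works on a copy
    // rather than on the client's instance.
    public class City implements Serializable {
        public final String name;
        public final double latitude, longitude;

        public City(String name, double latitude, double longitude) {
            this.name = name;
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }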
Programming with sockets
– The socket version of a program is faster than the RMI one, but results in increased program size, which reduces productivity and maintainability
– The implementation of programs becomes more difficult and inefficient: the programmer has to write a communication protocol, and communication is handled by the operating system
– Neither the socket nor the RMI version can take advantage of locality in distributed applications

Getting Explicit Parallelism
Optimized standard JVM implementations: substitute RMI and the sockets with more efficient ones, while preserving JVM byte-codes
– Advantages: truly object oriented; supports all data types of Java programming; garbage collected
– Drawbacks: RMI is too slow for HP parallel programming
Native-code JVM implementations: through a common Message Passing Java (MPJ) API
– Advantages: full performance of native code; extensive code optimizations; a basis for conversion between C, C++, Fortran and Java
– Drawbacks: the programming features of Java could be compromised

JavaParty: an "optimized implementation" of RMI
[Figure: JavaParty software architecture. A Run Time Manager coordinates local JP runtime environments, handling class and object creation, migration, reset, calls for the current state, and access to the distributed environment.]
– Objects can migrate between nodes transparently to the programmer
– JP extends Java's RMI with a pre-processor and a runtime system
– The runtime system is used to access the static entities of each class
– For each class that is loaded dynamically, a single object is created remotely
– JP improves locality and reduces communication time

Manta system: an efficient implementation of RMI
– The objective is to push the runtime overhead to compile time, while supporting polymorphic remote method invocation and allowing interoperability with other JVMs
– By using a native Java compiler, it is possible for the performance of RMI to equal that of other parallel languages
– Manta supports dynamic class loading by compiling methods and creating serializers at run time
– To support interoperability with other JVMs, Manta has a byte-code-to-native compiler startable at run time
[Figure: Manta/JVM interoperability. A Manta process talks to another Manta process through the Manta RMI protocol with generated serializers, and to a Sun JVM through the Sun RMI protocol with generic serialisation; byte-code class files are fetched from HTTP servers and fed to the byte-code compiler or the byte-code loader.]

Numerical Computing in Java

    Complex x = new Complex(5, 2);
    Complex y = new Complex(2, -3);
    Complex z = a.times(x).plus(y);   // a: a previously defined Complex

[Figure: element-wise view of two arrays, X and Y, indexed 0 to 4.]
Wide-scale adoption of Java as a language for numerical computing faces difficulties that remain to be overcome:
– Inefficient support for complex numbers (a minimal Complex sketch appears below)
– Lack of multidimensional arrays
– Over-restrictive floating-point semantics

Computational Performance
– Incomplete access to the IEEE floating-point standard: extended-precision hardware, such as IEEE 754 double-extended precision or long double, is inaccessible to Java programs!
– Computational reproducibility comes at the cost of speed and precision
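Here is a minimal sketch of the Complex class assumed by the snippet above. It illustrates the object-per-value style that makes complex arithmetic expensive in Java; it is not any particular library's API:

    // Minimal immutable complex number, in the style assumed above.
    public class Complex {
        private final double re, im;

        public Complex(double re, double im) { this.re = re; this.im = im; }

        public Complex plus(Complex o) { return new Complex(re + o.re, im + o.im); }

        public Complex times(Complex o) {
            return new Complex(re * o.re - im * o.im, re * o.im + im * o.re);
        }

        public String toString() { return re + " + " + im + "i"; }
    }

Every plus or times allocates a fresh object on the heap, which is precisely the inefficiency the slide points to: a primitive complex type would avoid the allocations entirely.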
Numerical Libraries in Java
Native code libraries:
– Advantages: porting ease through JNI; legacy code reuse; adherence to standards (MPI, LAPACK, etc.)
– Compromised: robustness, reproducibility, portability, performance
Java-programmed libraries:
– Freely distributed
– Strong dependency on future Java specifications, JIT optimizations, etc.
– Promising results for currently ported libraries: JNL (Visual Numerics), JAMA (MathWorks), the NIST libraries

Performance of RMI protocols
[Figure: bar chart of round-trip latency in microseconds for a conventional RPC, the JDK serialization mechanism under Sun JDK 1.1.4, the Sun JIT 1.2, and JavaParty + KaRMI with optimizations; the bars fall from roughly 1630 µs down to 228 µs.]
Reasons for the RMI overhead:
– Stream management and data copying to external buffers
– Method dispatch and low-level network communications
The overhead is currently in the range of 0.4 to 2.2 ms; serialization can take up to 65% of the time in slower JDKs.

Java performance
[Figure: SciMark scores in Mflops on a 500 MHz Pentium III using full optimization, comparing Borland C 5.5, MS VC++ 5.0, Sun Java 1.2 and MS Java 1.1.4; all four score roughly between 40 and 58 Mflops.]
– Performance varies greatly across computing platforms: the highest scores are on Intel and AMD Athlon, the lowest on UltraSPARC, SGI MIPS and Alpha EV6
– Competitive with C compilers
– Performance mainly depends on the implementation technology of the JVM

Synoptic comparison of proposals

    System     Effic.      Code  Object     Stand.  Low-level     Polym.  Serialisation     Country
                           opt.  migration          comms                 protocol
    RMI        Low         No    No         Yes     TCP/IP        Yes     Sun               USA
    JavaParty  Medium      No    Yes        Yes     TCP/IP        Yes     Sun               D
    SPMD       Medium-low  No    No         Yes     TCP/IP        Yes     Does not apply    USA
    JDSM       High        Yes   Possible   Yes     Several       Yes     Socket interface  JP
    Manta      High        Yes   No         No      Own protocol  Yes     Own protocol      NL

(Columns: efficiency, code optimization, object migration, standardization, low-level communications, polymorphism, serialisation protocol, and nationality of the research.)

Summary & Conclusions
– The standard constructs provided by Java (the monitor/thread model) for multithreading have been identified as a serious hindrance to developing reliable parallel distributed software
– Integration of communication frameworks (MPI, RMI, DSM, etc.) is necessary to cover the portability, uploading, soft-install and migration needs of today's applications
– Possible solutions take advantage of the existing Java framework for the development and integration of many "Grid" applications
– A variety of solutions aimed at the computational performance needed for High Performance Computing (data parallelism, Grid applications and numerical computing) have been reviewed

References and further addresses
– V. Getov, G. von Laszewski, M. Philippsen, I. Foster. "Multi-Paradigm Communications in Java for Grid Computing". CACM, 44(10), pp. 118-124, 2001.
– R. Boisvert, J. Moreira, M. Philippsen, R. Pozo. "Java and Numerical Computing". IEEE Computing in Science & Engineering, 3(2), pp. 18-24, 2001.
– The Java Grande Community official page: http://www.javagrande.org
– Numeric class libraries: http://math.nist.gov/javanumerics
– Personal page of M. Philippsen at Karlsruhe: http://wwwipd.ira.uka.de
– The Manta project: http://www.cs.vu.nl