Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
A-JUMP Message Passing Framework for Globally Interconnected Clusters Mr. Sajjad Asghar Advance Scientific Computing Group, National Centre for Physics, Islamabad, Pakistan. Research Papers 2 Mehnaz Hafeez, Sajjad Asghar, Usman A. Malik, Adeel-ur-Rehman, and Naveed Riaz, “Message Passing Framework for Globally Interconnected Clusters”, CHEP 2010, Taipei, Taiwan (Accepted). Sajjad Asghar, Mehnaz Hafeez, Usman A. Malik, Adeel-ur-Rehman, and Naveed Riaz, “A Framework for Parallel Code Execution using Java”, IEEE International Conference IEEE International Conference TENCON2010, Fukuoka, Japan (Accepted). Mehnaz Hafeez, Sajjad Asghar, Usman A. Malik, Adeel-ur-Rehman, and Naveed Riaz, “Survey of MPI Implementations delimited by Java”, IEEE International Conference TENCON2010, Fukuoka, Japan (Accepted). Adeel-ur-Rehman, Sajjad Asghar, Usman A. Malik, Mehnaz Hafeez and Naveed Riaz , “Multi-lingual Binding of MPI Code on A-JUMP”, IEEE International Conference on Frontiers of Information Technology, Islamabad Pakistan (Submitted). Sajjad Asghar, Mehnaz Hafeez, Usman A. Malik, Adeel-ur-Rehman, and Naveed Riaz, “A-JUMP, Architecture for Java Universal Message Passing”, IEEE International Conference on Frontiers of Information Technology, Islamabad Pakistan (Submitted). CHEP 2010 5/3/2017 Layout Introduction Related Work A-JUMP Frame work Message Passing for Interconnected Clusters Performance Analysis Future work & Conclusion References 3 CHEP 2010 5/3/2017 Message Passing Interface 5 Defacto Standard and defines interfaces for C/C++ and Fortran Programmer defines the work and data distribution How and when communication has to be done via communication according to the underlying algorithm. Synchronization is implicit (can also be user-defined) Support distributed memory architectures CHEP 2010 5/3/2017 Message Passing Interface Features 1. 2. 3. 4. 5. 6. 7. 6 Standardization Portability Performance Efficient implementations Availability Functionality Scalability The MPI Forum identified some critical shortcomings of existing message passing systems in areas such as complex data layouts or support for modularity and safe communication. CHEP 2010 5/3/2017 Java and MPI 7 Java is an excellent basis for implementing message passing framework. Java supports multi-paradigm communications environment. Java’s bytecode can be executed securely on many platforms. It allows integrating many technologies that are used for existing software development. Java provides many desired features for building wide range of applications like; standalone, client-server, semantic web, enterprise, mobile, graphical or distributed. Thus, it makes Java an excellent basis for achieving productivity of parallel applications. CHEP 2010 5/3/2017 Layout Introduction Related Work A-JUMP Frame work Message Passing for Interconnected Clusters Performance Analysis Future work & Conclusion References 8 CHEP 2010 5/3/2017 9 CHEP 2010 5/3/2017 Architecture for Java Universal Message Passing (A-JUMP) Framework A-JUMP is an HPC framework built for distributed memory architectures in order to facilitate explicit parallelism. A-JUMP is purely written in Java. A-JUMP is based on mpijava1.2 specifications. A-JUMP backbone is called High Performance Computing Bus (HPC Bus). 10 CHEP 2010 5/3/2017 HPC Bus of A-JUMP 11 Communication backbone of the A-JUMP Based on Apache ActiveMQ Includes inter-process communication, monitoring and propagation of Registry information. Provides interoperability between hardware resources and communication protocols. Ensures that any changes in communication protocols and network topologies will remain transparent to end users. Ensures loosely coupled services and components. Segregate the application and communication layers. CHEP 2010 5/3/2017 HPC Bus of A-JUMP C++ Client 12 CHEP 2010 5/3/2017 Architecture for Java Universal Message Passing (A-JUMP) Framework 13 A-JUMP is designed to provide the following features to write parallel application; Message Passing for Interconnected Clusters Multiple Languages Interoperability Portability Heterogeneity for resource environment Network Topology Independence Multiple Network Protocol Support Communication Mode Autonomy (i.e. asynchronous communication) Different Communication Libraries CHEP 2010 5/3/2017 A-JUMP Component Diagram 14 CHEP 2010 5/3/2017 Layout 15 Introduction Related Work Architecture for Java Universal Message Passing (A-JUMP) Framework Message Passing for Interconnected Clusters Performance Analysis Future work & Conclusion References CHEP 2010 5/3/2017 Grid Computing and MPI GRID MPI Dynamic Resources Static can add/delete dynamically Static Configuration Host File Voluminous data exchange over Internet Data exchange over cluster Design for internet computing Design for cluster computing Heterogeneous resource environment 16 CHEP 2010 Homogenous resource environment 5/3/2017 Grid Computing and MPI GRID MPI Specific to certain scientific applications Generic and not specific to certain scientific applications Independent from base Operating System Code execution is Operating System dependent Support for different communication protocols paradigm No support for different communication protocols paradigm High network latencies due to complex layout Low network latencies for message passing 17 CHEP 2010 5/3/2017 A-JUMP, Message Passing for Interconnected Clusters 18 It supports the dynamic behaviour of interconnected clusters with MPI functionality. It is written in pure Java and does not incorporate any native code/libraries. It provides asynchronous mode of communication for both message passing and code distribution over the clusters. It provides sub-clustering of resources for different jobs to utilize free resources efficiently. It is designed for both internet and cluster computing. CHEP 2010 5/3/2017 A-JUMP, Message Passing for Interconnected Clusters 19 CHEP 2010 5/3/2017 HPC Bus (interconnected clusters) C++ Client 20 CHEP 2010 5/3/2017 A-JUMP, Message Passing for Interconnected Clusters 21 It segregates the communication layer from the application code. It is easy to adopt new communication technologies without changing the application code. It offers support for both homogeneous and heterogeneous clusters for parallel code execution. A set of easy to use APIs are part of the framework. CHEP 2010 5/3/2017 Layered Architecture 22 CHEP 2010 5/3/2017 A-JUMP framework Components Machine Registry Dynamic Resource Discovery Dynamic Resource Allocation Message Security Job Scheduler Monitoring Code Migrator/Execution API Classes 23 CHEP 2010 5/3/2017 Machine Registry Machine Registry provides a list of machines that are dynamically added or removed from the framework. To become the part of the framework, every worker machine has to run Code Migrator/Execution service. 24 Code Migrator/Execution service sends the request to the Machine Registry for a unique rank. Machine Registry assigns unique ranks to the newly added machines. Machine Registry is also responsible to inform Monitoring module about the newly added resources. In future It will records the static information about each machine which has been registered i.e. number of CPUs and number of cores per CPU. CHEP 2010 5/3/2017 Dynamic Resource Discovery (DRD) 25 It keeps track of the resources that are part of the AJUMP clusters (system size). It gets the information from Machine Registry to index the available machines that are registered with the Machine Registry. DRD fetches the information about the state of the machines from Monitoring. The states are defined as free and busy. CHEP 2010 5/3/2017 Dynamic Resource Allocation (DRA) 26 DRA finds the availability of free resources on A-JUMP clusters. Client sends the request for the required numbers of free processors that are needed to run the job. DRA searches the free available resources from the index maintained by DRD and monitoring. It creates a list of machine that could be used to define a sub cluster from available machines and return this list to client APIs. CHEP 2010 5/3/2017 Message Security 27 It provides the facility to encrypt the outgoing messages. Current implementation is based on simple symmetric encryption technique. Requires more performance analysis CHEP 2010 5/3/2017 Job Scheduler(JS) 28 JS distributes incoming jobs to the cluster. The jobs are sent to the JS by client APIs through ActiveMQ. It works on queuing mechanism which keeps incoming jobs queued until free resources are available for execution. If the number of available resources is less than the required resources then job will not be submitted. Otherwise, a sub-cluster is created by JS according to the requirement. Free machines will remain available for new job request. P2P-MPI is the only other implementation which provides sub- clustering. CHEP 2010 5/3/2017 Monitoring Job Scheduler gets the information from the Monitoring component. Monitoring records the dynamic information about the computational resources that are part of the framework. It records number of busy and free CPUs. It helps the Job Scheduler to find the best match for task in hand. Monitoring maintains two distinct states of the machines available in the cluster; these are free and busy. 29 As soon as a machine starts job execution, it informs Monitoring to change its state from free to busy. Monitoring is also responsible to set the state of the machine back to free as soon as the job gets finished. CHEP 2010 5/3/2017 Code Migrator/Execution Code Migrator/Execution distributes and executes a task over the network. A built-in communication API is a part of this component. Enables the tasks to do collective and point to point communication. Both communications between the tasks as well as distribution of code are done in an asynchronous way through ActiveMQ. As soon as a resource gets free, the job is dispatched by Job Scheduler to the designated machine. The Code Migrator/Execution on the destination machine accepts the codes and starts its execution. 30 The underlying code distribution takes place through Code Migrator/Execution. The distribution is done through ActiveMQ. The streamed output is redirected to the user machine. The basic send() and receive() are also incorporated with the help of communication libraries written purely in Java. CHEP 2010 5/3/2017 Code Migrator/Execution Code Adaptor 31 CHEP 2010 5/3/2017 Communication APIs The proposed framework offers easy to use set of communication APIs for writing parallel code. These APIs are close to MPI specifications for Java (mpijava1.2). API Classes 32 Sub cluster Grouping (MPI_COMM_WORLD ) MPI.Init MPI.Finalize MPI.COMM_WORLD.Rank MPI.COMM_WORLD.Size Send Receive Broadcast Gather Scatter Allgather Reduce CHEP 2010 5/3/2017 Workflow of A-JUMP for Interconnected Clusters 33 CHEP 2010 5/3/2017 Figure illustrates A-JUMP relationship and the information shared between the different components. It describes the step by step the flow of program running parallel on different clusters. There are three main entities; A client is a process looking for computing resources. A code adaptor is a process offering computing resources Each machine in cluster must run Code Adaptor to receive the incoming program files. The role of a Code Adaptor or a client is not fixed. A machine may serve as a Code Adaptor as well as client. DRD/DRA is a process that allocates the free resources to the client for job submission. 34 Clients Code Adaptors Dynamic Resource Discovery/Dynamic Resource Allocation (DRD/DRA) DRD/DRA maintains the index of resources and creates subclusters based on free computing resources. CHEP 2010 5/3/2017 2. Resource 3. 4. 5. 6. 7. 8. Request Get Create Return Submit Set Status Free Sub Sub afor Resources Job Monitoring free Cluster Cluster Number byor the busy Client of processes APIs 1. Registration of New aInformation Machine Workflow of A-JUMP for MultiEach new resource the cluster isof assigned ainformation unique rank by the Machine Client DRD/DRA Monitoring It Once When Machine is the client sends Code responsibility Registry seeks creates receives Adaptors a request the is ainresponsible the information sub of start for DRA/DRD information cluster number execution of of to to free available send about processes of forward available the information the submitted free free required processors resources required tojob Monitoring to of it run informs the resources, from and the free Monitoring. sends incase job. Machine subitBefore itcluster submits to a new It Registry. Once the new resource is registered, this information is propagated searches DRD/DRA. to the Registry resource the jobclient. isClusters to about is submitted the added, Inindexed specific the case, change updated, toifmachines free the required cluster/clusters, processes in and theresources identified deleted. status for from sub Also by are thethe free clustering. not user when DRD/DRA. to available sends busy. and status At request then These thegets same DRD/DRA to machines change the time to over Internet the Monitoring. from reside DRD/DRA sends may Machine free a message Registry tofor on busy the different or given also about vice sends clusters number the versa unavailability the Machine or ofupdates on freea processes. single Registry in of the resources cluster. change sendsto the ofthe status updates client. to the to Monitoring. As soon as the submitted job finishes, a request of change in status from busy to free is sent back to machine Registry. Code Adaptors are also responsible to sends results back to the client. 35 CHEP 2010 5/3/2017 Layout 36 Introduction Related Work Architecture for Java Universal Message Passing (A-JUMP) Framework Message Passing for Interconnected Clusters Performance Analysis Future work & Conclusion References CHEP 2010 5/3/2017 Performance Analysis Communication Performance Measurement Code Speed-up Performance Analysis 37 Ping Pong Latency Test Network Throughput Measurement Embarrassingly Parallel (EP) Benchmark JGF Crypt Benchmark CHEP 2010 5/3/2017 P2P-MPI 38 S. Genaud and C. Rattanapoka, “P2P-MPI: A Peer-toPeer Framework for Robust Execution of Message Passing Parallel Programs,” Journal of Grid Computing, 5(1):27-42, 2007. Strength Pure Java Supports distributed memory models Supports heterogeneous environment Supports high speed networks Weakness No support for high speed networks Slow Performance CHEP 2010 5/3/2017 Test Environment 39 Results are collected on an environment that simulates globally interconnected clusters. The hardware resources involved in conducting the tests include; 9 single core machines divided into two clusters. The clusters are connected with 1 Gbps bandwidth. The machines are running Windows Server 2003 (with SP2) and Windows XP (with SP3). The TCP window size is default on all the machines in clusters. No optimization is done at the level of OS as well as at the level of hardware. CHEP 2010 5/3/2017 Time (ms) Ping Pong Latency Test Data Size (bytes) 41 CHEP 2010 5/3/2017 Network Throughput Measurement Mbps Chart Title 45 40 35 30 25 20 15 10 5 0 A-JUMP (inter connected Clusters) P2P-MPI 1 100 1K 4K 8K 16K 32K 64K 128K 256K 512K Data Size (bytes) 42 CHEP 2010 5/3/2017 Code Speed UP Performance Analysis (NAS EP) 120 A-JUMP (Interconnected Clusters) P2P-MPI 100 Tinme(sec) 80 60 40 20 0 2 4 6 8 Number of CPUs 43 CHEP 2010 5/3/2017 Code Speed UP Performance Analysis (NAS EP) Performance Gain of A-JUMP 30 25 25.24% 22.28% Gain(%) 20 15 10 5 8.33% 6.44% 0 2 4 6 8 Number of CPUs 44 CHEP 2010 5/3/2017 Code Speed UP Performance Analysis (JGF Crypt Class A) 2.5 A-JUMP interconnected clusters P2P-MPI Time(sec) 2 1.5 1 0.5 0 2 4 6 8 Number of CPUs 45 CHEP 2010 5/3/2017 Code Speed UP Performance Analysis (JGF Crypt Class B) 16 A-JUMP interconnected clusters 14 P2P-MPI2 Time (sec) 12 10 8 6 4 2 0 2 4 6 8 Number of CPUs 46 CHEP 2010 5/3/2017 Layout 47 Introduction Related Work Architecture for Java Universal Message Passing (A-JUMP) Framework Message Passing for Interconnected Clusters Performance Analysis Conclusion and Future work References CHEP 2010 5/3/2017 Conclusion & Future Work A-JUMP for interconnected clusters is loosely coupled with its communication layer It supports sub clustering It provides dynamic resource management It carries multiple communication protocols It supports asynchronous message passing It supports heterogeneous environment Performance analysis results of A-JUMP 48 For small message sizes are better than P2P-MPI For large message sizes are promising and comparable with P2P-MPI CHEP 2010 5/3/2017 Conclusion & Future Work 49 The communication Library will be improved for message sizes >= 256k Support for different security techniques The other area of focus will be to improve network latencies and lessen the communication overheads Multilingual Support for Interconnected Clusters More NAS and JGF benchmarks CHEP 2010 5/3/2017 Layout 50 Introduction Related Work Architecture for Java Universal Message Passing (A-JUMP) Framework Message Passing for Interconnected Clusters Performance Analysis Conclusion and Future work References CHEP 2010 5/3/2017 References 51 Ian Foster, “The Anatomy of the Grid: Enabling Scalable Virtual Organizations,” in Proc. 7th International Euro-Par Conference on Parallel Processing, Manchester, 2001, pp. 1–4. LCG homepage [Online]. Available: http://lcg.web.cern.ch/lcg. EGEE homepage [Online]. Available: http://www.eu-egee.org. S. Haug, F. Ould-Saada, K. Pajchel, and A. L. Read, “Data management for the world’s largest machine,” in Applied Parallel Computing. State of the Art in Scientific Computing, vol. 4699, Springer Berlin / Heidelberg, 2007, pp. 480-488. P. Eerola et al., “The NorduGrid production grid infrastructure, status and plans,” in IEEE/ACM Grid, Nov. 2003, pp. 158– 165. TeraGrid Project homepage [Online]. Available: http://www.teragrid.org F. Cappello et al., “Grid’5000: a large scale and highly reconfigurable experimental grid testbed.” In Proc. 6th IEEE/ACM International Workhop on Grid Computing - 2005, pp. 99-106. J. Dongarra, D. Gannon, G. Fox, and K. Kennedy. The Impact of Multicore on Computational Science Software. CTWatch Quarterly, 2007 [Online]. Available: http://www.ctwatch.org/quarterly/articles/2007/02/the-impact-of-multicore-oncomputational-science-software. S. Genaud and C. Rattanapoka, “P2P-MPI: A Peer-to-Peer Framework for Robust Execution of Message Passing Parallel Programs,” Journal of Grid Computing, 5(1):27-42, 2007. S. Schulz, W. Blochinger, and M. Poths, “A Network Substrate for Peer-to-Peer Grid Computing beyond Embarrassingly Parallel Applications,” in Proc. WRI International Conference on Communications and Mobile Computing – 2009, Volume 03, pp. 60-68. M. Humphrey and M. R. Thompson, “Security Implications of Typical Grid Usage Scenarios,” Cluster Computing, vol. 5, Issue-03, 2002, pp. 257-264. P. Phinjaroenphan, S. Bevinakoppa, and P. Zeephongsekul, “A Method for Estimating the Execution Time of a Parallel Task on a Grid Node,” in Proc. Eurepean Grid Conefernce, Amsterdam, 2005, pp. 226-236. A. Nelisse, J. Maassen, T. Kielmann, and H. Bal. “CCJ: Object-Based Message Passing and Collective Communication in Java,” Concurrency and Computation: Practice & Experience, 15(3-5):341-369, 2003. J. Al-Jaroodi, N. Mohamed, H. Jiang, and D. Swanson, “JOPI: a Java object-passing interface: Research Articles,” Concurrency and Computation: Practice & Experience, vol. 17, Issues 7-8, pp. 775-795, 2005. P. Martin, L. M. Silva, and J. G. Silva, “A Java Interface to WMPI”. In 5th European PVM/MPI Users’ Group Meeting, EuroPVM/MPI’98, Liverpool, UK, Lecture Notes in Computer Science, vol. 1497, pp. 121-128. Springer, 1998. S. Mintchev and V. Getov, “Towards Portable Message Passing in Java: Binding MPI,” in 4th European PVM/MPI Users’ Group Meeting, EuroPVM/MPI’97, Crakow, Poland, Lecture Notes in Computer Science, vol. 1332, pp. 135–142. Springer, 1997. G. Judd, M. Clement, and Q. Snell, “DOGMA: Distributed Object Group Metacomputing Architecture,” Concurrency and Computation: Practice and Experience, 10(11-13):977-983, 1998. CHEP 2010 5/3/2017 References 52 B. Pugh and J. Spacco, “MPJava: High-Performance Message Passing in Java using Java.nio,” in Proc. 16th Intl. Workshop on Languages and Compilers for Parallel Computing (LCPC'03), Lecture Notes in Computer Science, vol. 2958, pp. 323-339, College Station, TX, USA, 2003. B. Y. Zhang, G. W. Yang, and W.-M. Zheng, “Jcluster: an Efficient Java Parallel Environment on a Large-scale Heterogeneous Cluster,” Concurrency and Computation: Practice and Experience, 18(12):1541-1557, 2006. A. Kaminsky, “Parallel Java: A Unified API for Shared Memory and Cluster Parallel Programming in 100% Java” in Proc. 9th Intl. Workshop on Java and Components for Parallelism, Distribution and Concurrency (IWJacPDC'07), pp. 196a (8 pages), Long Beach, CA, USA, 2007. M. Baker, B. Carpenter, G. Fox, S. Ko, and S. Lim, “mpi-Java: an Object-Oriented Java Interface to MPI,” in 1st Int. Workshop on Java for Parallel and Distributed Computing (IPPS/SPDP 1999 Workshop), San Juan, Puerto Rico, Lecture Notes in Computer Science, volume 1586, pp. 748–762. Springer, 1999. S. Genaud and C. Rattanapoka, “P2P-MPI: A Peer-to-Peer Framework for Robust Execution of Message Passing Parallel Programs,” Journal of Grid Computing, 5(1):27-42, 2007. A. Shafi, B. Carpenter, and M. Baker, “Nested Parallelism for Multi-core HPC Systems using Java,” Journal of Parallel and Distributed Computing, 69(6):532-545, 2009. M. Bornemann, R. V. v. Nieuwpoort, and T. Kielmann, “MPJ/Ibis: A Flexible and Efficient Message Passing Platform for Java,” in Proc. 12th European PVM/MPI Users’ Group Meeting (EuroPVM/MPI’05), Lecture Notes in Computer Science, vol. 3666, pp. 217-224, Sorrento, Italy, 2005. S. Bang and J. Ahn, “Implementation and Performance Evaluation of Socket and RMI based Java Message Passing Systems,” in Proc. 5th Intl. Conf. on Software Engineering Research, Management and Applications (SERA'07), pp. 153159, Busan, Korea, 2007. G. L. Taboada, J. Tourino, and R. Doallo, “F-MPJ: scalable Java message-passing communications on parallel systems,” Journal of Supercomputing, 2009. V. Getov , G. v. Laszewski , M. Philippsen , and I. Foster, “Multiparadigm communications in Java for grid computing,” Communications of the ACM, Volume. 44, Issue 10, 2001, pp. 118-125. H. Schildt, “Java Beginner’s Guide”, 4th ed., Tata McGraw-Hill, 2007. D. Balkanski , M. Trams , and W. Rehm , “Heterogeneous Computing With MPICH/Madeleine and PACX MPI: a Critical Comparison” Technische Universit, Chemnitz, 2003. S. Asghar, M. Hafeez, U. A. Malik, A. Rehman, and N. Riaz, “A-JUMP, A Universal Parallel Framework,” unpublished. MPI: A Message Passing Interface Standard. Message Passing Interface Forum [Online]. Available: http://www.mpiforum.org/docs/mpi-11-html/mpi-report.html. G. L. Taboada, J. Tourino, and R. Doallo, “Java for High Performance Computing: Assessment of current research & practice,” in Proc. 7th International Conference on Principles and Practice or Programming in Java, Calgary, Alberta, Canada, 2009. pp. 30-39. CHEP 2010 5/3/2017 References 53 G.L. Taboada, J. Tourino, and R. Doallo, “Performance Analysis of Java Message-Passing Libraries on Fast Ethernet, Myrinet and SCI Clusters,” in Proc. 5th IEEE International Conference on Cluster Computing (Cluster'03), Hong Kong, China, 2003, pp. 118-126. S. Asghar, M. Hafeez, U. A. Malik, A. Rehman, and N. Riaz, “Survey of MPI Implementations delimited by Java for HPC,” unpublished. R. V. v. Nieuwpoort et al., “Ibis: an Efficient Java based Grid Programming Environment,” Concurrency and Computation: Practice and Experience, 17(7-8):1079-1107, 2005. W. Gropp, E. Lusk, N. Doss, and A. Skjellum, “High-performance, portable implementation of the MPI Message Passing Interface Standard. Parallel Computing,” 22(6):789.828, Sept. 1996. MPICH2 [Online]. Available: http://www.mcs.anl.gov/research/projects/mpich2. MPICH-G2 [Online]. Available: http://www3.niu.edu/mpi. N. Karonis, B. Toonen, and I. Foster, “MPICH-G2: A Grid-Enabled Implementation of the Message Passing Interface,” Journal of Parallel and Distributed Computing, 2003. T. Kielmann , R. F. H. Hofman , H. E. Bal , A. Plaat , and R. A. F. Bhoedjang, “MagPIe: MPI's collective communication operations for clustered wide area systems,” ACM SIGPLAN Notices, vol. 34, Issue-08, 1999, pp. 131-140. L. Chen, C. Wang, F. C. M. Lau, and R. K. K. Ma, “A Grid Middleware for Distributed Java Computing with MPI Binding and Process Migration Supports”, Journal of Computer Science and Technology, vol. 18, Issue-04, 2003, pp. 505-514. V. Welch, F. Siebenlist, I. Foster, J. Bresnahan, K. Czajkowski, J. Gawor, C. Kesselman, S. Meder, L. Pearlman, and S. Tuecke, “Security for Grid Services,” Twelfth International Symposium on High Performance Distributed Computing (HPDC12), IEEE Press, 2003. Globus Toolkit Homepage [Online]. Available: http://www.globus.org/toolkit. PACX-MPI [Online]. Available: http://www.hlrs.de/organization/av/amt/research/pacx-mpi Version 1.2 of MPI [Online]. Available: http://www.mpi-forum.org/docs/mpi-20-html/node28.htm. O. Aumage, and G. Mercier, "MPICH/Madeleine: a True Multi-Protocol MPI for High Performance Networks," vol. 1, pp.10051a, 15th International Parallel and Distributed Processing Symposium (IPDPS'01), 2001. W. Gropp and E. Lusk, “MPICH working note: The second-generation adi for MPICH implementation of MPI,” 1996. O. Aumage, “Heterogeneous multi-cluster networking with the madeleine III communication library,” in 16th IEEE International Parallel and Distributed Processing Symposium (IPDPS '02 (IPPS & SPDP)), pp 85, Washington - Brussels Tokyo, Apr 2002. O. Aumage, L. Boug´e, A. Denis, J.-F. M´ehaut, G. Mercier, R. Namyst, and L. Prylli, “A portable and efficient communication library for high-performance cluster computing,” in IEEE Intl Conf. on Cluster Computing (Cluster 2000), pp 78-87, Technische Universitt Chemnitz, Saxony, Germany, Nov 2000. ActiveMQ homepage [Online]. Available: http://activemq.apache.org. CHEP 2010 5/3/2017 54 CHEP 2010 5/3/2017