Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Programming High Performance Applications using Components Thierry Priol PARIS research group IRISA/INRIA http://www.irisa.fr/paris A joint work with A. Denis, C. Perez and A. Ribes Supported by the French ACI GRID initiative Outline High-Performance applications and code coupling The CORBA Component Model CCM in the context of HPC GridCCM: Encapsulation of SPMD parallel codes into components Runtime support for GridCCM components Conclusions & future works Some references High Performance applications Not anymore a single parallel application but several of them High performance applications are more and more complex… thanks to the increasing in performance of off-the-shelves hardware Several codes coupled together involved in the simulation of complex phenomena Virtual plane Plane wave Scattering of 1 on 2 Fluid-fluid, fluid-structure, structure-thermo, fluid-acoustic-vibration Even more complex if you consider a parameterized study for optimal design Some examples e-Science Industry Scattering of 2 on 1 Object 1 Object 2 Single-physic multiple-object Weather forecast: Sea-Ice-Ocean-Atmosphere-Biosphere Aircraft: CFD-Structural Mechanics, Electromagnetism Satellites: Optics-Thermal-Dynamics-Structural Mechanism Structural Mechanics Optics Quic kTime™ et un décompress eur TIFF (non compress é) sont requis pour v is ionner cette image. Thermal Dynamics Multiple-physics single-object Electromagnetic coupling EuroPVM/MPI 03 - Sept., 2003 Ocean-atmosphere coupling Programming High Performance Applications using Components 2 The current practice… Coupling is achieved through the use of specific code coupling tools Not just a matter of communication ! Interpolation, time management, … Examples: MpCCI, OASIS, PAWS, CALCIUM, ISAS, … MPI code #1 MPI code #2 MPI code #3 Code coupler (MpCCI, OASIS …) MPI implementation Limitations of existing code coupling tools Originally targeted to parallel machines with some on-going works to target Grid infrastructure Static coupling (at compile time): not “plug and play” Ad-hoc communication layers (MPI, sockets, shared memory segments, …) Lack of explicit coupling interfaces Lack of standardization Lack of interoperability … MPI code #n SAN cluster Code A Code B MpCCI MpCCI MPI The MpCCI coupling library EuroPVM/MPI 03 - Sept., 2003 Programming High Performance Applications using Components 3 Codes are encapsulated into components Components have public interfaces Components can be coupled together or through a framework (code coupler) Components are reusable (with other frameworks) Application design is simpler through composition (but component models are often complex to master !) Some examples of component models EuroPVM/MPI 03 - Sept., 2003 In Code B Code interface Code interface In Out Out In In In In In Out Code coupler Component-based framework Out Out CCA, ICENI Standard component models Code A Out HPC component models Out Code A Component programming is well suited for code coupling In “A component is a unit of independent deployment” “Well separated from its environment and from other components” Code B Code interface Component definition by C. Szyperski Out Code interface Another approach for code coupling EJB, DCOM/.NET, OMG CCM Programming High Performance Applications using Components Out Out In In Code interface Code interface Code C Code D 4 Distributed components: OMG CCM Component interface An architecture for defining components and their interaction Interaction implemented through input/output interfaces Synchronous and asynchronous communication A packaging technology for deploying binary multi-lingual executables A runtime environment based on the notion of containers (lifecycle, security, transaction, persistent, events) Multi-languages, multi-OS, multi-ORBs, multi-vendors, … OFFERED A distributed component-oriented model Facets My Component Event sinks Receptacles Event sources REQUIRED Attributes Include a deployment model Could be deployed and run on several distributed nodes simultaneously EuroPVM/MPI 03 - Sept., 2003 Component-based application Programming High Performance Applications using Components 5 From CORBA 2.x to … Open standard for distributed object computing by the OMG Software bus, object oriented Remote invocation mechanism Hardware, operating system and programming language independence Vendor independence (interoperability) interface MatrixOperations { const long SIZE = 100; typedef double Vector[ SIZE typedef double Matrix[ SIZE void multiply( in Matrix A, out Vector C }; EuroPVM/MPI 03 - Sept., 2003 ]; ][ SIZE ]; in Vector B, ); IDL file Server IDL Compiler Client Obj. Impl. Obj. Inv. IDL Skel. IDL Stub POA Object Request Broker (ORB) GIOP ESIOP IIOP Programming High Performance Applications using Components 6 … CORBA 3.0 and CCM interface Mvop { void mvmult(some parameters); }; interface Stat { Void sendt(some parameters); } component MatrixOp { attribute long size; provides Mvop the_mvop; uses Stat the_stat; publishes StatInfo avail; consume MustIgo go; } OFFERED Component interface specified by the OMG IDL 3.0 MvopHome Component interface Facet REQUIRED Receptacle MatrixOp Event sink Event source Attribute Home MvopHome manages MatrixOp {}; EuroPVM/MPI 03 - Sept., 2003 Programming High Performance Applications using Components 7 CCM Runtime Environment & Deployment Container CCM includes a runtime environment called Container Component Callback Implementation A container is a component’s only outside contact API CCM defines a Packaging and Deployment model ORB Persistency Components are packaged into a selfdescriptive package (zip) Internal Interfaces Transaction One or more implementations (multibinary) + IDL of the component + XML descriptors Notification Security Install Object Install Processor Object Processor Packages can be assembled (zip) A set of components + XML descriptor ZIP Deployment App OMG IDL Assembly File Install Object Assemblies can be deployed Processor Through a deployment tool to be defined… such as a Grid middleware… Staging Area [my_favorite_grid_middlware]run myapp.zip Install Object Processor EuroPVM/MPI 03 - Sept., 2003 Programming High Performance Applications using Components Processor 8 CCM in the context of HPC Encapsulation of parallel codes into software components Parallelism should be hidden from the application designers as much as possible when assembling components Issues: How much my parallel code has to be modified to fit the CCM model What has to be exposed outside the component ? Communication between components Components should use the available networking technologies to let component to communicate efficiently Issues: EuroPVM/MPI 03 - Sept., 2003 How to combine multiple communication middleware/runtime and to make them to run seamlessly and efficiently ? How to manage different networking technologies transparently ? Can my two components communicate using Myrinet for one run and using Ethernet for another run without any modification, recompilation,..? Programming High Performance Applications using Components 9 Encapsulating SPMD codes into CORBA Components The obvious way : adopt a master/Slave approach Advantage One SPMD process acts as the master whereas the others act as slaves The master drives the execution of the slaves through message passing Communication between the two SPMD codes go through the master process Could be used with any* CCM implementation Component B MPI Master processes MPI Master processes SPMD Proc. SPMD Proc. MPI comm. layer MPI comm. layer SPMD Proc. SPMD Proc. SPMD Proc. SPMD Proc. SPMD Proc. SPMD Proc. SPMD Proc. SPMD Proc. … MPI Slave processes Drawbacks Component A ORB Lack of scalability when communicating through the ORB Need modifications to the original MPI code Code A … MPI Slave processes Code B WAN 40 Gbit/s Ethernet Switch 100 Mb/s Ethernet Switch 100 Mb/s 100 Mb/s 100 Mb/s * In theory … but practically… EuroPVM/MPI 03 - Sept., 2003 Programming High Performance Applications using Components 10 Making CORBA Components parallel-aware Just to remind you : Communication between components is implemented through a remote method invocation (to a CORBA object) Constraints EuroPVM/MPI 03 - Sept., 2003 A parallel component with one SPMD process = a standard component Communication between components should use the ORB to ensure interoperability Little but preferably no modification to the CCM specification Scalable communication between components by having several data flows between components What the application designer should see… HPC Component A HPC Component B … and how it must be implemented ! // Comp. A SPMD Proc. SPMD Proc. SPMD Proc. SPMD Proc. SPMD Proc. Programming High Performance Applications using Components // Comp. B SPMD Proc. SPMD Proc. SPMD Proc. SPMD Proc. SPMD Proc. 11 Parallel Remote Method invocation A set of processes calling… Based on the concept of “multifunction” introduced by J-P. Banâtre et al. (1986) Parallel remote Method invocation A set of processes collectively calls another set of processes through a multi-function (multi-RPC) Notion of collection of objects (parallel objects) Multi-RMI Parallel Component EuroPVM/MPI 03 - Sept., 2003 A collection of identical components A collection of parallel interfaces (provide/use) X=1 Y=3 Call f(X,Y,Z) A=Z*2 . . . . X=3 Y=5 Call f(X,Y,Z) A=Z*2 . . . . F(A, B, C) Begin … C=A+3*B … return F(A, B, C) Begin … C=A+3*B … return … … X=7 Y=8 Call f(X,Y,Z) A=Z*2 . . . . F(A, B, C) Begin … C=A+3*B … return … another set of processes Parallel Component A Parallel Component B Comp. A-0 Comp. A-1 A-0 SPMD Comp. A-2 A-0 Proc. SPMD Comp. A-3 A-0 Proc. SPMD Comp. A-4 A-0 Proc. SPMD Proc. SPMD Proc. Programming High Performance Applications using Components Comp. B-0 Comp. B-1 SPMD Comp. B-2 Proc. SPMD Comp. B-3 Proc. SPMD Comp. B-4 Proc. SPMD Proc. SPMD Proc. 12 Parallel interface A parallel interface is mapped on a collection of identical CORBA objects Collection specification through IDL extensions Invocation to a collection of objects is transparent to the client (either sequential or parallel) Size and topology of the collection Data distribution Reduction operations Extended-IDL compiler Modification to the IDL compiler Stub/skeleton code generation to handle parallel objects New data type for distributed array parameter interface[*:2*n] MatrixOperations { typedef double Vector[SIZE]; typedef double Matrix[SIZE][SIZE]; void multiply(in dist[BLOCK][*] Matrix A, in Vector B, out dist[BLOCK] Vector C); void skal(in dist[BLOCK] Vector C, out csum double skal); }; Extension to CORBA sequence Data distribution information Impact on standard: Does not require the modification of the ORB Require extensions to IDL and naming service Is not portable across CORBA implementations EuroPVM/MPI 03 - Sept., 2003 Parallel CORBA Object MPI comm. layer Sequential client Object Inv. Stub SPMD Proc. … SPMD Proc. Skel. Skel. POA POA Object Request Broker (ORB) Programming High Performance Applications using Components 13 Parallel invocation and data distribution Parallel CORBA Object MPI comm. layer Data management is carried out by the stub and the skeleton What the stub has to do: generates requests according to the number of objects belonging to the collection According to the data distribution Sequential client SPMD Proc. Object Inv. Stub Scatter the data for the input parameters (in) Gather the data for the output parameters (out) … Skel. Skel. POA POA Object Request Broker (ORB) Skel. Stub ... Pco->foo(A) ... Programming High Performance Applications using Components Request1 Request2 Request3 SPMD Proc. foo( A ) foo( A ) foo( A ) MPI comm. layer EuroPVM/MPI 03 - Sept., 2003 A Request0 Request0 Object Request Broker (ORB) Gather interface[*] Operations { typedef double Matrix[100][100]; void foo(inout dist[BLOCK][*] Matrix A); }; SPMD Proc. foo( A ) Scatter 14 Parallel invocation and data distribution Parallel CORBA Object MPI comm. layer Different scenarios Stubs synchronized themselves to elect those No serialisation here that call the server objects But parallel connection Parallel client M x 1 : the server is sequential M x N : the server is parallel Synchronisation through MPI Data redistribution is performed by the stub before calling the parallel CORBA object Parallel server SPMD Proc. SPMD Proc. Stub Stub … SPMD Proc. Stub Object Request Broker (ORB) POA POA POA Skel. Skel. Skel. SPMD Proc. SPMD Proc. Data redistribution library using MPI … SPMD Proc. MPI comm. layer Parallel CORBA Object Parallel Server A[BLOCK][*] A[*][BLOCK] 2) Parallel Invocation ORB Redistribution for parameter A Parallel Client 1) Redistribution Parallel Server Parallel Client Data redistribution at client side EuroPVM/MPI 03 - Sept., 2003 Programming High Performance Applications using Components 15 Lessons learnt… Implementation dependant Modification of the CORBA standard (Extended-IDL) The OMG does not like such approach The end-users too… Data distribution specified in the IDL Parallel CORBA object only available for MICO Difficulties to add new data distribution schemes Other approaches PARDIS (Indiana University, Prof. D. Gannon) Data Parallel CORBA (OMG) EuroPVM/MPI 03 - Sept., 2003 Modification of the IDL language (distributed sequence, futures) Require modifications to the ORB to support data parallel object Data distribution specification included in the client/server code and given to the ORB Programming High Performance Applications using Components 16 Portable parallel CORBA object No modification to the OMG standard Parallelism specification through XML Standard IDL Standard ORB Standard naming service Operation: sequential, parallel Argument: replicated, distributed Return argument: reduction or not Automatic generation of a new CORBA object (Parallel Stub and Skeleton) Renaming Add new IDL interfaces EuroPVM/MPI 03 - Sept., 2003 #include “Matrix.idl” Interface IExample { void send_data(Matrix m); }; Interface IExample Method_name: send_data Method_type: parallel Return_type: none Number_argument: 1 Type_Argument1: Distributed … User IDL Distribution (XML) Parallel IDL compiler Parallel IDL Standard IDL compiler Parallel CORBA Stub & Skel. Standard Stub Standard Skel. Programming High Performance Applications using Components 17 Portable parallel CORBA object Behave like standard CORBA objects A parallel CORBA object is: A collection of identical CORBA object A Manager object MPI WORLD MPI WORLD Parallel User Code (MPI) Interface1 Implementation Interface1 Interface1 Parallel CORBA stub Parallel CORBA skel ManagerInterface1 ManagerInterface1 CORBA stub Data distribution managed by the Parallel CORBA stub and skeleton I_var o = I::_narrow(...); o->fun(); CORBA skeleton CORBA ORB At the client or the server side Through the ORB Interoperability with various CORBA implementations EuroPVM/MPI 03 - Sept., 2003 Programming High Performance Applications using Components 18 Back to CCM: GridCCM interface Interfaces1 { void example(in Matrix mat); }; … component A { provides Interfaces1 to_client; uses Interfaces2 to_server; }; Parallel Component Of type A to_client IDL Component: A Port: to_client Name: Interfaces1.example Type: Parallel Argument1: *, bloc … EuroPVM/MPI 03 - Sept., 2003 to_server Comp. A-0 Comp. A-1 A-0 SPMD Comp. A-2 A-0 Proc. SPMD Comp. A-3 A-0 SPMD Proc. Comp. A-4 A-0 SPMD Proc. Proc. SPMD Proc. XML Programming High Performance Applications using Components 19 Runtime support for a grid-aware component model Main goal for such a runtime Should support several communication runtime/middleware at the same time Underlying networking technologies not exposed to the applications Parallel runtime (MPI) & Distributed Middleware (CORBA) such as for GridCCM Should be independent from the networking interfaces Let’s take a simple example MPI and CORBA using the same protocol/network (TCP/IP, Ethernet) MPI within a GridCCM component, CORBA between GridCCM components The two libraries are linked together with the component code, does it work ? Extracted from a mailing list: Message 8368 of 11692 | Previous |Next [ Up Thread ] Message Index From: ------- -------- < -.-----@n... > Date: Mon May 27, 2002 10:04 pm Subject: [mico-devel] mico with mpich I am trying to run a Mico program in parallel using mpich. When calling CORBA::ORB_init(argc, argv) it seems to coredump. Does anyone have experience in running mico and mpich at the same time? EuroPVM/MPI 03 - Sept., 2003 Programming High Performance Applications using Components 20 PadicoTM: an open integration framework Provide a framework to combine various communication middleware and runtimes For parallel programming: For distributed programming Transparency vis à vis the applications Get the maximum performance from the network! RPC/RMI based middleware (DCE, CORBA, Java) Middleware for discrete-event based simulation (HLA) Avoid the NxM syndrome ! … Std Object … Std Sim HLA Framework Middleware & runtimes Grid Services MPI CORBA … HLA PadicoTM framework Offer zero-copy mechanism to middleware/runtime What are the difficulties ? MPI Sim. CORBA Framework To handle a large number of networking interfaces Message based runtimes (MPI, PVM, …) DSM-based runtimes (TreadMarks, …) MPI code Object WAN SAN Provide a generic framework for parallel-oriented runtime (SPMDbased + fixed topology) and distributed-oriented middleware (MPMDbased + dynamic topology) Multiplexing the networking interface A single runtime/middleware use several networking interfaces at the same time Multiple runtime/middleware use simultaneously a single networking interface EuroPVM/MPI 03 - Sept., 2003 Homogeneous cluster Programming High Performance Applications using Components Supercomputer Visualisation 21 PadicoTM architecture overview Provide a set of personalities to make easy the porting of existing middleware The Internal engine controls the access to the network and scheduling of threads DSM MPI CORBA Available on a large number of networking technologies Associated with Marcel (multithreading) EuroPVM/MPI 03 - Sept., 2003 Kaffe JVM CERTI HLA Personality Layer Arbitration, multiplexing, selection, … Low level communication layer based on Madeleine Mpich PadicoTM Services PadicoTM Core Internal engine Madeleine Portability across networks Myrinet SCI Network s BSD Socket, AIO, Fast Message, Madeleine, … Mome OmniORB MICO Orbacus Orbix/E TCP Programming High Performance Applications using Components Marcel I/O aware multi-threading Multithreading 22 PadicoTM implementation SPMD process on each participating node Implemented as a single process To gain access to networks that cannot be shared Launched simultaneously on each participating node Middleware and user’s applications are loaded dynamically at runtime DSM Applications MPI Circuit CORBA High Performance Sockets NetAccess Network multiplexing Polling/callback management Madeleine Portability across networks Myrinet JVM VSock Network Topology Management SCI Network s Appear as modules Modules can be binary shared objects (“.so” libraries on Unix system) or Java classes User code TCP GK TaskManager Modules management Centralized thread policy Marcel I/O aware multi-threading <defmod name=“CORBA-omniORB-4.0” driver=“binary”> <attr label=“NameService”> corbaloc::paraski.irisa.fr:8088/NameService </attr> <requires>SysW</requires> <requires>VIO</requires> <requires>LibStdC++</requires> <unit>CORBA-omniORB-4.0.so</unit> <file>lib/libomniorb-4.A.so</file> <file>lib/libomnithread4.A.so</file> </defm=od> EuroPVM/MPI 03 - Sept., 2003 Programming High Performance Applications using Components HLA PadicoTM Core PadicoTM Services SOAP or CORBA Create Load Start Stop Unload Destroy Multithreading XML Module Descriptor 23 Performances Experimental protocols Runtimes and middleware: MPICH, CORBA OmniORB3, Kaffe JVM Hardware: Dual-Pentium III 1 Ghz with Myrinet 2000, Dolphin SCI Performance results 250 CORBA/Myrinet-2000 MPI/Myrinet-2000 200 Java/Myrinet-2000 CORBA/SCI MPI/SCI TCP/Ethernet-100 150 Myrinet-2000 Bandwidth (MB/s) MPI: 240 MB/s, 11 ms CORBA: 240 MB/s, 20 ms 18 ms 96 % of the maximum achievable bandwidth of Madeleine 100 50 SCI MPI: 75 MB/s, 23 ms CORBA: 89 MB/s, 55 ms 0 1 10 100 1000 10000 100000 1000000 1E+07 Message size (bytes) EuroPVM/MPI 03 - Sept., 2003 Programming High Performance Applications using Components 24 Composition scalability GridCCM code hand written 1D block distibution Experimental Protocol Platform GridCCM implementation 16 PIII 1 Ghz, Linux 2.2 Fast-Ethernet network + Myrinet-2000 network JAVA: OpenCCM C++: MicoCCM C++/Myri based on MicoCCM/PadicoTM Comp. A-0 Comp. A-1 A-0 SPMD Comp. A-2 A-0 Proc. SPMD Comp. A-3 A-0 SPMD Comp. A-4 A-0 Proc. SPMD Proc. SPMD Proc. Proc. EuroPVM/MPI 03 - Sept., 2003 Comp. A-0 Comp. A-1 A-0 SPMD Comp. A-2 A-0 Proc. SPMD Comp. A-3 A-0 SPMD Comp. A-4 A-0 Proc. SPMD Proc. SPMD Proc. Proc. Aggregated Bandwidth in MB/s Preliminary GridCCM performance 300 250 200 150 100 50 0 1->1 2->2 4->4 8->8 Component configuration Java C++/Eth C++/Myri Programming High Performance Applications using Components 25 Conclusion & Future works Conclusions Code coupling tools can benefit from component models An existing component model such as CCM can be extended to comply with HPC requirements without modifying the standard Provide a standard to let HPC codes to be reused and shared in various contexts We do not need to reinvent the wheel… CORBA is a mature technology with several open source implementations Should be careful when saying that CORBA is not suitable for HPC It is a matter of implementation ! (zero-copy ORB exists such as OmniORB) It is a matter of a suitable runtime to support HPC networks (PadicoTM) Yes, CORBA is a bit complex but do not expect to have a simple solution to program distributed systems… Future works Deployment of components using Grid middleware & dynamic composition EuroPVM/MPI 03 - Sept., 2003 Programming High Performance Applications using Components 26 Some references The Padico project One of the best tutorial on CCM http://www.irisa.fr/paris/Padico http://www.omg.org/cgi-bin/doc?ccm/2002-04-01 A web site with a lot of useful information about CCM EuroPVM/MPI 03 - Sept., 2003 http://ditec.um.es/~dsevilla/ccm/ Programming High Performance Applications using Components 27