Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Making Parallel Processing on Clusters Efficient, Transparent and Easy for Programmers Andrzej M. Goscinski School of Computing and Mathematics Deakin University Joint work with Michael Hobbs. Jackie Silcock and Justin Rough 1 Overview and Aims Basic issues and solutions Cluster Execution environments GENESIS – Architecture – Parallel processing: user – Services for parallelism expectations, clusters, phases management and transparency – Parallelism management GENESIS programming – Transparency interface – Communication paradigms – Message passing – What to do? – DSM – Related systems – Primitives – Middleware – Cluster operating systems Easy to Use and Program Environment Performance Study Summary and Future Work 2 Parallel Processing: User Expectations Affordable Performance Free from creation and placement concerns Transparency Good performance Ease of Use Supercomputers for a “poor man” Unaware of location of processes Ease of Programming Choice and easy use of communication paradigm 3 Parallel Processing: Clusters Clusters are an ideal platform for the execution of parallel applications Many institutions (universities, banks, industries) move toward homogeneous non-dedicated clusters Advantages: – Cheap to build: commodity PCs, networks – Widely available – Idle during weekends – Low utilization during working hours Disadvantages: – Poor and difficult to use software (operating systems and runtime systems) – User unfriendly – Distribution of resources (CPUs and peripherals) 4 Parallel Processing Phases Three distinct phases: – Initialization – Execution – Termination Researchers and manufacturers mainly concentrate on execution to achieve the best performance Ease of use of parallel systems and programmer’s time are neglected Application developers are discouraged as they have to program many activities, which are of an operating system nature 5 Parallelism Management Present operating systems that manage clusters are not built to support parallel processing Reason: these operating systems do not provide services to manage parallelism Parallelism management is the management of parallel processes and computational resources – Achieve high performance – Use computational resources efficiently – Make programming and use of parallel systems easy 6 Parallelism Management Parallelism management in parallel programming tools, Distributed Shared Memory and enhanced operating system environments – has been neglected – left to the application developers Application developers must deal – not only with parallel application development – but also with the problems of initiation and control for the execution on the cluster Transparency and reliability (SSI) have been neglected – users do not see a cluster as a single powerful computer 7 Services for Parallelism Management on Clusters Services for parallelism management and transparency Establishment of a virtual machine Mapping of processes to computers Parallel processes instantiation Data (including shared) distribution Initialisation of synchronization variables Coordination of parallel processes Dynamic load balancing 8 Transparency Users should see a cluster as a single powerful computer Dimensions of parallel processing transparency – Location transparency – Process relation transparency – Execution transparency – Device transparency 9 Communication Paradigms Two communication paradigms: Message Passing (MP) Explicit communication between processes of a parallel application – Fast – Difficult to use for programmers Distributed Shared Memory (DSM) Implicit communication between processes of a parallel application through shared memory objects – Easy to use – Demonstrates reduced performance Claim: Operating environments that offer MP and DSM should be provided as a part of a cluster operating system as they manage system resources 10 What to do? Affordable Performance Parallelism management Transparency Introduce special services Ease of Use Clusters Operating systems Ease of Programming Message passing and DSM Development of cluster operating systems supporting parallel processing Services of cluster operating systems: – Distributed services for transparent communication and management of basic system resources – Services for parallelism management and transparency 11 Related Systems Message Passing Systems PVM – A set of cooperating server processes and specialized libraries that support process communication, execution and synchronization – A virtual machine must be set up by the user – Provides transparent process creation and termination MPI – Objective is to standardize and coordinate the direction of various message passing applications, tools and environments – Provides limited process management functions to support parallel processing HARNESS – Does not provide transparency – Programmers are forced to specify computers, map processes to these computers – Load imbalance is neglected 12 Related Systems DSM Systems Research concentrates mainly on improving performance Ease of use has been neglected Munin – Programmers must label different variables according to the consistency protocol they require – The initialisation stage requires the application developer to define the number of computers to be used – Programmers must create a thread on each computer, initialise shared data and create synchronization variables TreadMarks – The application developer has a substantial input into initialisation of DSM processes – Full transparency is not provided 13 Related Systems Execution Environments Improvement to PVM, MPI and DSM approach of running on top of an operating system is through the enhancement of an operating system to support parallel processing Beowulf – Exploits distributed process space to manage parallel processes – Processes can be started on remote computers after logon operation into that computer was completed successfully – It does not address resource allocation nor load balancing – Transparent process migration is not provided 14 Related Systems Execution Environments NOW – Combines specialized libraries and server processes with enhancement to the kernel – Enhancement: scheduling and communication kernel modulesGLUnix to provide network wide process, file and VM management – Parallelism management service: process initialisation on any cluster computer, support semi-transparent start of parallel processes on multiple nodes (how to select nodes?), barriers, MPI MOSIX – Provides enhanced and transparent communication and scheduling within the kernel – Employs PVM to provide parallelism support (initial placement) – Process migration transparently migrates processes – Provides dynamic load balancing and data collection – Remote communication is handled through the originating computer 15 Related Systems Summary All systems but MOSIX are based on middleware – there is no trial to develop a comprehensive operating system to support parallel processing on clusters The solutions are performance driven – little work has been done on making them programmer friendly Problems from parallel processing point of view: – Processes are created one at a time although primitives provided enable the user to create multiple processes – These systems (with the exception of MOSIX) do not provide complete transparency – Virtual machine is not set up automatically – These systems do not provide load balancing 16 Cluster Execution Environments Execution environments that support parallel processing on clusters can be developed using Middleware approach – at the application level Underware – at the kernel level 17 Middleware User process M OR User process M PVM software DSM software Operating System (Unix) Operating System (Unix) Application processes Library functions or separate software Operating system 18 Middleware - summary Middleware allows programmers – – – – – to develop parallel application (PVM, MPI) execute parallel applications on clusters (Beowulf) employ shared memory based programming (Munin) achieve good execution performance take advantage of portability Middleware – does not offer complete transparency – reduces potential execution performance (services are duplicated) – forces programmers to be involved in many time consuming and error prone activities that are of the operating system nature Conclusion: to provide parallelism management, offer transparency, make programming and use of a system easy develop the needed services at the operating system level 19 Cluster operating systems Cluster is a special kind of a distributed system Cluster operating system supporting parallel processing should – possess the features of a distributed operating system to deal with distributed resources and their management and hide distribution – exploit additional services to manage parallelism for application and offer complete transparency – provide an enhanced programming environment Three logical levels of a cluster operating system – Basic distributed operating system – Parallelism management and transparency system – Programming environment 20 Logical architecture of a cluster operating system Message Passing/PVM M Communication Services PROGRAMMING ENVIRONMENT Shared Memory Parallelism Management System DSM Services Enhanced Subset of a Distributed Operating System (Microkernel, Communication/File Management) 21 GENESIS Cluster Operating System Proof of concept Client-server model, microkernel approach and object based approach (all entities have names) All basic resources: processor, main memory, network, interprocess communication, files are managed by relevant servers IPC - Message passing services – basic communication paradigm – cornerstone of the architecture – provided by IPC Manager and local IPC component of microkernel IPC placement and relationship with other services designed to achieve high performance and transparency DSM provided by Space (memory) and IPC Managers 22 The GENESIS Architecture PVM MP Resource Discovery Global Scheduler Execution Manager DSM System IPC Manager Process Manager Space Manager Network Manager Parallel Processes DSM File/Cache Manager Migration Manager Parallelism Management System Kernel Servers RHODOS Microkernel 23 GENESIS Services for Parallelism Management and Transparency Basic services that provide parallelism management and offer transparency: Establishment of a virtual machine Process creation Process duplication Process migration Global scheduling 24 Establishment of a Virtual Machine Resource Discovery Server supports adaptive establishment of a virtual machine Resource Discovery Server – Identifies Idle and lightly loaded computers Computer resources: e.g., processor model, memory size Computational load and available memory Communication patterns for each process – Passes information to the Global Scheduling Server per Process Server Averaged over an entire cluster Virtual machine changes dynamically Some computers become overloaded or out of order Some computers become idle 25 Process Creation Requirements – Multiple process creation – to create many instances of a process on a single or over many computers – Scalability – must be scalable to many computers – Complete transparency – must hide the location of all resources and processes Three forms of process creation: Single Multiple Group Creation is invoked when the Execution Manager receives a process create request from a parent process – Execution Manager notifies Global Scheduler – Global Scheduler sends location on which process should be created – Execution Manager on selected computer manages process creation 26 Process Creation Single and Multiple Services Single process creation service – Similar to the services found in traditional systems supporting parallel processing – Requires executable image to be downloaded from disk for each parallel process to be created Multiple process creation service – Supports the concurrent instantiation of a number of processes on a given computer through one creation call – When many computers are involved in multiple process creation, each computer is addressed in a sequential manner – Executable image of a parallel child process must be downloaded separately for each computer involved – scalability problem 27 Process Creation Group Group process creation combines multiple process creation and group communication Group process creation service – allows multiple process to be created concurrently on many computers – Single executable is downloaded from a file server using group communication 28 Group Process Creation Behavior 4 Global Scheduler 2 File Server 3 5 Child 2 6 Computer 2 4 Exec Manager 1 7 Exec Manager 8 9 Parent 4 Child 1 7 Exec Manager 5 Computer 1 Child n Computer n 29 Process Duplication Single Local and Remote Parallel processes are instantiated on selected computers by employing process duplication supported by process migration Three forms of process duplication Single local and remote Multiple local and remote Group remote Single local and remote process duplication – Duplication is invoked when the Execution Manager receives a twin request from a parent process Execution Manager notifies Global Scheduler Global Scheduler sends a location on which twin should be placed If this computer is remote process migration is employed 30 Process Duplication Multiple Local and Remote Multiple local and remote process duplication is an enhancement of single process duplication Duplication is invoked when the Execution Manager receives a multiple duplication request from a parent process – Execution Manager notifies Global Scheduler – Global Scheduler sends a location on which twin should be placed – If computer is local Process Manager and Space Manager are requested to duplicate multiple copies of process entries and memory spaces – If computer is remote the parent process is migrated to this destination multiple copies of the parent process are duplicated the parent process on the remote computer is killed Child processes should be duplicated on many computers – Remote process duplication is performed for each selected computer 31 Process Duplication Group Remote When more than one remote computer is involved in process duplication the overall performance decreases Decrease is caused by migrating a parent process to each remote computer sequentially Performance is improved by employing group process migration – Process Managers and Execution Managers each join a relevant group and use group communication – The parent process is concurrently migrated to all selected remote computers involved in process duplication 32 Group Remote Process Duplication Behavior 7 Exec Manager Global Scheduler 2 Exec Manager 8 8 5M 3 6 1 4 10 Child 1 9 Migration Manager Parent 7 Migration Manager Child 2 Computer 2 5 7 Parent Computer 1 5 5M Exec Manager Parent Migration Manager 5 8 Child n Computer n 33 Process Migration Designed to separate policy from mechanism – Process Migration Manager acts as the coordinator for migration of various resources that combine to form a process – Migration of resources: memory, process entries, buffers is carried out by the Space, Process and IPC Managers, respectively Two forms of process migration: single and group Single process migration – Global Scheduler provides “which” process to “where” computer – Local Manager requests its remote peer to prepare for a process – Local Migration Manager requests Space, Process and IPC Managers to migrate respective resources – Remote Manager informs its local peer of successful migration – Local Manager requests Space, Process and IPC Managers to delete the respective resources of the migrated process 34 Process Migration Behavior Event Global Scheduler 1 Migration Manager 3 7 Process 2 6 Migration Manager 5 Process Manager 4 Process State Space Manager 4 IPC Manager 4 Source Computer Spaces IPC Buffers Process Manager Space Manager Process IPC Manager Destination Computer 35 Group Process Migration Enhancement of the single process migration Modifying the single communication between the peer Migration Managers, Process Managers, Space Managers and IPC Managers to that of group communication Global Scheduler provides “which” process to “where” computers – Each server migrates their respective resources to multiple destination computers in a single message using group communication – Parent process is duplicated on each remote computer – At the end of successful migration the parent process on each remote computer is killed 36 Global Scheduling Makes policy decisions of which processes should be mapped to which computers Input provided by the Resource Discovery Manager Relies on mechanisms of – Single, multiple an group process creation and duplication services – Single and group process migration The server combines services of – Static allocation – at the initial stage of parallel processing – Dynamic load balancing – to react to load fluctuations Currently, the Global Scheduler is implemented as a centralized server 37 GENESIS Programming Interface Designed and developed to provide both communication paradigms: – Message passing – Shared memory Message Passing/PVM M Communication Services PROGRAMMING ENVIRONMENT Shared Memory Parallelism Management System DSM Services Enhanced Subset of a Distributed Operating System (Microkernel, Communication/File Management) 38 Message Passing Basic Message Passing – Exploits basic interprocess communication concepts – Transparent and reliable local and remote IPC – Integral component of GENESIS – Offers standard message passing and RPC primitives GENESIS PVM – – – – – PVM added to provide a well known parallelism programming tool Ported from the UNIX based PVM Implemented within a ‘library’ in GENESIS Mapping of the standard PVM services onto the GENESIS services Performance improvement of PVM on GENESIS No additional “classic” PVM server processes required Direct interprocess communication model instead of the default model Load balancing provided 39 Architecture of PVM on Unix User Task 1 libpvm User Task 2 libpvm TCP Connections PVM Server Kernel Computer 1 UDP Datagrams PVM Server Kernel Computer 2 40 Architecture of PVM on GENESIS Computer 1 User PVM Parallel Processes PVM Comms libpvm Execution Manager Computer 2 User PVM Parallel Processes libpvm Migration Manager Execution Manager Migration Manager IPC Manager Network Manager Global Scheduler IPC Manager Network Manager Microkernel Microkernel Network 41 Distributed Shared Memory DSM is an integral component of the operating system Since DSM is a memory management function the DSM system is integrated into the Space Manager – Shared memory used as though it were physically shared – Easy to use shared memory – Low overhead, improved performance Two consistency models supported: – Sequential – implemented using invalidation model – Release – implemented using write-update model Synchronization and coordination of processes – Semaphores - owned by Space Manager on particular computer – Gaining ownership is distributed and mutually exclusive – Barriers used for coordination – their management is centralized 42 Distributed Shared Memory Computer 1 User DSM Parallel Processes Shared Memory Space Manager DSM IPC Manager Computer 2 User DSM Parallel Processes DSM Space Manager Process Manager IPC Manager Microkernel Process Manager Microkernel Network 43 GENESIS Primitives Execution Two groups of primitives – to support execution services – for the provision of communication and coordination services MP PVM proc-ncreate() pvm_spawn() proc-ncreate() pvm_exit() proc-exit() proc-exit() DSM 44 GENESIS Primitives Communication and Coordination MP PVM DSM send() pvm_send() read access recv() pvm_recv() write access barrier() pvm_pkbuf() wait() pvm_unpkbuf() signal() pvm_barrier() barrier() 45 Easy to Use and Program Environment GENESIS system Provides and efficient and transparent environment for execution of parallel applications Offers transparency Relieves programmers from activities such as: – Selection of computers for a virtual a machine for the given application – Setting up a virtual machine – Mapping processes to virtual machine – Process instantiation using process creation and duplication supported by process migration – Load balancing 46 Easy to Use and Program Environment In the GENESIS system Location of the remote computer(s) of the cluster is selected automatically by Global Scheduler Users do not know process location Programming of parallel applications has been made easy by providing – Message passing: standard and PVM – Distributed Shared Memory – Powerful primitives: implement sequences of operations and provide transparency process_ncreate(GROUP_CREATE,n, “child_prog”) – Process instantiation using process creation and duplication supported by process migration – Load balancing 47 Performance of Standard Parallel Applications GENESIS System – 13 Sun3/50 Workstations 12 Computation + 1 File Server – 10 Mbit/sec shared Ethernet Influence of process instantiation on execution performance GENESIS PVM vs. Unix PVM Standard parallel applications – Successive Over Relaxation – Quicksort – Traveling Salesman Problem 48 Influence of Process Instantiation on Execution Performance Parallel Simulation (5, 25, 50 Second Workload) Speed-up Simulation - amount of work relates to the overall exec time Two parameters: – Work load (5, 25, 50 Seconds) – Number of workstations (1 ..12) Global scheduler & migration Speedups for #comp = #proc Speed-up 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Ideal Group Multi Single 0 1 2 3 4 5 6 7 8 9 10 11 12 13 Number of Workstations Ideal Group Multi Single 0 Speed-up 13 12 11 10 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 Number of Workstations 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Ideal Group Multi Single 0 1 2 3 4 5 6 7 8 9 10 11 12 13 Number of Workstations 49 GENESIS PVM vs. Unix PVM IPC Latency Round-trip time (including data packing and unpacking) was measured 1600 1400 1200 1000 Genesis PVM 800 Unix PVM (Default Route, No Encoding) 600 400 200 96 91 86 81 76 71 66 61 56 51 46 41 36 31 26 21 16 11 0 6 1 Support for IPC provided by the PVM server in Unix was substituted with GENESIS operating system mechanisms To measure the time saved by removing the server, a simple PVM application that exchanges messages (1kbyte –100kbytes) was used Round Trip TIme (ms) Message Size (KBytes) 50 GENESIS PVM vs. Unix PVM Speedup Application used to study the influence of process instantiation amount of work relates to the overall exec time – was studied Parameters: – Number of workstations – GENESIS with and without load balancing 11 9 Maximum Speedup Achieved Optimal 7 Genesis PVM w ith Load Balancing Genesis PVM w ithout Load Balancing Unix PVM 5 3 1 1 2 4 6 Num ber of Workstations 8 10 12 51 Successive Over Relaxation Parallel applications developed based on algorithms of Rice University Rice superior cluster hardware: DEC station-5000/240 + fast ATM net For 8 computers – array size: Rice - 512 x 2048 elements with 101 iterations; GENESIS 128 x 128 elements with 10 iterations – DSM: TreadMarks – 6.3; GENESIS – 4.4 – PVM: Rice – 6.91; GENESIS – 5.1 Speed-up 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Ideal MP PVM DSM 0 1 2 3 4 5 6 7 8 9 10 11 12 13 Number of Workstations 52 Quicksort Parallel applications developed based on algorithms of Rice Rice superior cluster hardware: DEC station-5000/240 + fast ATM net For 8 computers – array size: Rice - 256 x 1024 integers; GENESIS 256 x 256 integers – DSM: TreadMarks – 5.3; GENESIS – 2.5 – PVM: Rice – 6.79; GENESIS – 6.07 Speed-up 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Ideal MP PVM DSM 0 1 2 3 4 5 6 7 8 9 10 11 12 13 Number of Workstations 53 Traveling Salesman Problem Parallel applications developed based on algorithms of Rice University Rice superior cluster hardware: DEC station-5000/240 + fast ATM net For 8 computers – 18 city tour; with the minimum threshold set to 13 cities – DSM: TreadMarks – 4.74; GENESIS – 6.33 – PVM: Rice – 5.63; GENESIS – 5.94 Speed-up 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Ideal MP PVM DSM 0 1 2 3 4 5 6 7 8 9 10 11 12 13 Number of Workstations 54 Summary Nondedicated clusters are commonly available – Force application developers to program operating system operations – Do not offer transparency Application developers need a computer system that – Processes applications efficiently – Uses cluster resources well – Allows to see cluster as a single powerful computer rather than as a set of connected computers Proposal: employ a cluster operating system Design: cluster operating system with three logical levels – Distributed operating system – Parallelism management and transparency system – Programming environment 55 Summary GENESIS – designed and developed as a “proof of concept” GENESIS is a system that satisfies user requirements GENESIS approach is unique – Offers both message passing (MP and PVM) and DSM environment – Services providing parallelism management are integral components of an operating system – Provides a comprehensive environment to transparently manage system resources Programmers do not have to be involved in parallelism management Use of the cluster is has been made easy Complete transparency is offered Good performance results have been achieved 56 Future Work Port GENESIS to an Intel like platform Use virtual memory to support DSM Offer reliable parallel computing services on clusters by employing – Reliable group communication – Checkpointing to offer fault tolerance 57