Single System Image and Cluster Middleware Approaches, Infrastructure and Technologies
Dr. Rajkumar Buyya
Cloud Computing and Distributed Systems (CLOUDS) Lab, The University of Melbourne, Australia
www.cloudbus.org

Recap: Cluster Computer Architecture
[Diagram: sequential and parallel applications run on a parallel programming environment over the cluster middleware (single system image and availability infrastructure); each PC/workstation node runs communications software over its network interface hardware, and all nodes are linked by the cluster interconnection network/switch.]

Recap: Major Issues in Cluster Design
• Enhanced Performance (performance @ low cost)
• Enhanced Availability (failure management)
• Single System Image (look-and-feel of one system)
• Size Scalability (physical & application)
• Fast Communication (networks & protocols)
• Load Balancing (CPU, net, memory, disk)
• Security and Encryption (clusters of clusters)
• Distributed Environment (social issues)
• Manageability (admin. and control)
• Programmability (simple API if required)
• Applicability (cluster-aware and non-aware applications)

A Typical Cluster Computing Environment
[Diagram: applications run over PVM / MPI / RSH on the hardware/OS; the layer between the programming environment and the hardware/OS is marked "???".]

The missing link is provided by cluster middleware/underware.
[Diagram: applications run over PVM / MPI / RSH, which in turn sits on the middleware layered over the hardware/OS.]
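To make the programming layer in these diagrams concrete, here is a minimal sketch of a parallel application written against MPI, the message-passing interface named above. It is an illustration added to this transcript, not part of the original slides, and it assumes an MPI implementation (e.g., MPICH or Open MPI) is installed on the cluster.

```c
/* hello_mpi.c - minimal parallel application for the "Applications" layer.
 * Illustrative sketch only; assumes an MPI implementation is available. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, name_len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                   /* join the parallel job        */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);     /* this process's rank          */
    MPI_Comm_size(MPI_COMM_WORLD, &size);     /* total number of processes    */
    MPI_Get_processor_name(name, &name_len);  /* which cluster node we run on */

    printf("Process %d of %d running on node %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}
```

Compiled with mpicc and launched with, say, mpirun -np 8 ./hello_mpi, the same binary runs across the nodes; everything below the MPI layer (process placement, communication, and any SSI services) is the concern of the middleware discussed next.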
Middleware Design Goals
• Complete transparency (manageability): offer a single system view of the cluster, e.g., a single entry point, ftp, telnet, software loading, ...
• Scalable performance: allow easy growth of the cluster with no change to the API, and automatic load distribution.
• Enhanced availability: automatic recovery from failures, employing checkpointing and fault-tolerance technologies and handling consistency of data when replicated.

What is Single System Image (SSI)?
• SSI is the illusion, created by software or hardware, that presents a collection of computing resources as one, more whole resource.
• In other words, it is the property of a system that hides the heterogeneous and distributed nature of the available resources and presents them to users and applications as a single unified computing resource.
• SSI makes the cluster appear like a single machine to the user, to applications, and to the network.

Cluster Middleware & SSI
• SSI is supported by a middleware layer that resides between the OS and the user-level environment.
• The middleware consists essentially of two sub-layers of software infrastructure:
  - SSI infrastructure: glues together the OSs on all nodes to offer unified access to system resources.
  - System availability infrastructure: enables cluster services such as checkpointing, automatic failover, recovery from failure, and fault-tolerant support among all nodes of the cluster.

Functional Relationship Among Middleware SSI Modules
[Diagram: functional relationship among the middleware SSI modules.]

Benefits of SSI
• Use of system resources is transparent.
• Transparent process migration and load balancing across nodes.
• Improved reliability and higher availability.
• Improved system response time and performance.
• Simplified system management.
• Reduction in the risk of operator errors.
• No need to be aware of the underlying system architecture to use the machines effectively.

Desired SSI Services/Functions
• Single entry point: telnet cluster.my_institute.edu rather than telnet node1.cluster.institute.edu.
• Single user interface: using the cluster through a single GUI window that provides the look and feel of managing a single resource (e.g., PARMON).
• Single file hierarchy: /proc, NFS, xFS, AFS, etc.
• Single control point: management GUI.
• Single virtual networking.
• Single memory space: network RAM / DSM.
• Single job management: GLUnix, SGE, LSF.

Availability Support Functions
• Single I/O space: any node can access any peripheral or disk device without knowledge of its physical location.
• Single process space: any process on any node can create processes with cluster-wide process IDs, and processes communicate through signals, pipes, etc., as if they were on a single node.
• Checkpointing and process migration: save process state and intermediate results in memory or to disk to support rollback recovery when a node fails, and let the RMS migrate processes for load balancing (a small sketch follows this list).
• Single global job management system.
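To make the checkpointing function above concrete, the following is a small application-level sketch added to this transcript: the program periodically writes its state to a file and, when restarted after a node failure, resumes from the last saved state. The file name and state layout are assumptions for illustration; kernel-level SSI systems (e.g., NonStop Cluster, MOSIX) checkpoint and migrate processes transparently, without such application changes.

```c
/* checkpoint_demo.c - application-level checkpoint/restart sketch (illustrative).
 * Kernel-level SSI systems do this transparently for unmodified processes. */
#include <stdio.h>

struct state { long iteration; double partial_result; };

/* Resume from a previous checkpoint file; return 1 on success, 0 otherwise. */
static int restore(struct state *s, const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f) return 0;
    size_t ok = fread(s, sizeof(*s), 1, f);
    fclose(f);
    return ok == 1;
}

/* Write the current state as the new rollback point. */
static void checkpoint(const struct state *s, const char *path)
{
    FILE *f = fopen(path, "wb");
    if (!f) return;
    fwrite(s, sizeof(*s), 1, f);
    fclose(f);
}

int main(void)
{
    const char *ckpt = "job.ckpt";   /* hypothetical checkpoint file name */
    struct state s = { 0, 0.0 };

    if (restore(&s, ckpt))
        printf("Restarting from iteration %ld\n", s.iteration);

    for (; s.iteration < 1000000; s.iteration++) {
        s.partial_result += 1.0 / (double)(s.iteration + 1);  /* the "work" */
        if (s.iteration % 100000 == 0)
            checkpoint(&s, ckpt);    /* periodic rollback point */
    }
    printf("Result: %f\n", s.partial_result);
    return 0;
}
```

A production checkpoint would also be written atomically (for example, write to a temporary file and then rename it) so that a crash during checkpointing cannot corrupt the last good rollback point.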
SSI Levels
SSI can be provided at different levels of abstraction:
• Application and subsystem level
• Operating system kernel level
• Hardware level

SSI Characteristics
• Every SSI has a boundary.
• Single-system support can exist at different levels within a system, one able to be built on another.

SSI Boundaries
[Diagram: example SSI boundaries, such as the boundary of a batch system. Source: In Search of Clusters.]

SSI Middleware Implementation: Layered Approach
[Diagram: layered implementation of the SSI middleware.]

SSI at Application and Sub-system Levels
• Application level - Examples: batch systems and system management; Google Search Engine. Boundary: an application. Importance: what a user wants.
• Sub-system level - Examples: distributed DB (e.g., Oracle 10g), OSF DME, Lotus Notes, MPI, PVM. Boundary: a sub-system. Importance: SSI for all applications of the sub-system.
• File system - Examples: Sun NFS, OSF DFS, NetWare, and so on. Boundary: shared portion of the file system. Importance: implicitly supports many applications and subsystems.
• Toolkit - Examples: OSF DCE, Sun ONC+, Apollo Domain. Boundary: explicit toolkit facilities (user, service name, time). Importance: best level of support for a heterogeneous system.
(© Pfister, In Search of Clusters)

SSI at OS Kernel Level
• Kernel/OS layer - Examples: Solaris MC, UnixWare, MOSIX, Sprite, Amoeba/GLUnix. Boundary: each name space (files, processes, pipes, devices, etc.). Importance: kernel support for applications and administrative subsystems.
• Kernel interfaces - Examples: UNIX (Sun) vnode, Locus (IBM) vproc. Boundary: types of kernel objects (files, processes, etc.). Importance: modularizes SSI code within the kernel.
• Virtual memory - Examples: none supporting the OS kernel. Boundary: each distributed virtual memory space. Importance: may simplify implementation of kernel objects.
• Microkernel - Examples: Mach, PARAS, Chorus, OSF/1 AD, Amoeba. Boundary: each service outside the microkernel. Importance: implicit SSI for all system services.
(© Pfister, In Search of Clusters)

SSI at Hardware Level
• Memory - Examples: SCI (Scalable Coherent Interface), Stanford DASH. Boundary: memory space. Importance: better communication and synchronization.
• Memory and I/O - Examples: SCI, SMP techniques. Boundary: memory and I/O device space. Importance: lower-overhead cluster I/O.
(© Pfister, In Search of Clusters)

SSI via the OS Path
1. Build SSI as a layer on top of the existing OS.
   • Benefits: makes the system quickly portable, tracks vendor software upgrades, and reduces development time; new systems can be built quickly by mapping new services onto the functionality provided by the layer beneath. E.g., GLUnix.
2. Build SSI at the kernel level - a true cluster OS.
   • Good, but it cannot leverage OS improvements from the vendor. E.g., UnixWare, Solaris MC, and MOSIX.

SSI Systems & Tools
• OS level: SCO NSC UnixWare; Solaris MC; MOSIX; ...
• Subsystem level: PVM/MPI, TreadMarks (DSM), GLUnix, Condor, SGE, Nimrod, PBS, ..., Aneka
• Application level: PARMON, Parallel Oracle, Google, ...

UnixWare: NonStop Cluster (NSC) OS
http://www.sco.com/products/clustering/
[Diagram: two UP or SMP nodes; on each, users, applications, and systems management issue standard OS kernel calls to standard SCO UnixWare with clustering hooks, extended by modular kernel extensions, down to the devices; the nodes connect to each other and to other nodes over ServerNet.]

How Does NonStop Clusters Work?
Modular extensions and hooks provide:
• a single cluster-wide filesystem view;
• transparent cluster-wide device access;
• transparent swap-space sharing;
• transparent cluster-wide IPC;
• high-performance internode communications;
• transparent cluster-wide processes, migration, etc.;
• node-down cleanup and resource failover;
• transparent cluster-wide parallel TCP/IP networking;
• application availability;
• cluster-wide membership and cluster time sync;
• cluster system administration;
• load leveling.

Sun Solaris MC (Multi-Computer)
Solaris MC: a high-performance operating system for clusters.
• A distributed OS for a multicomputer: a cluster of computing nodes connected by a high-speed interconnect.
• Provides a single system image, making the cluster appear like a single machine to the user, to applications, and to the network.
• Built as a globalization layer on top of the existing Solaris kernel.
• Interesting features:
  - extends the existing Solaris OS;
  - preserves existing Solaris ABI/API compliance;
  - provides support for high availability;
  - uses C++, IDL, and CORBA in the kernel;
  - leverages Spring OS technology.

Solaris MC: Solaris for MultiComputers
[Diagram: Solaris MC architecture - applications use the standard system call interface; the Solaris MC modules (global file system, globalized process management, globalized networking and I/O) are built on a C++ object framework with object invocations to other nodes, on top of the existing Solaris 2.5 kernel.]
http://research.sun.com/techrep/1995/abstract-48.html

Solaris MC Components
• Object and communication support
• High availability support
• PXFS global distributed file system
• Process management
• Networking

MOSIX: Multicomputer OS for UNIX
http://www.mosix.cs.huji.ac.il/ || mosix.org
• An OS module (layer) that provides applications with the illusion of working on a single system.
• Remote operations are performed like local operations.
• Transparent to the application; the user interface is unchanged.
[Diagram: applications and PVM / MPI / RSH run unchanged on top of the MOSIX layer over the hardware/OS.]

Key Features of MOSIX
• Preemptive process migration that can migrate any process, anywhere, anytime.
• Supervised by distributed algorithms that respond online to global resource availability - transparently.
• Load balancing: migrate processes from over-loaded to under-loaded nodes (see the sketch after this list).
• Memory ushering: migrate processes from a node that has exhausted its memory, to prevent paging/swapping.
• Download MOSIX: http://www.mosix.cs.huji.ac.il/
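As a toy illustration of the load-balancing idea above (and not MOSIX's actual algorithm, which disseminates load information online among the nodes and uses richer cost models), the added sketch below picks a source and destination node for migration whenever the gap between the most and least loaded nodes exceeds a threshold. The node data, the load metric, and the threshold are all assumptions for illustration.

```c
/* migrate_sketch.c - a naive load-balancing decision, for illustration only.
 * Real systems such as MOSIX use distributed, online algorithms and consider
 * CPU, memory and communication costs rather than this single toy metric. */
#include <stdio.h>

#define NODES 4
#define IMBALANCE_THRESHOLD 0.25   /* assumed: act only if the load gap > 0.25 */

struct node { int id; double load; };   /* load: e.g., run-queue length per CPU */

/* Decide whether to migrate a process; on success fill in source and target. */
static int pick_migration(const struct node nodes[], int n, int *from, int *to)
{
    int max = 0, min = 0;
    for (int i = 1; i < n; i++) {
        if (nodes[i].load > nodes[max].load) max = i;
        if (nodes[i].load < nodes[min].load) min = i;
    }
    if (nodes[max].load - nodes[min].load <= IMBALANCE_THRESHOLD)
        return 0;                        /* cluster is balanced enough */
    *from = nodes[max].id;
    *to   = nodes[min].id;
    return 1;
}

int main(void)
{
    struct node nodes[NODES] = { {0, 0.9}, {1, 0.2}, {2, 0.5}, {3, 0.4} };
    int from, to;

    if (pick_migration(nodes, NODES, &from, &to))
        printf("Migrate one process from node %d to node %d\n", from, to);
    else
        printf("No migration needed\n");
    return 0;
}
```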
SSI at Subsystem Level: Resource Management and Scheduling

Resource Management and Scheduling (RMS)
• The RMS system is responsible for distributing applications among cluster nodes; it enables the effective and efficient utilization of the available resources.
• Software components:
  - Resource manager: queuing of applications, resource location and assignment.
  - Resource scheduler: instructs the resource manager what to do and when (the policy).
• Reasons for using an RMS:
  - Locating and allocating computational resources, authentication, process creation and migration.
  - Providing increased and reliable throughput of user applications on the system.
  - Load balancing.
  - Utilizing spare CPU cycles.
  - Providing fault-tolerant systems.
  - Managing access to powerful systems, etc.
• Basic architecture of an RMS: a client-server system.

Cluster RMS Architecture
[Diagram: a user population submits jobs to the manager node, where the job manager, job scheduler and node status monitor form the resource manager; jobs are dispatched to computation nodes 1..c and execution results are returned to the users.]

Services Provided by RMS
• Process migration: used when a computational resource has become too heavily loaded, or for fault-tolerance reasons.
• Checkpointing.
• Scavenging idle cycles: most workstations are idle 70% to 90% of the time.
• Fault tolerance.
• Minimization of impact on users.
• Load balancing.
• Multiple application queues.

Some Popular Resource Management Systems
Commercial systems:
• LSF - http://www.platform.com/
• SGE - http://en.wikipedia.org/wiki/Oracle_Grid_Engine
• NQE - http://www.cray.com/
• LL (LoadLeveler) - http://www.ibm.com/systems/clusters/software/loadleveler/
• PBS - http://www.pbsworks.com/
Public-domain systems:
• Alchemi (desktop grids) - http://www.alchemi.net
• Condor - http://www.cs.wisc.edu/condor/
• GNQS - http://www.gnqs.org/

Pros and Cons of SSI Approaches
• Hardware level: offers the highest level of transparency, but has a rigid architecture that is not flexible when extending or enhancing the system.
• Operating system (kernel) level: offers full SSI, but is expensive to develop and maintain due to limited market share; it cannot be developed partially (the full functionality must be built before the benefit is realised), so it can be risky. E.g., MOSIX and Solaris MC.
• Subsystem level: easy to implement, and benefits the class of applications for which it is designed. E.g., job management systems such as PBS and SGE.
• Application level: easy to realise, but requires each application to be developed as SSI-aware separately. E.g., Google.

Additional References
• R. Buyya, T. Cortes, and H. Jin, "Single System Image", International Journal of High-Performance Computing Applications (IJHPCA), Vol. 15, No. 2, Summer 2001.
• G. Pfister, In Search of Clusters, Prentice Hall, USA.
• B. Walker, "Open SSI Linux Cluster Project", http://openssi.org/ssi-intro.pdf