Minesh Joshi
CSC 469 (HPC)
Dr. Box
01/30/2012

Virtualization in High-Performance Computing

Introduction

The demands that high-performance computing (HPC) places on an operating system (OS) differ from those of typical server and workstation workloads. HPC applications have long pushed the limits of the CPU in order to solve large problems. Hardware virtualization has therefore been an important factor in further developing the power of HPC. It allows HPC operating systems to be specialized while still preserving legacy compatibility.

With virtualization, multiple operating systems can coexist on one physical machine. The machine is multiplexed by a small privileged kernel, commonly referred to as a hypervisor or virtual machine monitor (VMM), which provides the illusion of one or more real machines. Virtualized devices and live migration of running operating systems decouple software installations from the physical hardware configuration. The coexistence of different versions of operating systems avoids incompatibilities, reduces testing and upgrade costs, and eliminates issues with conflicting software packages.

Virtualization also enables the specialization of operating systems with full control over hardware resources. The hypervisor safely multiplexes the hardware resources of the physical machine but leaves the specific allocation of those resources to the operating system in the virtual machine. Different kinds of operating systems can therefore coexist on the same hardware, which is the other key value of virtualization for HPC. An HPC application can bypass legacy OS mechanisms and algorithms, act independently, and use hardware-specific optimizations.
The virtual machines (VMs) can communicate through a low-overhead, low-latency communication mechanism provided by the hypervisor, or they can share parts of the physical memory.

With this background in place, the remainder of the paper discusses the benefits, limitations, and uses of virtualization for HPC, including security, introspection, preemption, and portability.

Productivity

Virtualization can be vital in enhancing productivity in the development and testing of HPC applications and systems. With authorization, the hypervisor can allow one VM to monitor the state, interrupts, and communications of another VM for debugging and performance analysis. The hypervisor can also provide a virtual cluster of VMs, one for each node in a specific configuration of an HPC application that uses a cluster programming model such as MPI. A virtual cluster that runs multiple copies of the OS and application can scale an application originally written for a non-scalable OS, which avoids rewriting it for another OS. A “virtual reboot” avoids the latency of hardware re-initialization by the BIOS. A pre-booted and frozen VM image can be shipped to all nodes in a cluster, which significantly reduces the startup time of the system.

Performance

Two performance questions arise with virtualization: the cost it imposes and the performance benefits it offers. Popular microprocessors, such as those from AMD and IBM's POWER line, have hardware features that support virtualization and reduce its performance cost. Software pre-virtualization is a technique for semi-automatically annotating OS code so that it can be adapted to a specific hypervisor at load time while remaining compatible with real hardware. The technique is used, for example, to specify the memory address translations that are implemented when the analogous hardware is virtualized. Virtualization also supports specialized OSes that are performance-optimized for classes of HPC applications.
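
As an illustration of the load-time adaptation that software pre-virtualization relies on, the following is a toy model only, not the actual mechanism of any hypervisor. It assumes that a "binary" is a list of instruction names and that the build step records where the sensitive instructions are. The instruction names, the `Image` type, and the `emulate_` naming are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical set of instructions that must not run natively inside a VM.
SENSITIVE = {"read_cr3", "write_cr3", "disable_interrupts"}


@dataclass
class Image:
    code: List[str]         # toy "machine code": one instruction name per slot
    annotations: List[int]  # positions of sensitive instructions


def annotate(code: List[str]) -> Image:
    """Build step: record where sensitive instructions are, leaving the code unchanged."""
    sites = [i for i, op in enumerate(code) if op in SENSITIVE]
    return Image(code=list(code), annotations=sites)


def load(image: Image, hypervisor: Optional[str] = None) -> List[str]:
    """Load step: run unchanged on real hardware, or patch annotated sites for a hypervisor."""
    patched = list(image.code)
    if hypervisor is None:
        return patched  # real hardware: the original instructions still work
    for i in image.annotations:
        # Replace the sensitive instruction with a call into the named hypervisor.
        patched[i] = f"call {hypervisor}.emulate_{patched[i]}"
    return patched


image = annotate(["add", "write_cr3", "load", "disable_interrupts"])
native = load(image)
virtualized = load(image, hypervisor="hv")
```

The same annotated image serves both cases: `native` is identical to the original code, while `virtualized` has only the annotated positions rewritten. That is the compatibility property the text attributes to pre-virtualization.
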
A hypervisor can also guarantee resource allocations to a VM, such as a fixed percentage of CPU cycles or, for a real-time VM, a maximum latency to interrupt-handling code. Running the VMs of a virtual cluster concurrently allows a cluster HPC application to communicate between nodes in real time while it runs.

Reliability and Availability

Because VMs are isolated, a hardware or software failure affects only one VM. If the affected VM cannot recover by itself, the hypervisor can reclaim its non-failing hardware resources and use them to restart the failed VM or give them to other VMs. This kind of fault isolation enhances system reliability and increases the probability that long-running HPC applications complete, without any special effort by the OSes that run in the VMs.

One VM can capture the complete OS and application state of another VM, either on request or periodically. This checkpoint/restart capability enables preemption by high-priority work, inter-node migration of work in a cluster for load balancing, and restart from a previous checkpoint after hardware or software failures. Preemption allows real-time HPC, in which a large number of nodes are preempted for a brief time to compute a result that is needed immediately. These scenarios improve system availability and require little or no effort in the OS or application. They matter to HPC because they prevent the loss of long-running applications.

Security

VM isolation provides a platform for building secure systems. An isolated VM can have no unauthorized interaction with other non-hypervisor software running on the real machine. A trusted program that the hypervisor loads into an isolated VM can be trusted by the parties that communicate with it. Introspection can be used to monitor the communications and state of a VM in order to verify its correct operation independently.

Software Complexity

Hypervisor-based systems can reduce the complexity of software development, testing, distribution, and maintenance.
However, this holds only when the hypervisor is much more stable than a typical OS, with very infrequent new versions or interface changes. Since only the hypervisor bootstraps on real hardware, the related hardware configuration and initialization can be done once rather than every time an OS is started. Since most devices are virtualized, only one real driver is needed for each device type, and most OSes only need to implement generic virtual drivers that communicate with the real drivers. An application can be packaged with the OS it was tested with and distributed as a single unit. OS fixes then apply only to the combined package and cannot create unintended side effects in other software that uses the same OS in a different package. Maintenance of the system becomes more automated, and the added stability gradually leads to improved software quality.

Questions Still Unanswered

Virtualization for HPC has advanced steadily as the underlying technology has matured. Some questions about it nevertheless remain open. Answering them could allow virtualization to be applied to HPC on a much larger scale.

Conclusion

Virtualization has great potential to benefit HPC applications and systems through greater productivity, performance, reliability, availability, security, and simplicity. Implemented by a small hypervisor that runs below the usual OS layer, it is likely to keep improving.

References

[1] Mark F. Mergen, Volkmar Uhlig, Orran Krieger, Jimi Xenidis. Virtualization for High-Performance Computing. IBM T.J. Watson Research Center, Yorktown Heights, NY 10603.
[2] "High-performance computing." Wikipedia. N.p., November 2008. Web. 28 Jan. 2012. <http://en.wikipedia.org/wiki/High-performance_computing>.
[3] "Virtual machine." Wikipedia. N.p., November 2008. Web. 28 Jan. 2012. <http://en.wikipedia.org/wiki/Virtual_machine>.
[4] Hamilton, Marc. "HPC & Virtualization." Marc Hamilton's Weblog. CERN, 08 012 2008. Web. 29 Jan. 2012.