Minesh Joshi
CSC 469 (HPC)
Dr. Box
01/30/2012
Virtualization in High-Performance Computing
Introduction
The demands that high-performance computing (HPC) places on an operating system (OS)
differ from those of typical server and workstation workloads. HPC applications have long
pushed the limits of the CPU to solve large problems. Hardware virtualization has therefore
become an important factor in developing and advancing the power of HPC. Virtualization
offers HPC several benefits, among them the ability to specialize HPC operating systems
while still preserving legacy compatibility.
Multiple operating systems can coexist on one physical machine with the help of
virtualization. The machine is multiplexed by a small privileged kernel, commonly referred to as
a hypervisor or virtual machine monitor (VMM), which provides the illusion of one or more real
machines. Virtualized devices and live migration of running operating systems decouple
software installations from the physical hardware configuration. Coexistence of different
versions of operating systems avoids incompatibilities, reduces testing and upgrade costs, and
eliminates issues with conflicting software packages.
Virtualization enables the specialization of operating systems with full control over
hardware resources. The hypervisor safely multiplexes the hardware resources of the physical
machine but leaves the specific hardware resource allocation to the operating system in the
virtual machine. Therefore, multiple different kinds of operating systems can coexist.
The coexistence of different operating systems on the same hardware is the other key
value of virtualization for HPC. An HPC application can now bypass legacy OS mechanisms and
algorithms, act independently, and use hardware-specific optimizations. The virtual
machines can communicate via a low-overhead and low-latency communication mechanism
provided by the hypervisor or share parts of the physical memory.
With this background on virtualization for HPC, the remainder of the paper discusses its
benefits, limitations, and uses, including security, introspection, preemption, and
portability.
Productivity
Virtualization can greatly enhance productivity in the development and testing of
HPC applications and systems. With authorization, the hypervisor can allow one VM to
monitor the state, interrupts, and communications of another VM for debugging and
performance analysis. The hypervisor can provide a virtual cluster of VMs, one for each node in
a specific configuration of an HPC application that uses a cluster programming model like MPI.
Productivity can be enhanced by using a virtual cluster, running multiple copies of the OS and
application, to achieve scaling in an application originally written for a non-scalable OS, avoiding
the rewrite for another OS.
A “virtual reboot” avoids the latency of hardware re-initialization by the BIOS. A pre-booted
and frozen VM image can be shipped to all nodes in a cluster, significantly reducing the
startup time of the system.
Performance
Two performance questions arise with virtualization: the cost it imposes and the
performance benefit it offers. Popular microprocessors, such as those from AMD and the IBM
Power line, have hardware features that support virtualization and reduce its performance
cost. Software pre-virtualization is a technique of semi-automatically annotating OS code so
that it can be adapted for a specific hypervisor at load time while remaining compatible with
real hardware. This technique is used, for example, in specifying the memory address
translations that are implemented when virtualizing the analogous hardware.
Virtualization supports specialized OSes that are performance-optimized for classes of HPC
applications. A hypervisor can then guarantee resource allocations to a VM. The guaranteed
allocation can be a fixed percentage of CPU cycles, or a maximum latency to interrupt
handling code for a real-time VM. Running the VMs of a virtual cluster concurrently allows a
cluster HPC application, while running, to communicate between nodes in real time.
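One way a hypervisor could honor a fixed percentage of CPU cycles is stride scheduling, sketched below. This is only an illustration under stated assumptions: the function name schedule, the VM names, and the use of equal-length time slices are hypothetical and are not taken from any particular hypervisor.

```python
from fractions import Fraction


def schedule(shares, total_slices):
    """Hand out equal time slices so each VM receives its guaranteed share.

    shares: dict mapping a VM name to its guaranteed percentage of CPU
            cycles (assumed to be positive integers summing to at most 100).
    Returns the list of VM names in the order they were given a slice.
    """
    if sum(shares.values()) > 100:
        raise ValueError("guaranteed shares exceed 100% of the CPU")
    # Stride scheduling: every time a VM runs, its virtual "pass" value
    # advances by 1/share. The VM with the smallest pass value runs next,
    # so a VM with a larger share is chosen proportionally more often.
    passes = {vm: Fraction(0) for vm in shares}
    order = []
    for _ in range(total_slices):
        # Ties are broken by name so the result is deterministic.
        vm = min(passes, key=lambda name: (passes[name], name))
        order.append(vm)
        passes[vm] += Fraction(1, shares[vm])
    return order


# Hypothetical example: an HPC VM guaranteed 75% and a service VM 25%.
timeline = schedule({"hpc_vm": 75, "service_vm": 25}, total_slices=100)
print(timeline.count("hpc_vm"), timeline.count("service_vm"))
```

Exact fractions are used instead of floating point so that rounding cannot slowly shift the shares over a long run.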
Reliability and Availability
Because VMs are isolated, a hardware or software failure affects only one VM. If the
affected VM cannot recover by itself, its non-failing hardware resources can be reclaimed by
the hypervisor and used to restart the failed VM or given to other VMs. This type of fault
isolation enhances system reliability and increases the probability that long-running HPC
applications complete, without any special effort by the OSes that run in the VMs.
One VM can capture the complete OS and application state of another VM, either on request or
periodically.
This checkpoint/restart capability enables preemption by high-priority work, inter-node
migration of work in a cluster for load balancing, and restart from a previous checkpoint after
hardware or software failures.
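The checkpoint/restart idea can be sketched in a few lines. The example below is a simplified illustration, not how a real hypervisor stores VM images: the Checkpointer class, the run_job function, and the dictionary used as "VM state" are all hypothetical, and a deep copy stands in for capturing the full OS and application state.

```python
import copy


class Checkpointer:
    """Keeps the most recent snapshot of a VM's state (hypothetical helper)."""

    def __init__(self):
        self._snapshot = None

    def capture(self, vm_state):
        # A deep copy stands in for saving the complete OS and application state.
        self._snapshot = copy.deepcopy(vm_state)

    def restore(self):
        if self._snapshot is None:
            raise RuntimeError("no checkpoint has been captured")
        return copy.deepcopy(self._snapshot)


def run_job(total_steps, checkpoint_every, fail_at):
    """Sum 0..total_steps-1, surviving one simulated failure at step fail_at."""
    state = {"step": 0, "partial_sum": 0}
    checkpointer = Checkpointer()
    checkpointer.capture(state)  # initial checkpoint before any work
    failed_once = False
    while state["step"] < total_steps:
        if state["step"] == fail_at and not failed_once:
            # Simulated failure: discard the current state and roll back
            # to the last checkpoint instead of starting from scratch.
            failed_once = True
            state = checkpointer.restore()
            continue
        state["partial_sum"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            checkpointer.capture(state)  # periodic checkpoint
    return state["partial_sum"]


# The job finishes with the same answer despite the failure at step 7.
print(run_job(total_steps=10, checkpoint_every=3, fail_at=7))
```

Only the work done since the last checkpoint is repeated, which is why this capability matters most for long-running applications.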
Preemption allows real-time HPC, where a large number of nodes are preempted for a
brief time to compute a result needed immediately. These scenarios improve system
availability, require little or no effort in the OS or application, and are important to HPC
because they prevent the loss of long-running HPC applications.
Security
VM isolation provides a platform for building secure systems. An isolated VM can
have no unauthorized interaction with other non-hypervisor software running on the real
machine. A trusted program, when loaded into an isolated VM by the hypervisor, remains
trustworthy to the parties that communicate with it. Introspection can be used to monitor the
communications and state of a VM, to verify independently its correct operation.
Software Complexity
Hypervisor-based systems can reduce the complexity of software development, testing,
distribution, and maintenance. However, this holds only when the hypervisor is much
more stable than the usual OS, with very infrequent new versions or interface changes.
Since only the hypervisor bootstraps on real hardware, the related hardware configuration and
initialization can be done once rather than every time the OS is started. Since most devices are
virtualized, only one real driver is needed for each device type and most OSes only need to
implement generic virtual drivers that communicate with the real drivers.
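The split between one real driver and many generic virtual drivers can be pictured as follows. This is a minimal sketch with hypothetical class names (RealBlockDriver, GenericVirtualBlockDriver); an in-memory dictionary stands in for the physical device, and a direct method call stands in for the communication channel between a guest and the hypervisor.

```python
class RealBlockDriver:
    """The single real driver for one device type, kept with the hypervisor."""

    def __init__(self):
        # An in-memory dictionary stands in for the physical device.
        self._blocks = {}

    def write(self, vm_id, block, data):
        # Keying by VM keeps each guest's blocks separate from the others.
        self._blocks[(vm_id, block)] = data

    def read(self, vm_id, block):
        return self._blocks.get((vm_id, block), b"")


class GenericVirtualBlockDriver:
    """The generic virtual driver that every guest OS implements once."""

    def __init__(self, vm_id, backend):
        self._vm_id = vm_id
        self._backend = backend

    def write(self, block, data):
        # The guest knows nothing about the hardware; it forwards the request.
        self._backend.write(self._vm_id, block, data)

    def read(self, block):
        return self._backend.read(self._vm_id, block)


# Two guests share one real driver but see separate virtual disks.
real_driver = RealBlockDriver()
guest_a = GenericVirtualBlockDriver("vm_a", real_driver)
guest_b = GenericVirtualBlockDriver("vm_b", real_driver)
guest_a.write(0, b"data from A")
guest_b.write(0, b"data from B")
print(guest_a.read(0), guest_b.read(0))
```

Replacing the hardware would change only RealBlockDriver; the guest-side driver stays the same, which is the reduction in complexity described above.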
The application can be packaged with the OS it was tested with, for distribution as a
single unit. OS fixes then apply only to the combined package and cannot create unintended
side effects in other software that uses the same OS but in a different package. Maintenance
of the system becomes more automated, and the stability gradually leads to improved software
quality.
Questions still unanswered
As technology has advanced, HPC has made steady progress in adopting virtualization.
However, some questions about virtualization for HPC remain vague and unanswered.
Answering them could open the way to applying the technology on a different and much
bigger scale.
Conclusion
Virtualization has great potential to benefit HPC applications and HPC systems with
greater productivity, performance, reliability, availability, security, and simplicity.
Virtualization as implemented by a small hypervisor that runs below the usual OS layer is
likely to keep progressing and improving.
References
1] Mark F. Mergen, Volkmar Uhlig, Orran Krieger, Jimi Xenidis. Virtualization for
High-Performance Computing. IBM T.J Watson Research Center, Yorktown Heights, NY 10603.
2] "High-performance computing." Wikipedia. N.p., November 2008. Web. 28 Jan 2012.
<http://en.wikipedia.org/wiki/High-performance_computing>.
3] "Virtual machine." Wikipedia. N.p., November 2008. Web. 28 Jan 2012.
<http://en.wikipedia.org/wiki/Virtual_machine>.
4] Hamilton, Marc. "HPC & Virtualization." Marc Hamilton's Weblog. CERN, 08 012
2008. Web. 29 Jan. 2012.