Virtualization in Linux
Atul Bansal, Manish Pal, Pulkit Gambhir

Virtualization in a nutshell
Virtualization: running multiple machines on a single piece of hardware.
The "real" hardware is invisible to the OS; the OS only sees an abstracted picture of it.
Only the Virtual Machine Monitor (VMM) talks to the hardware.

More formally…
A framework for dividing the resources of a machine into multiple execution environments by using techniques such as:
1. H/W & S/W partitioning
2. Time-sharing
3. Partial or complete machine simulation
4. Emulation
…
In general, we implement an M-N mapping (M: real resources, N: virtual resources), e.g. multitasking (1-N) and cluster computing (M-1).

Motivations
Save on costs: many servers consolidated onto one machine.
Running legacy software.
Improve security: protect data by placing it on separate virtual machines.
Development and debugging platform.

More motivations
Hardware independence of code (Java VM).
Compatibility issues tackled.
Server migration is eased.
Error and attack containment.
Dynamic resource sharing.
It looks cool too….

Issues in virtualization
Some interfaces were not designed with virtualization in mind (e.g. processor privilege levels).
The VMM must handle all privileged instructions on behalf of the guests.
An extra level of memory segmentation is needed (between virtual machines). This is done entirely by the VMM; guest OSes only see an abstraction of the page table, not the real one.

Issues in virtualization (continued)
Resource sharing: all VM requests must be mapped onto the same network card, the same DMA controller, etc.
Design and management of communication between different virtual machines.
The need to present abstracted hardware that has no physical equivalent (emulation).
And despite all that….
We need transparency and near-native performance!
That is an extremely hard task (Bochs is a good example of less-than-perfect performance).

Example: VMware architecture
[Figure: VMware architecture]

Case Study: Xen
Virtual Architecture
[Figure: Basic virtual architecture of Xen]
Topics: CPU state, exceptions, interrupt handling, time, memory, devices.

CPU State
Xen provides each guest OS with virtual CPUs (even when there is only one real CPU).
All privileged CPU state is handled by Xen; guest OSes are not permitted to perform privileged operations.
Instead, a hypercall interface lets a guest OS execute privileged operations on the CPU through Xen.

Hypercalls
Hypercalls are analogous to the system calls provided by any OS, except that the software interrupt vectors to an entry point within Xen rather than within the guest kernel.
Even to set up its interrupt vector table, the OS must invoke Xen hypercalls.
Basically, any privileged operation on the CPU is performed through a hypercall to Xen.

Virtual IDT
A virtual IDT is provided to the guest OS for setting up its interrupt vector table.
A guest OS can submit a table of trap handlers to Xen via the set_trap_table hypercall.
The exception stack frame presented to a virtual trap handler is identical to its native equivalent.

Interrupt Handling
Interrupts are virtualized by mapping them to event channels.
They are delivered to the guest OS using a callback supplied via the set_callbacks hypercall.
The guest OS can map these events onto its standard interrupt dispatch mechanisms.
Xen is responsible for determining which guest OS will handle each physical interrupt source.

Time
Time is important in virtualization because a guest OS needs to be aware of both ‘real time’ and ‘virtual time’ (the time it has actually spent executing).
Xen exports timestamps for system time and wall-clock time to guest operating systems through a shared page of memory.

Time Consistency
All timestamps need to be updated and read atomically.
Xen stores a version number in the shared info page, which is incremented before and after updating the timestamps.
A guest can be sure that it read a consistent state by reading the version number before and after reading the timestamps and checking that the two values are equal.
Event Channels
Event channels are the basic primitive provided by Xen for event notifications; they are the Xen equivalent of a hardware interrupt.
An event channel stores one bit of information: the event of interest is signaled by transitioning this bit from 0 to 1.
Notifications are received by a guest via an upcall from Xen.

Event Channels (Implementation)
The kernel shared info page (shared_info_t) contains two bitfields for event channels:
unsigned long evtchn_pending[…..];
unsigned long evtchn_mask[…..];
These specify, respectively, whether an event is pending (evtchn_pending) and whether the event channel is masked (evtchn_mask). For masked channels, no events are delivered.

Virtual CPU Setup
Any guest OS needs to set up a virtual CPU on which it executes.
This includes installing a vector table on the virtual IDT for handling interrupts, page faults, etc.
The guest OS must also set up a pair of hypervisor callbacks (notification and entry points for Xen).

Hypercalls for CPU Setup
set_callbacks(………………………..)
This hypercall allows a guest OS to set up the hypervisor callbacks.
set_trap_table(trap_info_t *table)
This hypercall allows a guest OS to set up its IDT.
A further hypercall is provided for the management of virtual CPUs:
vcpu_op(……..)
This hypercall can be used to bootstrap VCPUs, to bring them up and down, and to test their current status.

Start of Day
The start-of-day environment for guest operating systems is different from that provided by the underlying hardware: the processor is already executing in protected mode with paging enabled.
Domain 0 is created and booted by Xen itself.

Start of Day (continued)
For all domains other than dom0, the analogue of the boot loader is the domain builder.
The domain builder is user-space software running in domain 0.
It is responsible for building the initial page tables for a domain and loading its kernel image at the appropriate virtual address.
Xen Scheduling
Similar to traditional Linux schedulers, which divide CPU time among userland processes, Xen schedules resources between VMs.
It is like context switching between kernels.
Xen includes boot-time options for scheduling.

Scheduling Algorithms
Atropos: a soft real-time scheduler; guarantees absolute CPU shares.
Round Robin: characterized by a "quantum" of time.
Borrowed Virtual Time (BVT): proportional fair shares of CPU time; "penalizes" domains that block often; ctx_allow plays a role similar to the "quantum" above.

Scheduling Algorithms (continued)
sEDF: provides weighted CPU sharing; uses real-time algorithms to ensure time guarantees; uses weights as well as slices and periods for scheduling and sharing.

System Calls and Scheduling
Some Linux scheduling system calls:
nice(), getpriority(), setpriority(), sched_getscheduler(), sched_setscheduler(), sched_getparam(), sched_setparam(), sched_yield(), sched_get_priority_min(), sched_get_priority_max(), sched_rr_get_interval()

Memory Management
Xen allocates physical memory to domains at page granularity.
Domains may receive non-contiguous physical memory, so Xen distinguishes between machine memory and pseudo-physical memory.
Machine memory refers to the entire amount of memory installed in the machine.
Pseudo-physical memory, on the other hand, is a per-domain abstraction.

Memory Management (continued)
Xen maintains a globally readable machine-to-physical table.
Each domain is also supplied with a physical-to-machine table, which performs the inverse mapping.
Architecture-dependent code in guest operating systems can then use the two tables to provide the abstraction of pseudo-physical memory.

Page Table Updates
Guests are given read-only access to their page tables.
The guest OS must explicitly request any modification (through hypercalls).
Xen validates all such requests and only applies updates that it deems safe.
This is necessary to prevent domains from adding arbitrary mappings to their page tables (e.g. a mapping to a machine frame owned by another domain).
Writable Page Tables
A guest OS may also request writable page tables.
Xen must still validate modifications to ensure secure partitioning, so it traps write attempts to the pages that hold page tables.

Handling the Trap
Xen temporarily allows write access to that page while disconnecting it from the page table that is currently in use.
The newly updated entries cannot be used by the MMU until Xen revalidates and reconnects the page.
Reconnection occurs automatically later in a number of situations, e.g. when the domain is preempted.

Shadow Page Tables
Another type of page table: the guest OS uses an independent copy of the page tables that is unknown to the hardware.
Xen propagates changes made to the guest's tables to the real ones, and vice versa.

VM Assists
Xen provides a number of "assists" for guest memory management.
Hypercall used: vm_assist(unsigned int cmd, unsigned int type);
The cmd parameter describes the action to be taken; the type parameter describes the kind of assist being referred to.

Conclusions
Virtualization is a very exciting area.
Implementation issues still exist.
We are still moving toward near-native performance.
With hardware-supported virtualization and multi-core, multi-threaded hardware, things are now looking very bright!

A quote to end it
"Would PhD virtualization be when several people get a PhD but only one is doing the work?" (JoshTriplett on Xen IRC)