Modern Microprocessors, Fall 2012: Virtual Machines (Dr. Martin Land)

Virtual Machine (VM)
  Layered model of computation
    Software and hardware divided into logical layers
    Layer n
      Receives services from server layer n – 1
      Provides services to client layer n + 1
    Layers interact through well-defined programming interface
  Virtual layer
    Software emulation of hardware or software layer n
    Transparent to layer n + 1
    Provides service to layer n + 1 as expected from real layer n
    Virtual layer n can run at some layer m ≠ n in real system

Examples of Virtual Systems
  Web server
    Web browser exchanges data with server
    [Figure: virtual versus real layer stacks for browser, local OS, protocol stack, network, and server; virtual layer n corresponds to real layer m]
  Cloud computing
    Service level agreement (SLA) specifies infrastructure requirements
    User sees hardware / software configuration / performance
    Provider assembles virtual configuration
      Meets SLA requirements
      May be implemented in any way

Types of Virtual Machine
  System Virtual Machine
    Emulation of system-level hardware environment
    Runs above physical hardware and below one or more OSs
  Hosted Virtual Machine
    Virtual machine monitor (VMM)
    Runs above primary OS / below guest OS
    Provides guest OS with software emulation of real hardware system
  Process Virtual Machine
    VM provides application interpretation above OS
  [Figure: layer stacks. Basic System: Application / OS / Hardware. System VM: Applications / OSs / VMM / Hardware. Hosted VM: Application / Guest OS / VMM / OS / Hardware. Process VM: Application / VM / OS / Hardware]

Process VM Example: Java
  Designed for program portability between platforms
    Provides standard interface to software
  Java VM located above a standard OS
    Interface to hardware implementation dependent
    I/O operations performed by calls to OS
  Java compiled to bytecode
    Bytecode usually run (interpreted) in Java VM
  Java without VM
    Java bytecode processor in IBM mainframes
    Native machine language (ISA) is Java bytecode
    Execute Java bytecode without interpretation
  http://java.sun.com/docs/books/tutorial/getStarted/intro/definition.html

Hosted VM Example: Guest OS Over OS
  DOS command line interface over Windows
    Windows allocates 1 MB virtual memory space
    Copies DOS kernel into low memory
    System calls handled by guest DOS kernel
  DOS accesses to hardware
    Trapped and served by Windows host OS
    Responses returned to DOS
  Concurrent DOS windows
    Multiple allocations of 1 MB virtual memory spaces
  DEBUG
    Application running in virtual DOS machine
    Sees 1 MB memory space allocated by Windows
    Register values
      Windows emulates real values to DOS
      Debug emulates DOS values to user
  Parallels, VirtualBox, VMware, DOSBox, ...
    Host Windows, Linux, DOS, … as guest OSs over host OS
  [Figure: debug application / Virtual 86 / Windows / Hardware]

Virtual Machine in IBM z/990 Mainframe
  Hardware
    CPUs, I/O system, internal communication network
  VMM (hypervisor)
    Operator console for partitioning/configuring CPUs and I/O
    Provides hardware emulation as abstraction to OS layer
  OS
    Logical partition (LPAR) runs separate instance of operating system
    Run z/OS, MVS, VM, Unix, Linux, Windows, … instances in parallel
    Non-Windows OS versions expect to see hypervisor (not hardware)
  User
    User sees single-user interface provided by one OS
  [Figure: users over OS LPARs over VMM (Systems Manager, Hypervisor) over Hardware]

VM as System Management Tool
  Isolate user environments on single hardware platform
    Multiple copies of single operating system running independently
    Multiple operating systems running concurrently
    Maintain higher security
  Resource management
    Hardware redundancy
    High availability
    Recovery management
  Hardware pooling
    Assemble hardware cluster
    Map applications to hardware efficiently
  Load balancing
    Remap applications to hardware
  [Figure: App1, App2, App2, App3 over OSs over VMMs over Servers]

z/990 Parallel Sysplex Model
  Parallel Sysplex
    Merge 2 to 32 instances of z/OS into a single system
    Applications divide work and data among LPARs
    High capacity for very large workloads
    Resource sharing
    Dynamic workload balancing
  Geographical diversity
    Coupled LPARs on remote physical systems
    Physical backup
    Automatic failure recovery
    Continuous availability
  [Figure: two systems, each with users over LPAR - OS instances, Systems Manager, and Hardware (processors, RAM, I/O), joined by a Coupling Facility]

Virtualization for Server Systems
  Old file server model
    Run one application per physical server
    Server specified for worst case load
    Large number of typically underutilized servers
    Huge aggregate spare capacity
    Competition from mainframes
  Virtualization in server
    Partition hardware resources to run independent applications
    Intel virtualization
      IA-32 and IA-64 ISA support
      I/O chipset support
    VMM provides dynamic load balancing
    Hardware provides centralized power, cooling, monitoring, backup
    High SAR: scalability, availability, reliability
    Lower cost per served client than server farm

HP Virtual Partitions (vPars)
  [Figure (labels include "Boot Order"). Source: Hewlett-Packard, "Installing and Managing HP-UX Virtual Partitions (vPars)"]

System VM Organization
  Hypervisor
    Virtual machine monitor (VMM)
    Lowest layer above physical hardware (host)
      Uniprocessor or multiprocessor system
    Creates virtual machine (VM) environments for guest OSs
    Allocates physical host resources to virtual resources
  VM overhead
    Processor intensive applications: low overhead
      Infrequent use of OS calls
      Most instructions run directly on hardware
    I/O intensive applications: high overhead
      Frequent use of OS calls
      OS calls for I/O services run in emulation
    I/O-limited applications
      Program throughput limited by I/O latency
      Emulation adds relatively small overhead
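
The VM overhead cases above can be put into a rough cost model. The sketch below is an illustration, not a measurement: it assumes that a fraction `trap_fraction` of guest instructions are OS calls emulated by the VMM and that each of them costs `emulation_cost` native instruction times. Both parameter names and the sample values are hypothetical.

```python
def relative_runtime(trap_fraction: float, emulation_cost: float) -> float:
    """Runtime under a VMM relative to native execution (1.0 means no overhead)."""
    if not 0.0 <= trap_fraction <= 1.0:
        raise ValueError("trap_fraction must lie in [0, 1]")
    native_part = 1.0 - trap_fraction               # runs directly on hardware
    emulated_part = trap_fraction * emulation_cost  # runs in VMM emulation
    return native_part + emulated_part


# Hypothetical workloads: few OS calls versus frequent OS calls for I/O.
cpu_bound = relative_runtime(trap_fraction=0.001, emulation_cost=100.0)
io_bound = relative_runtime(trap_fraction=0.05, emulation_cost=100.0)
```

With these assumed numbers the processor-intensive case slows down by about 10 percent, while the I/O-intensive case runs several times slower, which matches the qualitative ordering on the slide.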

VMM Requirements
  Hardware abstraction
    Guest environment must replicate hardware
    VMM must present well-defined software interface to OS
  Protection
    Isolate guests from one another
    Protect VMM from guest OS and application software
    Guest software cannot change allocation of physical resources
  Privilege
    VMM runs in kernel mode
    Guest OSs and applications run in user mode
  Hardware support for VMM
    Virtualization primitives built into mainframe ISA
    Any OS or application access to hardware causes trap to VMM
    VMM catches every access to hardware abstraction layer (HAL)

Virtualization Awareness
  Virtualization-aware guest OS
    OS written to run above VMM/hypervisor
    Expects to interact with virtual host
    Does not expect full or direct control of physical hardware
    OS code interfaces with hypervisor code
    No need to remap (bluff) pointers intended for real hardware
    May be presented with view of real system for limited operations
    Example: mainframe OS
      Writes I/O outputs to hypervisor interface
      Does not attempt to configure I/O hardware devices
      Particular OS may be given direct control of particular I/O device
  Virtualization-unaware guest OS
    OS written to run above physical hardware
    Expects full and direct control of real hardware
    Requires extensive intervention and remapping by VMM
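
The privilege rule above (guest in user mode, any hardware access traps to the VMM) can be sketched as a toy trap-and-emulate loop. Everything here is hypothetical: `PrivilegeTrap`, `ToyVMM`, and the port number stand in for hardware mechanisms and are not part of any real hypervisor API.

```python
class PrivilegeTrap(Exception):
    """Stands in for the hardware trap raised when user-mode guest code touches hardware."""


class ToyVMM:
    def __init__(self):
        self.virtual_io = {}   # the guest's virtual view of I/O ports
        self.trap_count = 0

    def run_guest(self, guest_op):
        """Run guest code; on a trap, emulate the access and return the result."""
        try:
            return guest_op()
        except PrivilegeTrap as trap:
            self.trap_count += 1
            return self.emulate(*trap.args)

    def emulate(self, op, port, value=None):
        if op == "out":                      # guest writes a port
            self.virtual_io[port] = value
            return None
        if op == "in":                       # guest reads a port
            return self.virtual_io.get(port, 0)
        raise ValueError(f"unknown operation {op!r}")


def guest_write():
    raise PrivilegeTrap("out", 0x3F8, 0x41)  # guest attempts a port write


def guest_read():
    raise PrivilegeTrap("in", 0x3F8)         # guest attempts a port read


vmm = ToyVMM()
vmm.run_guest(guest_write)
value_seen_by_guest = vmm.run_guest(guest_read)
```

The guest never reaches a real device: each access costs one trap, and the value it reads back is whatever the VMM keeps in its virtual state.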

Hardware Emulation Activities
  OS sees hardware through operations
    OS instructions cause CPU to initiate memory and I/O operations
    I/O devices initiate DMA operations and interrupts
  Operations and VMM emulation
    CPU memory access (read data or instruction, write data)
      Translate data/instruction from guest to host format
      Remap address space
      Read/Write to real host memory
    CPU I/O device access (read data or instruction, write data)
      Translate data/instruction from guest to host format
      Remap I/O port space
      Read/Write to real host I/O device
    I/O device actions (DMA or IRQ)
      VMM manages I/O device DMA
      Translate OS interrupt handlers from guest format to host format
  [Figure: Application / OS / VMM / Real Hardware]

Full/Partial System Emulation
  Full system emulation
    VMM intervenes in every OS access to hardware
      CPU: translates guest ISA to host ISA
      Memory: translates memory size and organization
      Chipset: translates guest configuration instructions to host
      I/O devices: translates guest driver to host driver
    CPU emulation example
      Run Nintendo game on PC
      Translate each Nintendo instruction to IA-32 instruction set
  Partial system emulation
    Part of host hardware presented to OS unchanged
    VMM passes guest operations to host with minimal intervention
    Most system VMs emulate subset/superset of real host hardware
    CPU emulation only in special cases
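
The CPU emulation example above (executing each guest instruction on a different host) can be illustrated with a minimal interpreter. The three-operand guest ISA below (`LOADI`, `ADD`, `JNZ`, `HALT`) is invented for this sketch and has nothing to do with any real console. It also interprets instructions one at a time rather than translating them to host machine code, which is a simplification of what the slide describes.

```python
def emulate(program, registers=None):
    """Interpret a list of guest instructions and return the final register file."""
    regs = dict(registers or {})
    pc = 0                                   # guest program counter
    while pc < len(program):
        op, *args = program[pc]
        if op == "LOADI":                    # LOADI rd, imm
            regs[args[0]] = args[1]
        elif op == "ADD":                    # ADD rd, rs1, rs2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "JNZ":                    # JNZ rs, target
            if regs[args[0]] != 0:
                pc = args[1]
                continue
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown guest opcode {op!r}")
        pc += 1
    return regs


# Guest program: sum 3 + 2 + 1 into r0.
guest_program = [
    ("LOADI", "r0", 0),
    ("LOADI", "r1", 3),
    ("LOADI", "r2", -1),
    ("ADD", "r0", "r0", "r1"),
    ("ADD", "r1", "r1", "r2"),
    ("JNZ", "r1", 3),
    ("HALT",),
]
result = emulate(guest_program)
```

Every guest instruction costs several host operations (decode, dispatch, register lookup), which is why the slides reserve CPU emulation for special cases.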

Software Emulation of I/O Hardware
  Advantages
    VMM provides emulation of widely supported device hardware
    Guest OS runs available device drivers without modification
  Difficulties
    Requires very accurate device emulation
    Includes hardware revisions and "bug emulation"
  Performance issues
    VMM intervention on every guest OS access to I/O device
      Context switch from guest OS to VMM
      VMM emulates I/O access and access to real I/O device
      Context switch back to guest OS with response
    Adds considerable overhead
    Emulation is compute-intensive: increases CPU utilization
  Least-bad case
    Virtual device = real device
    Remap I/O ports: no change to driver operation

Bootstrap Process in System VM
  Workstation without VMM
    System boot
      CPU loads initial system loader (ISL)
      ISL points to system boot device
      Boot device contains OS
    Device discovery
      OS loader writes to host I/O space
      Chipset and I/O devices respond
      OS loads drivers for host devices
      OS provides user interface
  Workstation with VMM
    System boot
      CPU loads initial system loader (ISL)
      ISL points to system boot device
      Boot device contains VMM
    Device discovery
      VMM loader writes to host I/O space
      Chipset and I/O devices respond
      VMM loads drivers for host devices
      VMM provides administrator interface
    Secondary boot
      Administrator configures VM partitions
      Administrator points VMM to device containing OS boot image
      VMM boots OS into partition
    Device discovery
      OS loader writes to virtual I/O space
      VMM responds for I/O devices
      OS loads drivers for virtual devices
      OS provides user interface
  [Figure: without VMM, Application (user) / OS (kernel) / Hardware; with VMM, Application / OS (virtual) / VMM / Hardware (real)]

Virtualization Difficulties for IA-32
  IA-32 designed to provide hardware support to OS
    Memory segmentation
    Virtual memory and paging
    Task management
    Interrupt management
    Protection and privilege for segmentation, paging, interrupts
  Workaround virtualization
    Treat OS like user application
    Can create a kludge on IA-32 systems
  IA-32 operating systems
    Expect to have highest privilege
    Can easily discover their lower privilege

Memory Resource Compression
  OS manages resources using IA-32 system tables
    Assigns pointer to page table root (directory)
    Manages page table entries
    Manages memory segmentation with descriptor tables limited to 8 K entries
      Global descriptor table (GDT)
        Map segment pointer to virtual address
        Define segment type (code, data, system) and privilege level
      Interrupt descriptor table (IDT)
        Map interrupts and traps to service routines
  Memory compression
    VMM must reserve part of guest virtual memory for management
    OS expects to see the full virtual memory space
  Table resource compression
    VMM requires entries in GDT and IDT for management of OS
    VMM must prevent OS access to its descriptors
    OS expects full control of all 8 K table entries

Ring Aliasing
  Privilege rings
    Memory segments assigned privilege from 0 (highest) to 3 (lowest)
      Stored in segment descriptor (table entry defining segment)
      Copied into code segment selector (pointer to segment via descriptor)
    Access rights for code limited to segments of same or lower privilege
    User mode ~ ring 3
    OS kernel mode ~ ring 0
    CPL: privilege level of code segment
    DPL: privilege level for data access or branch target
  Ring aliasing
    Deprivileging
      Run VMM at ring 0 and OS at ring 1
    Issues
      Paging restricted to two privilege levels
      4 level privilege not supported in 64-bit systems
      OS can read its CPL from code segment selector
  [Figure: rings 0 to 3, with access granted or denied according to CPL and DPL]

Non-Faulting Access to Privileged State
  Privileged registers
    Control configuration of hardware systems
    VMM must
      Intercept OS access to privileged registers
      Provide virtual values determined for guest environment
  Access to privileged registers in IA-32
    Access by unprivileged software usually prevented
      Causes protection fault
      VMM emulates response to guest instruction
    Some unprivileged accesses to privileged state do not fault
      GDTR pointer to GDT
      IDTR pointer to IDT
      LDTR pointer to LDT
      TR pointer to current task segment
    On user access to system state
      Protection fault on write
      No fault on read
    Guest OS can determine that it does not control CPU

System Calls and Interrupts
  System calls
    Application in ring 3 invokes OS in ring 0
    Require indirect mechanism (call gate)
      Redirects to hidden ring 0 address
      VMM must emulate call gates
    SYSENTER instruction provides fast calls to ring 0
      Will call VMM instead of guest OS
    SYSEXIT instruction ends SYSENTER routine
      Faults to ring 0 if executed from lower privilege
    VMM must emulate response to SYSENTER/SYSEXIT
  Interrupts
    Interrupts can be masked by controlling interrupt flag (IF)
    VMM must mask interrupts and handle interrupts by emulation
    Some OSs toggle IF frequently, requiring many VMM interventions

Intel Virtualization Technology (VT)
  Virtual machine monitor
    Hardware boots (3rd party) VMM software instead of OS
    VMM configures hardware resources among guest systems
    Remaps hardware locations to virtual pointers for guests
    OSs boot within guest partitions
  Hardware support for virtualization
    VT-enabled processors alternate between operating modes
      Root mode grants full hardware control to VMM
      Non-root mode presents virtual pointers to guest OS
    VT-enabled chipset
      Grants control of I/O to root mode
      Remaps I/O channels for non-root mode
  Operating system
    Sees virtual machine as real system
    Operates in ring 0 for maximum privilege
    Sends instructions to hardware pointers in usual way
  [Figure: VMX non-root holds User at ring 3 (user privilege) and OS at ring 0 (virtual full privilege); VMX root holds VMM (real full privilege)]
  http://www.intel.com/technology/itj/2006/v10i3/index.htm
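
The ring aliasing issue "OS can read its CPL from code segment selector" follows from the IA-32 selector layout: a 16-bit selector carries a 13-bit descriptor index, a table indicator bit, and a 2-bit privilege field. The sketch below decodes that layout; the function and type names are illustrative, and the two sample selector values are chosen only to show the effect of deprivileging.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int   # descriptor table index (13 bits)
    table: str   # "GDT" or "LDT" (table indicator bit)
    rpl: int     # privilege field; for the CS selector this is the CPL


def decode_selector(selector: int) -> Selector:
    """Split a 16-bit IA-32 segment selector into its three fields."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("selector must fit in 16 bits")
    return Selector(
        index=selector >> 3,
        table="LDT" if selector & 0x4 else "GDT",
        rpl=selector & 0x3,
    )


MAX_DESCRIPTORS = 1 << 13                   # 13-bit index: the "8 K entries" limit

native_cs = decode_selector(0x0008)         # kernel code segment at ring 0
deprivileged_cs = decode_selector(0x0009)   # same descriptor, guest kernel moved to ring 1
```

A guest kernel that inspects its own CS value in the second case sees privilege level 1 instead of 0, which is exactly how it can discover that it has been deprivileged.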

System Issues in Virtualization
  CPU virtualization support
    Handles operations initiated by CPU
    Memory access by guest software
      VMM assigns virtual address space to guest OS
    I/O access by guest software
      VMM translates OS driver output for host device
  Chipset virtualization support
    Handles operations initiated by I/O device
    Interrupts and DMA accesses by I/O device
      Intercepted by VMM and remapped
  [Figure: CPU, graphics, ROM, and RAM on PCI host-to-bus bridge (bus controller); PCI (expansion) bus with I/O devices; ISA bridge to ISA/EISA bus with disk and I/O]

VT-x for IA-32 Processor Virtualization
  Virtual machine extensions (VMX)
  VMX root operation
    Operating mode designed for VMM
    Grants highest privilege access to host CPU hardware state
  VMX non-root operation
    Operating mode designed for guest OS
    Presents OS with virtual host configured by VMM
    OS sees standard ring 0 access to virtual IA-32 resources
    OS access to privileged state trapped by VMM
  Mode transitions
    VM entry: VMX root operation → VMX non-root operation
    VM exit: VMX non-root operation → VMX root operation
  [Figure: VM entry and VM exit between VMX root (VMM, host OS, hardware) and VMX non-root (user)]

Virtual Machine Control Structure
  Virtual-machine control structure (VMCS)
    Used for mode transition management
  VM entry
    Saves processor state to VMCS host-state area
    Loads processor state from VMCS guest-state area
  VM exit
    Saves processor state to VMCS guest-state area
    Loads processor state from VMCS host-state area
  VMCS host-state area
    Segment register selectors for VMM operations
    Privileged system table pointers (GDTR, IDTR, TR, page table root)
  VMCS guest-state area
    Segment register selectors for OS operations
    Virtual system table pointers determined by VMM
      VMM physical address space not mapped to guest OS virtual address space
    Interrupt flag (IF)

VMCS Details
  Referenced by physical address
    No page table entry in any guest address space
    Location determined by VMM software
  VMCS structure
    Not determined by architecture
    Defined as set of VMCS access host instructions
    VMM author chooses implementation
  VM entry
    Loads table pointers from VMCS
    Pointer updates cause context shift to VM process
    VMM can optionally inject virtual event (interrupt) to cause VM response
  VM exit
    VM saves context to memory
    All VMs exit to common entry point in VMM
    VM exit records details of reason for exit in VMCS
    VMM provides detailed response to VM exit
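
The VM entry and VM exit state swaps listed above can be modeled in a few lines. This is a software analogy that follows the slide's description, not the hardware format: the `Vmcs` and `Cpu` classes, the dictionary fields, and the sample register values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Vmcs:
    guest_state: dict = field(default_factory=dict)  # guest-state area
    host_state: dict = field(default_factory=dict)   # host-state area
    exit_reason: Optional[str] = None                # recorded on VM exit


@dataclass
class Cpu:
    state: dict = field(default_factory=dict)
    mode: str = "root"


def vm_entry(cpu: Cpu, vmcs: Vmcs) -> None:
    """VMX root -> non-root: save host state, load guest state."""
    vmcs.host_state = dict(cpu.state)
    cpu.state = dict(vmcs.guest_state)
    cpu.mode = "non-root"


def vm_exit(cpu: Cpu, vmcs: Vmcs, reason: str) -> None:
    """VMX non-root -> root: save guest state, record reason, load host state."""
    vmcs.guest_state = dict(cpu.state)
    vmcs.exit_reason = reason
    cpu.state = dict(vmcs.host_state)
    cpu.mode = "root"


cpu = Cpu(state={"GDTR": 0x1000, "IF": 0})           # VMM context
vmcs = Vmcs(guest_state={"GDTR": 0x8000, "IF": 1})   # virtual pointers chosen by VMM
vm_entry(cpu, vmcs)
cpu.state["IF"] = 0                                  # guest disables interrupts
vm_exit(cpu, vmcs, reason="external interrupt")
```

After the exit the VMM runs on its own table pointers again, and the guest's change to IF survives in the guest-state area for the next entry.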

VMCS Control Fields
  Settable options for interrupt virtualization
    External-interrupt exiting
      VM exit on external interrupt
      External interrupts not maskable by guest
    Interrupt-window exiting
      VM exit if guest allows interrupts
  Guest/Host mask for control register virtualization
    Status flags in control registers determine processor options
    VMM masks selected flags to prevent write by guest
    Guest write to masked flag causes VM exit
    Guest reads flag value specified by VMM in VMCS
  VM exit bitmaps
    VMM chooses subset of guest actions that cause VM exit
    Exception bitmap: 32 exceptions that optionally cause VM exit
    I/O bitmap: each 16-bit I/O port can be set to VM exit on guest access
    Instruction bitmap: selects privileged instructions that cause VM exit

VT-x Solves Virtualization Problems
  Ring aliasing and compression
    Guest software runs at intended privilege level
  Address-space compression
    Guest/VMM transitions can change virtual address space
    Guest software has full use of its own address space
    VMCS resides in physical address space
      Does not use linear address space
  Nonfaulting access to privileged state
    VMM allows guest OS access to privileged registers
    Accesses cause transition to VMM
  System calls
    Guest OS runs at ring 0 as intended
  Interrupts
    VMCS controls interrupts
    VMM controls response to interrupt through VMCS

VT-x Exception Handling
  Exception not set in bitmap
    OS handles
    OS continues
  Exception set in bitmap
    VM exit to VMM
    VMM services exception
    VMM updates system tables
      Possible updates: page tables, system registers, I/O configuration, ...
    VM entry with event injection
      Event injection replicates exception

Interrupt Virtualization
  Set option external-interrupt exiting
  Interrupt
    VM exit to VMM
    VMM prepares system tables
      Possible updates: interrupt tables, system registers, I/O configuration, ...
    VM entry with event injection
      Event injection replicates interrupt

VT-d for PCI Chipset Virtualization
  VMM allocates resources to guest OSs
    Virtual address space
    Virtual I/O devices mapped to real I/O devices
    OS configures virtual I/O devices
    OS accesses real I/O device through VMM mapping
  DMA remapping
    Enables device-initiated DMA operations to guest address space
    Real I/O device must write to guest OS through emulation mapping
  Interrupt remapping
    Real I/O devices may interrupt CPU
    Interrupt intended for one guest OS
    Real I/O device must deliver interrupt to guest OS through emulation mapping
  http://www.intel.com/technology/itj/2006/v10i3/index.htm

DMA Protection Domains
  Protection domain
    Subset of physical memory allocated for device-initiated DMA
    Protection domains may be allocated to
      VMM
      Guest OS
      Driver process running under guest OS
  I/O device
    May be assigned to a protection domain
    Can only perform DMA to assigned protection domain
  DMA address translation
    I/O device DMA request to bridge contains memory address
    VT-d treats request address as DMA virtual address (DVA)
      Guest Physical Address (GPA) of guest OS
      General software-generated virtual I/O address
    DVA translated to Host Physical Address (HPA)
  [Figure: CPU and RAM attached to bridge, with I/O devices performing DMA]

Mapping I/O Devices to Protection Domains
  PCI device requester ID
    Identifies DMA device and request
    Fields: PCI bus, device, function
    Assigned by PCI configuration software during device discovery
  Root-entry table
    Index: 8-bit bus number from requester ID
    Entry: pointer to context-entry table
  Context-entry table
    Index: 8-bit device/function number from requester ID
    Entry: pointer to page structure used to translate DVA
  Page structure
    Multilevel table structure
    Similar to IA-32 page tables

Address Space Overview
  [Figure: guest virtual memory (GVA) translated by VMM page tables, and DMA virtual memory (DVA) translated through root-entry table, context-entry table, and page structures, both reaching guest physical memory (GPA, emulated physical memory) and host physical memory (HPA)]

IA-32 Interrupt Handling
  Legacy interrupts
    Interrupt controller in chipset handles device interrupts
      Programmable Interrupt Controller (PIC) integrated into ISA chipset
      APIC (Advanced PIC) integrated into PCI chipset
    I/O device assigned interrupt request (IRQ) connection to APIC
  APIC
    Translates device IRQ to 8-bit CPU interrupt number n
    Sends hardware interrupt signal (INTR) to processor
  CPU
    Loads 64-bit entry n from Interrupt Descriptor Table (IDT)
    Entry points to Interrupt Service Routine (ISR)

Message Interrupt Handling
  Local APIC
    CPU interrupt controller
    Receives/decodes local interrupt signals
    Receives interrupt messages from I/O APIC
  I/O APIC
    PCI chipset interrupt controller
    Receives/decodes device IRQ signals
    Sends/receives interrupt messages
  Message signaled interrupts (MSI)
    I/O APIC in PCI chipset formats IRQ signal into structured message
    Message transferred on PCI bus as device-initiated DMA operation
    Local APIC in CPU receives and decodes message
  Source: IA-32 Intel Architecture Software Developer's Manual
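
The two-step lookup on the slide "Mapping I/O Devices to Protection Domains" can be sketched with nested dictionaries. A PCI requester ID is 16 bits: an 8-bit bus number followed by a 5-bit device and 3-bit function number. The table contents, the string placeholders for page structures, and the function names are hypothetical; real root and context entries are binary structures in memory.

```python
def split_requester_id(requester_id: int):
    """Return (bus, devfn): the two 8-bit indexes taken from a requester ID."""
    if not 0 <= requester_id <= 0xFFFF:
        raise ValueError("requester ID must fit in 16 bits")
    bus = requester_id >> 8        # index into root-entry table
    devfn = requester_id & 0xFF    # index into context-entry table
    return bus, devfn


def find_page_structure(root_table, requester_id: int):
    """Walk root-entry table, then context-entry table, to the page structure."""
    bus, devfn = split_requester_id(requester_id)
    context_table = root_table[bus]
    return context_table[devfn]


# Hypothetical assignment of two devices to two protection domains.
root_table = {
    0x00: {0xFA: "domain-A page structure"},   # bus 0, device 31, function 2
    0x03: {0x00: "domain-B page structure"},   # bus 3, device 0, function 0
}

bus, devfn = split_requester_id(0x00FA)
device, function = devfn >> 3, devfn & 0x7
```

A DMA request from a device with no entry raises `KeyError` in this sketch, which loosely corresponds to a remapping error for a device that has no protection domain.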

Interprocessor Interrupts
  Interprocessor Interrupt (IPI)
    Subset of APIC interrupt message table
    CPU writes to interrupt command register (ICR) in local APIC
    Local APIC issues IPI message on system bus
    Used to boot and spawn threads in multiprocessor system

Interrupt Remapping
  Message signaled interrupt (MSI)
    Encodes interrupt vector and destination processor
    Real I/O device not aware of guest OS view of emulated I/O device
    VMM must intercept MSI
  VMM redefines interrupt message format
    Provides substitute MSI
    DMA write request contains
      Message identifier
      No interrupt attributes (vector and destination processor)
      Requester ID of real I/O device generating interrupt
    Requester ID mapped through table structure (root/context tables)
      Points to interrupt remapping table (IRT)
      Entry provides vector and destination processor
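
The remapped interrupt path described above can be sketched as a table lookup. This follows the slide's description in simplified form and is not the hardware table format: `IrtEntry`, `remap_interrupt`, and the sample requester ID and vectors are hypothetical.

```python
from typing import NamedTuple


class IrtEntry(NamedTuple):
    vector: int            # interrupt vector delivered to the guest
    destination_cpu: int   # processor that receives the interrupt


def remap_interrupt(irt_by_requester, requester_id: int, message_id: int) -> IrtEntry:
    """Resolve a substitute MSI (requester ID + message identifier) to its attributes."""
    irt = irt_by_requester[requester_id]   # stands in for the root/context table walk
    return irt[message_id]                 # entry supplies vector and destination


# The VMM, not the device, decides which vector and processor each message maps to.
irt_by_requester = {
    0x0300: [IrtEntry(vector=0x41, destination_cpu=0),
             IrtEntry(vector=0x42, destination_cpu=1)],
}
entry = remap_interrupt(irt_by_requester, requester_id=0x0300, message_id=1)
```

Because the message itself carries no vector or destination, the VMM can retarget a device's interrupts by editing table entries without reprogramming the device.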

Caching of Remapping Structures
  VT-d supports hardware caching of remapping tables
    Root/Context tables
    Paging structures
    IOTLB
    Interrupt remapping table entries
  VMM responsible for maintaining remapping cache
    Must invalidate stale cache entries
  Remapping errors
    DMA access request returns error message
    Device response to error implementation dependent
    Errors logged to VMM
    VMM may reset cache or I/O device configuration tables

VirtualBox
  Open source hosted VMM by Oracle (Sun Microsystems)
    Runs on Intel and AMD x86 hardware
    Runs above Windows, Linux, Mac OS X (Intel), Solaris
  Provides VM with guest OS
    Standard DOS, Windows, Linux, OS/2, FreeBSD, Solaris
    Uses hardware virtualization support if available (not required)
  Scheduling
    Host OS grants timeslice to VM
    VM sub-processes scheduled by guest OS
  [Figure: Applications / Guest OS / VirtualBox Hypervisor / Host OS / x86 Hardware]

VirtualBox Architecture
  Front-end (client)
    VirtualBox hypervisor
    Runs above host OS
    Without Intel VT performs workaround virtualization
      Hypervisor runs in ring 0 of guest context
      Guest OS runs as user program in ring 1 of guest context
    Limited use of Intel VT if available
  Back-end (server)
    Ring 0 driver in host OS
    Copes with "gory details of x86 architecture"
      Allocates physical memory for VM (guest OS)
      Saves/restores guest CPU context during host interrupt
    No intervention in guest OS process management
  [Figure: Application / Guest OS / Hypervisor above Host OS with VirtualBox Driver]

CPU Operating Modes
  Runs native (not emulated) on CPU
    Host applications at ring 3
    Host OS code at ring 0
    Guest "safe" application code at ring 3
      Non-system activities
      Makes system calls to guest OS
  Runs emulated on CPU at ring 3
    Guest application code that causes guest OS interventions
      Disable interrupts
      Trap of prohibited accesses
      Executes real mode code
    Each instruction interpreted by VirtualBox driver
    Interpreted code run in CPU instead of native code
  Runs native on CPU at ring 1
    Guest OS ring 0 code
    VirtualBox driver handles "gory details" of workaround
      Registers and descriptor tables

Xen
  Open source system VMM
    Runs on Intel and AMD x86 hardware
    Runs directly above hardware
    Linux required to build and install Xen
  Provides VMs with guest OSs
    Linux, Solaris, Windows XP, 2003 Server
    Hardware virtualization support required for Windows guest OS
    Para-virtualization for Linux/Unix guest OS
  Para-virtualization
    OS kernel modified to support Xen explicitly
    Operating systems ported to run on Xen
      Similar effort to porting OS to new hardware platform
      Para-virtual machine architecture very similar to native hardware
    User space applications and libraries not modified

Xen Architecture
  Xen hypervisor
    Directly above hardware
    Boots system on start-up
  Domain 0
    Initialized by hypervisor on boot
    Runs XenLinux: modified Linux kernel
    Provides Domain Management and Control (DMC)
  Domain U
    VM running guest OS
  [Figure: Domain 0 (DMC over XenLinux OS) and several Domain U guests (Application over Guest OS) above Xen Hypervisor above x86 Hardware]
  Xen Architecture Overview, http://wiki.xensource.com/xenwiki

Hypervisor
  Full privilege
    Operates directly on hardware in ring 0
  Functions
    CPU scheduling for virtual machines
    Memory partitioning
    Provides hardware abstraction to virtual machines
  No awareness of
    Networking
    External storage devices
    Video
    Common I/O
  [Figure: Xen Hypervisor with Scheduler (CPU) and Partitioner (process list, page tables; memory) beneath Domain 0 and Domain U guests, above x86 Hardware]

Domain 0
  XenLinux
    Modified Linux kernel running in unique VM over hypervisor
    Direct privileged access rights to physical I/O resources
    Provides I/O virtualization to Domain U guest VMs
  Generic I/O drivers
    Network Backend Driver
      Manages local networking hardware
      Processes all VM networking requests from Domain U guests
    Block Backend Driver
      Manages local storage disk
      Processes all read/write data requests from Domain U guests
  [Figure: Domain 0 with I/O drivers beside Domain U guests, above hypervisor Scheduler and Partitioner, above CPU, memory, and I/O]

Domain U PV
  Domain U PV guests
    Paravirtualized VM running modified Linux/UNIX kernels
  OS expectations
    No direct access to host hardware
    Shares host hardware with other VMs
  Guest drivers provide I/O access
    Access backend drivers in Domain 0
    PV Network Driver
    PV Block Driver
  [Figure: PV driver in Domain U paired with backend driver in Domain 0]

Domain U HVM
  Domain U HVM guests
    Fully virtualized machines
    Run standard Windows or other unmodified OS
    OS runs as VMX non-root operation with VT-x
  OS expectations
    No hardware virtualization
    Not sharing with other VMs
    Normal hardware access for boot
      Xen virtual firmware runs as VMX root operation with VT-x
      Simulates BIOS expected by OS on initial startup
  I/O support
    No special drivers
    Domain 0 runs Qemu-dm daemon for each HVM guest
    Supports Domain U HVM guest for networking and disk access requests
  [Figure: standard OS driver in Domain U served by daemon in Domain 0]

Domain Management
  Xend daemon
    Python application running in Domain 0
    System manager for Xen environment
    Processes requests as XML remote procedure call (RPC)
  Qemu-dm
    Daemon handles networking and disk requests from Domain U HVM
    Provides full emulation of hardware for standard OS I/O drivers
  Virtual firmware
    Provides full emulation of BIOS for Domain U HVM guest OS
  [Figure: Domain 0 (Xend, Qemu, backend driver, XenLinux OS); Domain U PV (Unix application on XenUnix, Linux application on XenLinux); Domain U HVM (Windows application on standard Windows); all above Xen Hypervisor above x86 Hardware]

Domain U PV to Domain 0 Communication
  Domain U PV guest requests I/O from Domain 0 via hypervisor
    No direct support in hypervisor for I/O
  Inter-domain event channel
    Domain 0 and each Domain U have shared memory area
    Asynchronous inter-domain interrupts implemented in hypervisor
  Example: Domain U PV guest data write to hard disk
    Guest OS sends write request to PV block driver
    Guest PV block driver
      Writes data to Domain 0 shared memory through hypervisor
      Sends inter-domain interrupt to Domain 0 through hypervisor
    Domain 0 receives interrupt from hypervisor
      Triggers PV Block Backend Driver access to shared memory
      Reads blocks from Domain U PV guest shared memory
      Writes data to hard disk

I/O Driver Communication
  [Figure: write path. Unix application in Domain U PV sends write request to PV block driver, which writes shared memory and raises an interrupt through the Xen Hypervisor; backend block driver in Domain 0 receives the interrupt, reads shared memory, and writes the disk on x86 Hardware]

Xen PV and HVM Performance
  Test configuration
    Intel Xeon @ 2.3 GHz
    4 GB DDR2 533 MHz memory
    160 GB Seagate SATA disk
    Intel E100 Ethernet controller
  Dong et al., "Extending Xen with Intel Virtualization Technology", Intel Technology Journal
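
The Domain U PV disk write example can be traced with a small model of the split driver. This is a sketch under loud assumptions: the class names are invented, a Python `deque` stands in for the shared memory area, and a synchronous callback stands in for the asynchronous inter-domain interrupt that the hypervisor delivers in a real system.

```python
from collections import deque


class SharedMemory:
    """Stands in for the memory area shared by Domain 0 and one Domain U."""
    def __init__(self):
        self.requests = deque()


class EventChannel:
    """Stands in for the hypervisor's inter-domain interrupt."""
    def __init__(self):
        self.handler = None

    def bind(self, handler):
        self.handler = handler

    def notify(self):
        if self.handler is not None:
            self.handler()


class BlockBackendDriver:
    """Domain 0 side: owns the real disk."""
    def __init__(self, shared, channel):
        self.shared = shared
        self.disk = {}                         # block number -> data
        channel.bind(self.on_interrupt)

    def on_interrupt(self):
        while self.shared.requests:            # read blocks from shared memory
            block, data = self.shared.requests.popleft()
            self.disk[block] = data            # write data to hard disk


class PVBlockDriver:
    """Domain U side: no direct access to host hardware."""
    def __init__(self, shared, channel):
        self.shared = shared
        self.channel = channel

    def write(self, block, data):
        self.shared.requests.append((block, data))   # write to shared memory
        self.channel.notify()                        # send inter-domain interrupt


shared = SharedMemory()
channel = EventChannel()
backend = BlockBackendDriver(shared, channel)
frontend = PVBlockDriver(shared, channel)
frontend.write(7, b"hello")
```

The guest driver never touches the disk; the only things crossing the domain boundary are the shared buffer and the notification, which is the structure the slide's step list describes.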

I/O Bottleneck
  Bottleneck: single Ethernet controller
    Guest OS tasks waiting for I/O access hide performance degradation caused by virtualization
  Web server running over native Linux without Xen
    Threads compete above 2.5 Gbps
  Web server running over XenLinux in Domain 0
    Threads compete above 1.9 Gbps
  Web server running over XenLinux in Domain U PV
    Threads compete above 0.9 Gbps
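
The three saturation points above are easier to compare as fractions of the native figure. The short calculation below uses only the numbers on the slide; the variable names are illustrative.

```python
NATIVE_GBPS = 2.5   # native Linux without Xen

saturation_gbps = {
    "native Linux": 2.5,
    "XenLinux in Domain 0": 1.9,
    "XenLinux in Domain U PV": 0.9,
}

# Throughput at which threads start to compete, relative to native Linux.
relative = {name: gbps / NATIVE_GBPS for name, gbps in saturation_gbps.items()}

for name, fraction in relative.items():
    print(f"{name}: {fraction:.0%} of native")
```

On these figures Domain 0 reaches about three quarters of the native saturation point and a Domain U PV guest a little over one third.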