System VMs

This material is based on the book, Virtual Machines: Versatile Platforms for Systems and Processes, Copyright 2005 by Elsevier Inc. All rights reserved. It has been used and/or modified with permission from Elsevier Inc.

System VMs
Support multiple guest OSes on single hardware platform; all running the same ISA
[Figure: Linux, Windows, and OS/2 applications, each on its own OS (Linux, Windows, OS/2) and its own virtual Intel x86, all on one Intel x86 hardware platform]

System VM Outline
Applications
Virtualizing Processors
Virtualizing Memory
Virtualizing I/O
Formal Virtualizability – ISA features
Case Studies – IBM VM, x86/VMware, Intel VT-x

Applications
Simultaneous support for multiple OSes/Apps
  • Easy way to implement timesharing
Simultaneous support for different OSes/Apps
  • E.g. Windows and Unix
Error containment
  • If a VM crashes, the other VMs can continue to work
  • Assumes VMM is correct (smaller/simpler)
Operating System debugging
  • Can proceed while system is being used for normal work

Applications, contd.
Operating System Migration
  • Can proceed while “old” OS continues to be used
[Figure: migration timeline with old and new releases; groups shown: system programmers, converted production users, unconverted production users, permanently unconverted production users; phases: new release being tested, new release installed, newer release being tested]

Applications, contd.
Retrofitting new features
  • Have VMM transform new device into a virtual device
Support for multiple networked machines on one physical machine
  • Allows debug of network software
Enables complex debugging and performance monitoring tools
  • By putting them in the VMM (not the guest OS)
Education

History
Early 60s: IBM M44/44X
  • VM for modified IBM 7044
  • “close enough to a virtual machine to show that ‘close enough’ did not count”
Mid 60s: IBM CP-40
  • Time-sharing system that protects users via virtual machines, aka “pseudo machines”
  • Used modified IBM 360/40
  • Implemented via assoc. memory and microcode
  • VMs used real memory; VMM managed virtual memory
CMS
  • Cambridge (conversational) monitor system
  • Single user OS developed for VMs (like DOS)

History
Mid/late 60s: IBM 360/67 -- CP-67
  • First 360 with VM
  • CMS an essential part
Late 60s/early 70s
  • VMs blossomed as a research topic
Early 70s: several VM implementations
  • Honeywell
  • DEC
  • RCA
  • Several university projects

System VMs
Virtual Machine Monitor (VMM) manages real hardware resources
All Guest systems must be given logical hardware resources
All resources are virtualized
  • By partitioning real resources
  • By sharing real resources
Guest state must be managed
  • By using indirection
  • By copying
[Figure: Linux, Windows, and OS/2, each with its applications, running on a Virtual Machine Monitor (VMM) on an x86 PC]

System VMs: Processor Mgmt/Protection
VMM runs in system mode
  • VMM manages/protects processor through conventional mechanisms
Guest OSes run in user mode
Guest OSes do not have direct control over hardware resources
All attempts to interact w/ hardware resources are intercepted by VMM
VMM manages shadow copies of Guest System state (incl.
control registers)
VMM schedules and runs Guest Systems

VM Timesharing
VMM timeshares resources among guests
  • Similar to OS timesharing applications
[Figure: timeline running from First VM Active, through VMM Active, to Next VM Active]
  1. Timer interrupt occurs
  2. VMM saves architected state of running VM
  3. VMM determines next VM to be activated
  4. VMM restores architected state for next VM
  5. VMM sets timer interval and enables interrupts
  6. VMM sets PC to timer interrupt handler of OS in next VM

Native and Hosted VMs
[Figure: four stacks on hardware, marked by privileged and non-privileged modes: traditional uniprocessor system (applications, OS); native VM system (virtual machine, VMM); user-mode hosted VM system (virtual machine, VMM, host OS); dual-mode hosted VM system (virtual machine, VMM, host OS)]

Virtualizing State
Indirection
  • Hold guest state in VMM memory
  • Change pointer on guest switch
  • Example: registers
[Figure: a processor register block pointer selects among the register values for VM 1, VM 2, and VM 3 held in VMM memory]

Virtualizing State
Copying
  • Hold guest state in VMM memory
  • Copy state on guest switch
[Figure: register values for VM 1, VM 2, and VM 3 held in VMM memory; one set is copied into the processor registers]

Processor Management/Protection
Traps and interrupts (& sys calls)
  • Transfer to VMM
  • VMM determines appropriate Guest OS
  • VMM transfers to Guest OS
Guest OS “return” to user app.
  • Transfer to VMM
  • VMM bounces return back to Guest app.
Read/Write of protected control registers
  • Trap to VMM
  • VMM reads/modifies guest copy
  • May modify shadow copy
  • Returns to Guest
[Figure: an application's system call/trap goes through the vector location in the VMM to the virtual vector location in the Guest OS; a Guest OS privileged operation traps to the VMM, which checks privileges, performs the operation, and returns to the next instruction]

OS VMs: Key Issue – ISA Virtualizability
What if privileged instruction no-ops in user mode? (rather than trapping)
What if user can access memory with real address?
  • Then (no-op case)… VMM can’t intercept when Guest OS attempts the privileged instruction
  • Then (real-address case)… a guest OS may see that the real memory it really has is different from the memory it thinks it has
What if user can read system control registers?
  • Then… guest OS may not read the same state value that it thinks it wrote

Virtualizability (Popek, Goldberg, 74)
Classic work in formalizing OS VM concepts
Defines basic VM properties
Defines properties of instruction sets
Proves that VMM can be constructed if instruction set properties hold
Extends to recursive VMs
Reduces to hybrid VMs

VM Properties
Virtual Machine: efficient, isolated duplicate of the real machine
Virtual Machine Monitor: software that implements VMs
Essential VMM characteristics
  1) Provides an environment essentially identical to the real machine
     • Except timing and availability of resources
  2) Programs show only minor decreases in speed
     • Mostly native instruction execution
  3) Has complete control of system resources

Privileged Instructions
Definition: Trap if executed in user mode; not in supervisor mode
Privileged instructions are required to trap
  • No-op in user mode is not enough

Control Sensitive instructions:
Instructions that provide control of resources
  1. All instructions that change the amount of (memory) resources (or the mapping)
     • base/limit register in simplified paper version
     • page table in general
  2. All instructions that change the processor mode
Examples:
  • Load TLB (if TLB is architected)
  • Load control register
  • Return to user mode

Behavior Sensitive instructions:
  1. All instructions whose results depend on the mapping of physical memory
  2. All instructions whose behavior depends on the mode
Instructions whose behavior depends on configuration of specific resources (and who owns them)
Examples:
  • Load physical address
  • POPF (Intel x86): Interrupt-enable flag remains unaffected in user mode

Instruction Types -- Summary
[Figure: diagram relating the instruction classes: privileged and non-privileged; sensitive (control-sensitive and behavior-sensitive) and innocuous]
Innocuous Instructions: Those that are not control or behavior sensitive

VMM components
[Figure: an instruction trap enters the dispatcher, which passes control to the allocator or to one of the privileged-instruction interpreter routines 1 through n]
  • Allocator: these instructions desire to change machine resources, e.g. Load Relocation Bounds Register
  • Interpreter routines: these instructions do not change machine resources, but access privileged resources, e.g. IN, OUT, Write TLB

VMM components
Dispatcher
  • Target of vectored traps – entry point for VMM
  • Decides which of other components to call
Allocator
  • Decides which system resources should be provided and to manage shared resources among VMs
Interpreters
  • Emulate the effects of privileged instructions
VMM runs in supervisor mode; all other software in user mode

Privileged Instruction Handling
LPSW: Load Program Status Word
  • Includes Mode Bit and PC (among other things)
[Figure: Guest OS code in the VM (user mode) reaches a privileged instruction (LPSW); the trap enters the dispatcher in the VMM code (privileged mode), which runs the LPSW routine; execution resumes at the next instruction (target of LPSW)]
LPSW Routine:
  1. Change mode to privileged
  2. Check privilege level in VM
  3. Emulate instruction
  4. Compute target
  5. Restore mode to user
  6. Jump to target

Virtual Machine “requirements”
  1. All innocuous instructions are executed by the hardware directly
  2. The allocator must be invoked when any program attempts to affect system resources
  3. Any program executes exactly as on real hardware except
     • For timing
     • Availability of system resources
A VMM satisfies all three requirements
Precise versions of informal definitions given earlier

Virtual Machines: Main Theorem
A virtual machine monitor can be constructed if the set of sensitive instructions is a subset of the set of privileged instructions
Proof shows
  • Equivalence by interpreting privileged instructions and executing remaining instructions natively
  • Resource control by having all instructions that change resources trap to the VMM
  • Efficiency by executing all non-privileged instructions directly on hardware
A key aspect of the theorem is that it is easy to check

Recursive Virtualization
[Figure: virtual machines run on a 2nd level VMM, which itself runs as a virtual machine (non-privileged modes) on the VMM (privileged mode) on the hardware]

Recursive Virtualization
Running a VMM as a VM on a VM on a VM….
Theorem: A conventional third generation computer is recursively virtualizable if it is
  (a) virtualizable, and
  (b) a VMM without any timing dependences can be constructed for it
Proof – A VMM is a program and from the VM theorem will be “identically performing” except for timing dependences and resource constraints. Timing is excluded in the theorem; resource constraints only limit the depth of recursion.
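
The main theorem's condition (the sensitive instructions form a subset of the privileged instructions) is easy to check once an ISA's instructions have been classified. The following is a minimal Python sketch of that check, not a definitive tool. The function names and the toy ISA are hypothetical. The x86 fragment reuses POPF and SGDT, which these slides describe as sensitive but not privileged; LGDT is added here as an assumed example of a sensitive instruction that is privileged, and the fragment is nowhere near a complete classification.

```python
def is_virtualizable(sensitive, privileged):
    """Main-theorem condition: every sensitive instruction is privileged (traps)."""
    return set(sensitive) <= set(privileged)


def critical_instructions(sensitive, privileged):
    """Sensitive instructions that do not trap in user mode."""
    return set(sensitive) - set(privileged)


# Hypothetical toy ISA in which every sensitive instruction traps.
toy_sensitive = {"LOAD_CR", "LOAD_TLB", "RET_USER"}
toy_privileged = {"LOAD_CR", "LOAD_TLB", "RET_USER", "HALT"}

# Illustrative x86 fragment (assumption: deliberately incomplete).
x86_sensitive = {"POPF", "SGDT", "LGDT"}
x86_privileged = {"LGDT"}

if __name__ == "__main__":
    print(is_virtualizable(toy_sensitive, toy_privileged))
    print(is_virtualizable(x86_sensitive, x86_privileged))
    print(sorted(critical_instructions(x86_sensitive, x86_privileged)))
```

The set difference is the useful by-product: it lists exactly the instructions a VMM would have to find by scanning or patching because the hardware will not trap on them.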

Hybrid Virtualization
Some ISAs are more virtualizable than others
User sensitive instructions
  • Executed in user mode and can change memory resources or processor mode, or whose behavior depends on real memory locations
Supervisor sensitive instructions
  • Executed in supervisor mode and can change memory resources or processor mode, or whose behavior depends on real memory locations

Hybrid Virtualization: Theorem
A hybrid virtual machine monitor can be constructed if the set of user sensitive instructions is a subset of the set of privileged instructions
Nonprivileged supervisor sensitive instructions are OK
  • Example: PDP-10 JRST 1 – return to user mode (does not trap if already in user mode)
When the VMM executes the VM supervisor, it must use some form of emulation to locate supervisor sensitive instructions
  • Low efficiency, but only in VM supervisor, not user code
If a user sensitive instruction is not privileged, then the VMM must emulate all the user code
  • Fails efficiency condition
  • But “binary translation” can be done more efficiently than interpreting

Case Study: Virtualizing the x86 ISA
x86 evolved through many extensions
Instruction set is not (strictly) virtualizable
  • Nor is it hybrid virtualizable

X86 Processor Control
Uses “baroque” late 1970s style protection rings
  • Unix was a reaction to this style
Four rings, 0-3
  • Ring 0: OS kernel
  • Ring 1: High priority drivers and OS services
  • Ring 2: Low priority device drivers
  • Ring 3: User
Transfer to lower ring (higher privilege) must go through “gate”

Memory Mapping
Segments map to 2GB memory space
2 GB space maps to fixed-size pages
Segment descriptor info
  • Valid, Base, Limit
  • Type (code or data)
  • R/W rights
  • Descriptor Privilege Level (DPL)
  • Etc.

Memory Mapping
[Figure: the code, stack, and data segment registers are loaded from the segment descriptor tables, which are located through the descriptor table registers; each segment register holds base, limit, rights (R/W), Desc. Priv. Level (DPL), and Req. Priv. Level (RPL); the code, stack, and data segments lie in the 2GB memory, which a 2-level page table maps to real pages]

Addressing
Addressing is via Segment Registers
Segment Registers
  • CS code segment
  • SS stack segment
  • DS, ES, FS, GS data segments
  • All memory accesses are via a segment register
Segment descriptors are entered into segment registers
  • And given an RPL, Requestor Privilege Level
  • In some cases privilege is lowest of RPL, DPL, e.g. when pointers are passed

X86 Processor/Memory Protection
CPL – current privilege level, normally determined by DPL of current code segment
  • CPL == processor mode
To access data, CPL ≤ DPL
To call procedure, must enter through gate if CPL(callee) < CPL(caller)
<< this is a very abbreviated description >>

X86 Instruction Set Virtualizability
Ordinarily:
  • Levels 0,1,2 == supervisor
  • Level 3 == user
To virtualize, everything runs at level 3
IN, INS, OUT, OUTS – I/O instructions
  • Perform check CPL ≤ IOPL (I/O Privilege Level)
  • Not privileged (by Goldberg’s defn)
  • Control sensitive (I/O is resource), action sensitive to CPL
  • Could be user sensitive
POPF, PUSHF
  • Push/pop stack to/from EFLAGS register
  • EFLAGS contains IOPL (among other things)
  • And this flag indicates IO privilege level of current task

X86 ISA Virtualizability, contd.
SGDT, SIDT, SLDT, SMSW, STR
  • Copy descriptor pointer register, or system state information
  • Typical manual entry: “The SGDT and SIDT instructions are only useful in operating-system software; however, they can be used in application programs without causing an exception to be generated”
  • E.g.
    behavior sensitive, non privileged
VERR/VERW
  • Verify if addressed segment is readable or writeable by CPL
  • Seem like perfectly reasonable instructions,
  • BUT behavior sensitive and not privileged

X86 ISA Virtualizability, contd.
LAR/LSL
  • LAR -- load access rights and DPL
  • LSL – load segment limit
  • May no-op, in effect, if CPL isn't good enough.
  • I.e. performs CPL/RPL check before it does inst.
  • Behavior sensitive and not privileged
MOVs, PUSH/POP to/from segment registers
  • Copy RPL from segment register
  • Behavior sensitive and not privileged
Pre-Scanning is probably a necessity

Hybrid Virtualization: Patching
Scan Guest OS, find problem instructions, replace with jump to VMM
[Figure: a scanner and patcher turns the original program into the patched program; the code patch for each discovered critical instruction is a control transfer, e.g. trap, into the VMM]

Hybrid Virtualization: Code Caching
Scan Guest OS, “translate” into code cache, find problem instructions, replace with jump to VMM
[Figure: blocks 1-3 of the patched program are code sections emulated in the code cache and located through a translation table; control transfers, e.g. trap, reach specialized emulation routines in the VMM; two critical instructions can be combined into a single block]

Virtualizing Memory: Review
[Figure: the OS memory region holds a PT pointer and the page tables for process 1 through process n, with user and super entries; a context switch changes the PT pointer; the page tables map to OS-managed real pages]

Virtualizing Memory
Real memory partitioning?
  • Could be fixed partition per guest => inefficient
  • Typically flexible partitioning via VMM management
Guest manages its virtual page tables
Guest page table addresses are write protected
VMM manages shadow page tables that reflect actual mapping to physical pages
  • Note Real / Physical page distinction
VMM can change shadow page table by writing page table pointer
  • i.e.
    virtual machine state change via indirection

Virtualizing Memory – Example
[Figure: the VMM memory region holds, for each guest 1 through n, a guest PT pointer and shadow PTs (a user mode PT and a super mode PT for each of process 1 through process n); the active shadow PT is selected by the PT pointer on a system call, a context switch, or a guest OS switch, and maps to VMM-managed physical pages; each guest OS memory region holds that guest's process PTs (user and super), which map to guest-OS-managed "real" pages]

Virtualizing Memory – Operations
Guest application performs system call
  • Trap to VMM
  • VMM changes shadow mapping to reflect guest privilege change
Guest OS performs context switch
  • Writes PT pointer
  • Trap to VMM
  • VMM writes guest PT pointer
  • VMM modifies shadow PT pointer
[Figure: the Guest OS executes "write PT pointer"; the VMM checks privileges, writes the guest PT ptr, writes the shadow PT ptr, and returns to the next instruction]

Virtualizing Memory -- TLBs
TLB plays role of page table
  • Page table is just a software structure of which the VMM has no special knowledge
Assume TLB entry:
  • PId, Protection bits, usage bits, real page frame
Virtualize TLBs
  • VMM keeps track of Guest’s copies
  • VMM manages real copy
  • Real TLB holds subset of pages mapped in Guest copy
Virtualize PIds
  • VMM manages real PIds
  • Keep track of mapping from guest PIds to real PIds
      At any given time all TLB entries with same PId are associated with same guest

Virtualizing TLBs
TLB Read/Write are privileged instructions
  • Behavior and control sensitive
Guest OS write TLB
  • Intercepted by VMM
  • VMM updates guest’s virtual copy
  • VMM may modify real TLB
Guest OS read TLB
  • Intercepted by VMM
  • VMM reads guest’s virtual copy and returns contents to guest
  • May have to merge in usage data from real version in TLB

Virtualizing TLBs
TLB miss
  • Traps to VMM
  • VMM checks to see if virtual TLB maps page
      If so, VMM handles it and silently returns
      Else, VMM
      reflects fault to guest OS
TLB management
  • Can’t switch TLBs via indirection as with page tables
  • PId management can give similar control, however
Guest system call/returns (privilege changes)
  • Flush old mode PId entries
      New mode TLB entries re-loaded on demand
      Or write new TLB entries with privileges
  • Use two real PIds per virtual PId
      One with virtual system mode privileges
      Other with virtual user mode privileges

Virtualizing I/O
Hardest part of virtualization
  • Many device types
  • Many devices of each type
      Each with its own driver
  • New devices may be added during lifetime of system
In older, “classic” systems, less of a problem
  • Entire system developed by one company
  • Far fewer devices to worry about
  • Channels (IO Processors) isolated key IO software

I/O Architecture
I/O instructions
  • Special privileged opcodes
  • Similar to loads/stores
  • Address and data read/written on I/O bus
Memory mapped I/O
  • Load/stores to special (protected) memory addresses
  • Addresses/data decoded by hardware and translated to I/O addresses/data
Addresses indicate I/O devices/registers
Data can be status, commands, or real data

I/O Architecture (contd)
DMA (block) transfers may require several I/O operations
  • Starting address(es)
  • Block length
  • Command (read, write, interrupt on completion)
  • Requires exclusive device access
Interrupts
  • From I/O devices to force processor transfers to I/O software routines
[Figure: a device controller on the data bus and address bus, with address decode, data buffer, status, start address, and block size registers]

I/O Management: review
OS manages I/O resource
  • Allocates space on storage devices, etc.
  • Serializes requests for shared devices
User software performs system calls with general I/O requests
OS converts I/O calls to driver calls
Driver contains device-specific software
  • Exact commands, controller registers, etc.
Driver generates device (and bus) specific I/O operations
[Figure: application → system calls → operating system (VM mgr) → driver calls → I/O drivers → phy. mem. and I/O operations → hardware]

Device Types
Dedicated
  • Monitor, mouse, keyboard
  • Device can’t be virtualized; must be shared (under user control)
  • VMM still controls due to privileged mode
Partitioned
  • Disk
  • Make multiple, smaller virtualized versions
Shared
  • Network adapter
  • VMM manages virtual state information
  • Translate virtual requests to physical requests
Spooled
  • Printer
  • Shared but at coarse granularity

Spooled Devices
Two level spool table
First write to VM spool area
When ready, VMM copies to VMM spool area
Then invokes device
When device finished
  • Both VM and VMM spool tables receive “complete”
Optimizations are possible
  • E.g. VMM uses VM spool buffer

Virtual Machine 1 Spool Table (spool area at 10000)
  Program  Status     Location  Real loc.  Size
  A        Printed    1000      11000      400
  B        Completed  2000      12000      200
  C        Running    3000      13000      200
  D        Completed  4000      14000      500

Virtual Machine 2 Spool Table (spool area at 20000)
  Program  Status     Location  Real loc.  Size
  P        Running    1000      21000      400
  Q        Completed  2000      22000      800

VMM Spool Table (spool area at 30000)
  VM  Program  Status    Real loc.  Size
  1   A        Printed   30000      400
  2   Q        Printing  31000      800
  1   B        Waiting   31800      200
  1   D        Waiting   30400      500

Non-existent Devices
Implement virtual version only
Example: network adapter
  • Allows VMs on same platform to communicate

I/O Interception Points
Attempts to interact with virtual devices are intercepted by VMM which translates to real devices
  • At system call interface
  • At driver call interface
  • At I/O device interface
[Figure: application → system calls → operating system (VM mgr) → driver calls → I/O drivers → phy. mem. and I/O operations → hardware, with the three interception points marked]

At system call interface
System call traps to VMM
VMM interprets system call to produce driver calls
VMM contains shadow drivers
  • (Implement VMM with driver interface compatible with some existing OS?)
Guest OS contains virtual I/O code and drivers
  • Must still be executed, for correct guest state updates
Problems
  • VMM must interpret all I/O system calls for all guest OSes
  • VMM must have access to drivers for all real devices
  • I/O initiated by guest OS may not always pass through call interface

At driver call interface
Guest OS contains driver stubs
Guest OS driver calls can operate on generic virtual devices
  • To simplify conversion
VMM contains shadow drivers
  • These drivers correspond to real devices
  • Generic I/O operations passed to VMM and converted to shadow driver calls
Problem
  • VMM must have access to real drivers
  • Need generic drivers for each guest OS
  • Guest OSes must have well defined, modular driver call interface
[Figure: guest application → system calls → guest OS → driver calls → generic I/O drivers → generic I/O operations → VMM (interpret) → I/O drivers → I/O operations → hardware]

At I/O device interface
Guest OSes contain real drivers
Low level I/O operations trap to VMM
VMM must check/translate I/O operation
If legal, VMM performs I/O operation on behalf of guest
VMM passes control back to guest
Problems
  • VMM must know some device specifics (even if it doesn’t contain full drivers)
  • VMM must
    manage serialization for shared devices
  • VMM must check correctness of I/O operations
[Figure: guest application → system calls → guest OS → driver calls → I/O drivers → I/O operations → VMM (check/translate) → I/O operations → hardware]

Virtualization with IOPs (IBM Style)
IO instruction points to Channel program
  • Similar to driver
  • Micro-code like
  • Very simple control flow
  • “Packages” sequences of related operations
VMM can translate channel program as a whole
  • Mostly consists of address re-mapping
  • And dealing with non-contiguous pages
Reduces/eliminates problems with I/O sequences that require exclusive access to a device

Case Study: IBM 360/370/390
CP-67 on 360/67 in 1960s
  • First production VM implementation
  • Provided means for supporting timesharing via multiple guest versions of CMS – single user OS
  • Used basic virtualization concepts described by Goldberg
VM/370 (1972) led to widespread use of VMs
Virtual Machine Assist (1974)
  • Enhancements to support VMs
Extended Control Program Support (1978)
  • Further enhances VM support
Handshaking
  • Lets Guest OS in on the secret
Interpretive Execution Facility (IEF)

Reasons for VM Slowdown
VM initialization
  • Setting up virtual state
Privileged Instruction overhead
  • Trap to VMM
  • Interpretation by VMM
  • Return from VMM to guest
Virtual Memory Management
  • Shadow page faults when page is already mapped
Interrupts
  • Reflect through VMM before getting to Guest OS
System Calls (SVC) by guest in user mode
  • Requires trap/reflection back to Guest OS
Duplicated effort between VMM and Guest OS
  • Memory management done by both

Virtual Machine Assists
Ways of making application on VM run faster
  • Have no performance effect if run in native mode
Instruction Emulation
Shadow Table Management
Virtual interval timer

IBM 370 Virtual Machine Assist
Add Control Register 6 (CR6)
  • Bit 0: VM Assist On/Off
  • Bit 1: Virtual user/supervisor state
  • Bit 4: SVC handling On/Off
  • Bit 5: Shadow table fixup On/Off
  • Bit 7: Virtual interval timer assist
  • Bits 8-28: address of VM pointer list
CR6 set by VMM when Guest is dispatched

Instruction Emulation
Certain privileged instructions emulated directly in microcode
  • Avoids trap/interpretation by VMM
  • Guest must be in Virtual Supervisor mode (held in CR6)
  • Examples:
      Load PSW
      Load Real Address
      Reset Reference Bit
      Store Control
Supervisor Calls also emulated
  • If SVC handling is enabled via CR6
  • Avoids trap/reflection through VMM

Shadow Table Management
When page fault occurs:
  • If Guest OS has page mapped and page is already present in real memory but not mapped by guest’s shadow table, then VM assist updates shadow table automatically
  • Else, reflects fault to VMM
Uses VM pointer list to find guest tables

Performance Improvement
Reduction in Supervisor State Time
  • 70-90%
Reduction in Elapsed Time
  • 40-65%
Reduction in Priv. Insts. Interpreted by VMM
  • 75-95%

Extended Control Program Support
Emulates additional Privileged Instructions
Partially handles other Privileged Instructions (with help from VMM)
  • e.g. Purge TLB, Test Channel
Non-architected instructions for use by VMM
  • Examples
      Decode channel words
      Dispatch a virtual machine
      Locate virtual I/O control blocks
      (many others)
Virtual Timer Assist
  • Maintains a virtual interval timer for guest VM
  • Real interval timer is a hardware resource

Interpretive Execution Facility
Provides a way to execute most of the VMM functions in hardware
Function of VMM separated between hardware and software
  • Cleaner separation compared to earlier VM assists
Advantages of interpretive execution
  • Better performance
  • Better predictability of performance
  • Applicable for all types of guest operating systems
Key instruction: SIE (Start Interpretive Execution)
  • Used by VMM to give control to hardware
  • Architectural state of VM in table accessible to hardware
  • Privileged instructions interpreted in hardware
  • Occasionally need to get back to the software part of the VMM

Entry and Exit from IE mode
[Figure: VMM software executes SIE to enter interpretive-execution mode; the hardware leaves interpretive-execution mode either on an exit for interception, which returns to emulation in the VMM software, or on an exit for host interrupt, which goes to the host interrupt handler]

Inter VM Communication
Other VMM extensions focus on inter-machine communication by emulating many distributed system features
  • e.g. virtual LANs
  • VMs by their nature are isolated – but inter-user communication is also desirable

IBM Handshaking (Para Virtualization)
Allow Guest OS to discover that it is running on VMM
  • Guest “probes” for VMM when it is booted
  • Then informs VMM that it expects VMM support
Reduces duplicated effort
  • OS can mark all page frames fixed, disable demand paging, bypass channel address translation
Pseudo page fault handling
  • Under operator control, VMM notifies Guest OS when VMM is handling a page fault by the Guest VM
  • Guest OS marks faulting task as “page wait”
  • Guest OS dispatches another task (I.e.
    whole Guest VM does not have to wait)

VMware: an x86 System Virtual Machine
Applying Conventional VMs to PCs – Problems:
  • Installing the VMM on bare hardware, then booting Guests onto VMM.
  • Need to support many device types, many more drivers
VMware solves both problems
Uses Host OS/Guest OS model
  • “Hosted VM”
  • Uses Host OS for some VMM functions, including I/O

VMware: Three Main components
Begin with already-loaded Host OS
VMDriver (Pseudo-Driver)
  • Host OS-specific
  • Installed as a driver, but can take over the machine
  • Acts as conduit between System and User VMMs
VMMonitor (System-level VMM)
  • Slipped under installed OS via Pseudo-Driver
VMApp (User-level VMM)
  • Appears as ordinary application to installed OS
  • Can make normal I/O calls (and use installed drivers)
[Figure: host apps and VMApp run in user mode on the host OS, which contains VMDriver; the virtual machine (applications on an OS, e.g. Linux, Windows, on virtual hardware: x86 motherboard, display, adapters, etc.) runs on VMMonitor in privileged mode; both sides sit on the real hardware]

VMM Communication
VMM control passes back and forth between user and system-level VMM portions
User VMM performs system call to pseudo-driver; then waits for response
System VMM maintains control, then sends response message back to User VMM

Resource Management
Host OS schedules processor resource
  • User-level VMM is just another application
Host OS manages memory
  • VM memory is allocated as address space of User-level VMM
  • User-level VMM “mallocs”; whole VM uses it

VMware I/O
Guest OS contains generic drivers
Generic drivers operate on virtual devices managed by user mode portion of VMM
User mode portion of VMM makes normal system calls
System calls cause Host OS to use real drivers and devices
[Figure: guest application → system calls → guest OS (VM mgr, generic I/O drivers) → phy. mem. and I/O operations → SW virtual devices in the VMM (user mode) → system calls → host OS (VM mgr, I/O drivers) → phy. mem. and I/O operations → hardware]

I/O Sequence
Guest application makes system call
Intercepted by System-level VMM, reflected to Guest OS
Guest OS performs I/O operations specified in generic drivers
System-level VMM captures I/O operations, and interprets them
Passes operation back up to User-level VMM
User-level VMM performs I/O call to Host OS

Example: Network Virtualization
Virtual and Physical Network Interface Card (NIC) the same
Message Send
  • X86 OUT or OUTS plus port# (in range of IDs for NIC)
  • Each port has state bit: trap on I/O request
VMM saves permission “map” for all ports per guest VM

Example: Network Virtualization
Sequence below
Guest OUT traps to VMM
VMM checks guest permissions before making request to physical NIC
[Figure: message path from user mode to privileged mode and out to the network]
  1. User on VM 1: user sends message to external machine, e.g. using send()
  2. OS on VM 1: OS converts into I/O instructions for virtual NIC, e.g. OUTS 0xf0,...
  3. VMM: VMM sends packet on virtual bridge to device driver of physical NIC, e.g. OUTS 0x280,…
  4. Device driver: NIC device driver launches packet on network using wire signals

Virtual Network
Virtual and Physical NIC different
Special case: virtual network
[Figure: message path between two VMs on the same platform, user mode and privileged mode]
  1. User sends message to local virtual machine, e.g. using send()
  2. OS converts into I/O instructions, e.g. OUTS 0xf0,...
  3. VMM sends packet on virtual bridge to device driver of physical NIC, e.g. OUTS 0x280,…
  4. NIC device driver converts send message to a receive message for receiving VM. No wire signals are generated.
  5. VMM raises interrupt in receiver’s OS
  6. OS on VM 2: interrupt handler in OS generates I/O instructions to receive packet
  7. User on VM 2: receiver gets packet

Case Study: Intel VT-x (Vanderpool)
x86 Virtualization Extensions recently announced by Intel
New VMX mode
  • Two privilege levels: root and non-root
Root level
  • Similar to conventional x86
  • Plus new VMX instructions
  • VMM runs in root level
Non-root level
  • Limited control of resources
  • Including when in ring 0
  • Guest OS plus apps runs in non-root level

VT-x Operation
Transition from normal mode to VMX root mode via vmxon instruction
VMM in root level sets up the environment for each VM and initiates the virtual machine via vmlaunch instruction
Attempts to modify resource cause return to root level
Explicit vmcall causes return to root mode
vmresume instruction causes return to guest in non-root mode
vmxoff instruction causes exit from VMX mode
[Figure: timeline. vmxon moves the processor from regular mode to root mode (VMM); vmlaunch starts VM1 and VM2 in non-root mode; each VM exit returns to root mode and vmresume re-enters VM1 or VM2; vmxoff returns to regular mode]

VT-x Capabilities
Root mode eliminates need to run all guest code in user mode
  • VMM runs in root mode
  • For code regions with no critical instructions, HW is as efficient as normal machine
VT-x HW maps state-holding data elements directly to native structures during VM execution.
  • VMCS (virtual machine control structure) encapsulates VM state
  • HW implementation can take over loading and unloading state
  • No need for VMM to perform load/stores of state info.
Eliminates the need for paravirtualization
  • Allows standard versions of OSes to be used as guests
  • The vmcall instruction can be used to pass hints and data to the VMM if desired

VMCS
Can be implemented by HW or SW in root mode
  • VMM is implementation-dependent
Aligned on 4KB boundary
Pointed to by VMPTR
  • Load VMPTR with vmptrld instruction
  • Read VMCS with vmread; write VMCS with vmwrite
State Area
  • Guest State: Register State, Interruptibility State
  • Host State: Register State
Control Area
  • VM Execution Controls: Pin-based Execution Controls, Processor-based Execution Controls, Bitmap Fields, etc.
  • VM Exit Controls: Control Bitmap, MSR Controls
  • VM Entry Controls: Control Bitmap, MSR Controls, Controls for Event Injection
VM Exit Information
  • Basic Information
  • Vectoring Event Information
  • Due to Event Delivery
  • Due to Instruction Execution
  • Other Exit Information

Critical Instructions
Programmable VM exit conditions given in VMCS
  • E.g., which instructions should cause exit to VMM
Example: Read Time Stamp Counter (RDTSC)
  • Contained in 64-bit MSR -- IA32_TIME_STAMP_COUNTER
  • Works in any mode if TSD bit in control register 4 is off
  • Otherwise works only in Ring 0; otherwise traps (protection mode exception)

RDTSC
[Flowchart: rdtsc instruction encountered]
  1. Machine in VMX mode?
     • No: perform normal operation
     • Yes: go to 2
  2. RDTSC exiting bit is set in VMCS?
     • Yes: save exit information, exit VM, return control to VMM
     • No: go to 3
  3. TSD bit of CR4 is set in VM?
     • No: go to 5
     • Yes: go to 4
  4. Ring 0 operation?
     • No: protection exception; save exit information, exit VM, return control to VMM
     • Yes: go to 5
  5. Use TSC Offsetting bit is set in VMCS?
     • No: return time-stamp counter value
     • Yes: add TSC offset to time-stamp counter value; return sum
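
The decision sequence in the RDTSC flowchart that closes this section can be written as a short function. This is a minimal sketch in Python, assuming a simplified model of the processor state: the `CpuState` class, its field names, and the outcome strings are hypothetical and exist only for illustration. It follows the order of checks shown on the slide and is not a description of the complete hardware behavior.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CpuState:
    """Hypothetical snapshot of the state the flowchart consults."""
    in_vmx_mode: bool         # machine is running a guest in VMX mode
    rdtsc_exiting: bool       # "RDTSC exiting" bit in the VMCS
    cr4_tsd: bool             # TSD bit of CR4 as set in the VM
    ring: int                 # current ring, 0 to 3
    use_tsc_offsetting: bool  # "Use TSC offsetting" bit in the VMCS
    tsc: int                  # raw time-stamp counter value
    tsc_offset: int = 0       # TSC offset held in the VMCS


def rdtsc(cpu: CpuState) -> Tuple[str, Optional[int]]:
    """Return (outcome, value) for one rdtsc, following the slide's flowchart."""
    if not cpu.in_vmx_mode:
        # Normal operation; the non-virtualized TSD/ring check is omitted here.
        return ("normal", cpu.tsc)
    if cpu.rdtsc_exiting:
        # Save exit information, exit VM, return control to VMM.
        return ("vm_exit", None)
    if cpu.cr4_tsd and cpu.ring != 0:
        # Protection exception, which also ends in an exit to the VMM.
        return ("protection_exception", None)
    if cpu.use_tsc_offsetting:
        return ("value", cpu.tsc + cpu.tsc_offset)
    return ("value", cpu.tsc)


if __name__ == "__main__":
    guest = CpuState(True, False, True, 0, True, tsc=100, tsc_offset=50)
    print(rdtsc(guest))
```

Keeping the exit conditions as plain booleans mirrors the slide's point that they are programmable bits in the VMCS: the VMM chooses per guest whether reading the counter exits, faults, or returns an offset value.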