Fernando Martins, Director, Virtualization Strategy and Planning
Tom Adelmeyer, Principal Engineer, Virtualization Performance and Benchmarking

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS. Intel may make changes to specifications and product descriptions at any time, without notice. All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice. Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request. Intel, the Intel logo, Intel Leap ahead, Intel Leap ahead logo, Intel vPro, Intel vPro logo, Intel VIIV, Intel VIIV logo, Intel Centrino Duo, Intel Centrino Duo logo, Intel Xeon, Intel Xeon Inside logo, Intel Itanium 2 and Intel Itanium 2 Inside logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. *Other names and brands may be claimed as the property of others. Copyright © 2006 Intel Corporation.
Throughout this presentation:
- VT-x refers to Intel® VT for IA-32 and Intel® 64
- VT-i refers to Intel® VT for IA-64
- VT-d refers to Intel® VT for Directed I/O and its extensions

Abstract: The confluence of compelling usage models and robust solutions is driving virtualization to mainstream adoption. New usage models require radically new approaches to performance measurement and capacity planning. This session describes Intel's portfolio of virtualization technologies and, through practical examples, provides a deep technical dive into the challenging problem of meaningful benchmarking in a virtualized environment. We will discuss Intel's research in this space and share our latest results, including vConsolidate, Intel's seed contribution to a vendor-agnostic standard virtualization benchmark currently being developed by SPEC.

Agenda:
- Intel's Strategy for Virtualization
- Intel® Virtualization Technology Evolution
- Current and Emerging Usage Models
- Usage-Model-Based Benchmarking

[Chart: worldwide server-virtualization forecast, 2005-2010, showing successive upward revisions across the Sep-06, Feb-07, and Apr-07 forecast updates (0-20% range)]

Server virtualization is now considered a mainstream technology among IT buyers.
Market adoption:
- IT professionals are bullish on future use: driving 45% server use within 12 months (IDC End User Study, Jun-06)
- 41% of new x86 servers purchased in 2007 will be virtualized (IDC Directions 2007, Feb-07)
- More than 81% of businesses are using virtualization in production environments (451 Group Special Report, Dec-06)

Platform of Choice for Virtualization
- Broad ecosystem support; removing adoption barriers
- Leadership in hardware assists for virtualization: CPU virtualization (VT-x and VT-i), I/O virtualization (VT-d), networking virtualization (IOAT and VMDq)
- Better platform reliability features; leader in reliability features
- Proven platform architecture: 40X more Intel servers shipped (Q4'05 IDC Server Tracker, 1996-2005 total systems shipped)
- More power/performance headroom: quad-core, 4-way, NICs

IA-based system virtualization today requires frequent VMM software intervention.

Standards for I/O-device sharing (under definition in the PCI-SIG* IOV WG):
- Multi-context I/O devices
- Endpoint address-translation caching

Hardware support for I/O virtualization:
- Device DMA remapping
- Direct assignment of I/O devices to VMs
- Interrupt routing and remapping

Intel VT establishes a foundation for virtualization in the IA-32 and Itanium architectures, replacing software-only VMM techniques (binary translation, paravirtualization) with simpler and more secure VMMs built on virtualizable ISAs. That foundation is followed by ongoing evolution of support:
- Micro-architectural (e.g., lower VM switch times)
- Architectural (e.g., Extended Page Tables)
- Increasingly better CPU and I/O virtualization performance and functionality as I/O devices and VMMs exploit the infrastructure provided by VT-x, VT-i, and VT-d

VT-x: new CPU operating mode
- VMX root operation (for the VMM) and non-root operation (for the guest); eliminates ring deprivileging
- New transitions: VM entry to the guest OS, VM exit to the VMM
- VM Control Structure (VMCS): configured by VMM software; specifies guest operating system (OS) state; controls when VM exits occur (eliminating over- and under-exiting); supported on-die
[Diagram: guest OSes (e.g., WinXP, Linux) with their apps running in non-root operation; VM entry and VM exit transitions governed by the VMCS configuration in hardware]

Extended Page Table (EPT)
- A new page-table structure under the control of the VMM
- Maps guest-physical to host-physical addresses (accesses memory)
- Performance benefit: the guest OS can freely modify its own page tables, eliminating VM exits due to page faults, INVLPG, or CR3 accesses
- Memory savings: without EPT, a shadow page table is required for each guest user process; with EPT, a single EPT structure supports the entire VM

VT-d: platform implementation for I/O virtualization
- Defines an architecture for DMA remapping; implemented as part of the core-logic chipset; will be supported broadly in Intel server and client chipsets
- Improves system reliability: contains and reports errant DMA to software
- Basic infrastructure for I/O virtualization: enables direct assignment of I/O devices to unmodified or paravirtualized VMs
- DMA remapping improves reliability and security through device isolation, and improves I/O performance through direct assignment of devices
- Improves I/O performance of 32-bit devices that would otherwise fall into the bounce-buffer condition
- Interrupt remapping: interrupt isolation across VMs; efficient interrupt migration across CPUs
- Address Translation Services (ATS): support for ATS-capable endpoint devices; DMA-remapping performance improvements

Processor, chipset, network: Intel's holistic design approach delivers platforms built to excel in virtualization.

[Diagram: hardware virtualization mechanisms under VMM control, showing VMs (apps + OS) multiplexed by a VMM over shared hardware in several topologies]

Traditional benchmarking covers performance, power, and scalability:
- Metrics: throughput (MB/s), response time, number of users, etc.
- Micro-architecture focus: cache sizing, frequency, bandwidth, etc.
New technology requires new areas of analysis and metrics:
- Areas of focus are driven by usage models, e.g., VM migration time and VM utilization
- We need to measure how Intel® Virtualization Technology benefits end users and ISVs

Virtualization presents unique challenges:
- Which configurations to focus on: homogeneous or heterogeneous OS; number of virtual machines; configuration of individual VMs (CPU, memory, NIC, HBA, HDD)
- Measuring performance: virtual-clock accuracy induces platform-dependent error; limited availability of performance-monitoring capabilities
- The consolidation use case adds further testing challenges: synchronicity (use automation scripts), utilization (avoid harmonic bottlenecks), steady state (easy, repeatable measurements)
- The only way to overcome these challenges is to develop the benchmarks

Tier consolidation using SAP SD
- SAP SD (Sales and Distribution): an OLTP-style benchmark that measures the performance of a server running the Enterprise Resource Planning (ERP) solution from SAP AG
- Tier consolidation: the database and application server run in VMs, giving the benefits of 3-tier (isolation, maintainability) at the cost of 2-tier
- Benchmark value: reuses existing metrics; the new focus area is inter-VM communication
[Diagram: application-server VM and database VM running over a VMM on shared hardware]

vConsolidate: a server-application consolidation benchmark
- Represents the predominant use case: server-application consolidation
- Application types selected for consolidation are guided by market data
- vConsolidate provides a methodology for measuring performance in a consolidated environment, a means for fellow travelers to publish virtualization performance proof points, and the ability to analyze performance across VMMs and hardware platforms
- Knowledge obtained feeds into the SPEC virtualization workload

Server workload mix (market data): Business Processing 26.2%, Database 28.5%, Decision Support 9.2%, Collaborative 8.4%, Application Development 12.0%, Web Infrastructure 6.8%, IT Infrastructure 4.8%, Technical 3.5%, Other 0.6%.

Setup: 5 virtual machines; 3 clients (Controller, Mail, and Web).

Consolidation Stack Unit (CSU)
- The smallest granule in vConsolidate; consists of 5 virtual machines: database, commercial mail, web server, Java application server, and idle
- Each CSU produces a single score; the final score is the aggregate of the individual CSU scores
- Workloads: Web = WebBench, Mail = LoadSim, Database = Sysbench, Java = SPECjbb

VM profiles (vCPUs / vMemory / OS / application):

Profile #1:
  Web       1 vCPU    1.0 GB   Windows 32-bit   IIS
  Mail      1 vCPU    1.0 GB   Windows 32-bit   Exchange
  Database  1 vCPU    1.0 GB   Windows 32-bit   MS SQL
  Java      1 vCPU    1.7 GB   Windows 32-bit   BEA JVM
  Idle      1 vCPU    0.4 GB   Windows 32-bit   (none)

Profile #2:
  Web       2 vCPUs   1.5 GB   Windows 32-bit   IIS
  Mail      1 vCPU    1.5 GB   Windows 32-bit   Exchange
  Database  2 vCPUs   1.5 GB   Windows 64-bit   MS SQL
  Java      2 vCPUs   2.0 GB   Windows 64-bit   BEA JVM
  Idle      1 vCPU    0.4 GB   Windows 32-bit   (none)

Profile #3:
  Web       2 vCPUs   1.5 GB   Linux 32-bit     Apache
  Mail      1 vCPU    1.5 GB   Windows 32-bit   Exchange
  Database  2 vCPUs   1.5 GB   Linux 64-bit     MySQL
  Java      2 vCPUs   2.0 GB   Linux 64-bit     BEA JVM
  Idle      1 vCPU    0.4 GB   Windows 32-bit   (none)

Profile #4:
  Web       2 vCPUs   2.0 GB   Windows 32-bit   IIS
  Mail      2 vCPUs   2.0 GB   Windows 32-bit   Exchange
  Database  4 vCPUs   2.0 GB   Windows 64-bit   MS SQL
  Java      4 vCPUs   2.0 GB   Windows 64-bit   BEA JVM
  Idle      1 vCPU    0.4 GB   Windows 32-bit   (none)

Running vConsolidate
- The Controller application starts the tests via helper scripts, runs for 30 minutes, then stops the test and reports the score
- Time is measured by an external timer on the "Controller" client

Scoring
- The Controller application calculates the final score
- SPECjbb, Sysbench, and LoadSim report transactions per second; WebBench reports throughput
- VM relative score = measured / reference (e.g., WebBench: 1124 / 319 = 3.52)
- CSU score = GEOMEAN(VM relative scores); higher is better

Example scoring (1 CSU, 65% CPU utilization):
  Reference:  Web 319,  Java 14236,  Database 201,  Mail 13.5
  Measured:   Web 1124, Java 14842,  Database 229,  Mail 15.6
  Relative:   Web 3.52, Java 1.04,   Database 1.14, Mail 1.16
  CSU score = GEOMEAN(3.52, 1.04, 1.14, 1.16) = 1.48

Seeding the industry with benchmark workloads
- vConsolidate: a consolidated stack of business workloads (server-side Java, commercial database, commercial mail, commercial web server) on 4 VMs, plus an idle VM
- Collaborating with virtualization leaders: Microsoft and OEMs on consolidation workloads, methodology, and metrics; VMware on the VMmark* consolidation stack
- Establishing benchmarks with ISVs/OSVs; contributing to standard benchmarks through SPEC (long term)
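The scoring rule above, per-VM relative scores combined with a geometric mean, can be sketched as follows. This is an illustrative helper, not part of the official benchmark tooling; the function name and the reference values (taken from the example on this slide) are assumptions for the sketch.

```python
from math import prod

# Hypothetical reference scores per VM (from the slide's example).
REFERENCE = {"web": 319.0, "java": 14236.0, "database": 201.0, "mail": 13.5}

def csu_score(measured: dict[str, float]) -> float:
    """Return the geometric mean of measured/reference ratios
    across the four active vConsolidate VMs (the idle VM is unscored)."""
    relatives = [measured[vm] / REFERENCE[vm] for vm in REFERENCE]
    return prod(relatives) ** (1.0 / len(relatives))

# One CSU with the measured results from the slide's example.
score = csu_score({"web": 1124.0, "java": 14842.0, "database": 229.0, "mail": 15.6})
print(round(score, 2))  # 1.48
```

The geometric mean keeps any single fast workload (here, Web at 3.52x reference) from dominating the score the way an arithmetic mean would.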
Summary: Platform of Choice for Virtualization
- Dedicated hardware support; reliability leadership; high performance / energy efficient
- Broader ecosystem support: VMM vendors, ISVs, OEMs, SIGs, standards bodies
- Removing adoption barriers: education programs / best practices; new benchmarks

Backup: dual-port 10/100/1000 x4 PCI Express* Gigabit Ethernet Controller
- External interfaces: PCIe x4, x2, x1 (ver 1.1); SMBus; RMII; dual 1000BASE-T, SerDes, and SGMII interfaces
- I/O enhancements: Intel® I/O Acceleration Technology (IOAT2); MSI-X; low-latency interrupts; Direct Cache Access; header splitting and replication; virtualization support (VMDq) with 4 TX/RX queues per port; offloads compatible with IPv4, IPv6, and multiple VLAN tags; Receive Side Scaling
- Manageability: PXE, iSCSI boot; RMII and SMBus interfaces; ECC on all memory
- Package: 25 mm x 25 mm FCBGA
- Schedule: sampling now; production Q2'07
[Diagram: dual GbE MAC/PHY datapaths with TX/RX and management FIFO RAM, SerDes/1000BASE-T/SGMII ports, and a PCI Express DMA/host interface]

Unique Intel x86 reliability features (Intel Xeon-based servers vs. other x86 processor-based servers), listed as feature (benefit): description:
- Memory ECC (data integrity & availability): detects & corrects single-bit errors
- Enhanced memory ECC (data integrity & availability): retries double-bit errors, vs. standard memory ECC, which handles single-bit errors only
- Memory CRC (FBD) (continued operation & availability): address & command transmissions are automatically retried if a transient error occurs, vs. the potential of silent data corruption
- Memory sparing (data availability): predicts a "failing" DIMM & copies the data to a spare memory DIMM, maintaining server availability & uptime
- Memory mirroring (data protection): data is written to 2 locations in system memory so that if a DRAM device fails, mirrored memory enables continued operation and data availability
- Symmetric access to all CPUs (server continuity): enables a system to restart and operate if the primary processor fails

A better business foundation: less downtime, higher service availability, and improved confidence, enabled by a combination of processor, chipset, and platform memory technologies. (Data as of March 6, 2006.)

Intel Virtualization Technology for Directed I/O

I/O virtualization models:
- Monolithic model: I/O services and device drivers live in the hypervisor; devices are shared. Pros: higher performance, I/O device sharing, VM migration. Con: larger hypervisor
- Service VM model: I/O services and device drivers live in service VMs; devices are shared. Pros: high security, I/O device sharing, VM migration. Con: lower performance
- Pass-through model: device drivers live in the guest VMs; devices are assigned. Pros: highest performance, smaller hypervisor, device-assisted sharing. Con: migration challenges
VT-d goal: support all three models.

VT-d is platform infrastructure for I/O virtualization
- Defines an architecture for DMA remapping; implemented as part of the platform core logic; will be supported broadly in Intel server and client chipsets
[Diagram: CPUs on the system bus; north bridge with DRAM, VT-d, integrated devices, and PCIe root ports; south bridge with PCI, LPC, and legacy devices]
- Basic infrastructure for I/O virtualization: enables direct assignment of I/O devices to unmodified or paravirtualized VMs
- Improves system reliability: contains and reports errant DMA to software
- Enhances security: supports multiple protection domains under software control; provides a foundation for building trusted I/O capabilities
- Other usages: generic facility for DMA scatter/gather; overcomes addressability limitations of legacy devices

DMA remapping
- A DMA request carries a device ID (requestor ID), a virtual address, and a length
- The DMA remapping engine consults memory-resident partitioning and translation structures: device-assignment structures select the per-device address-translation structures (4 KB page tables)
- VT-d hardware selects the page table based on the source of the DMA request; the requestor ID (bus / device / function) in the request identifies the DMA source
- Requests that fail translation generate faults; successful translations yield a memory access with the system physical address
- A context cache and a translation cache accelerate remapping

VT-d device-assignment entry (128 bits):
  Bits 127-64: Rsvd | Domain ID | Rsvd | Address Width
  Bits 63-0:   Page-Table Root Pointer | Rsvd | Ext. Controls | Controls | P (present)

VT-d supports hierarchical page tables for address translation
- Page directories and page tables are 4 KB in size
- 4 KB base page size, with support for larger page sizes
- Support for DMA snoop control through page-table entries
- VT-d page-table entry (64 bits): fields include the page-frame / page-table address, software-available bits, snoop control, extended controls, and write (W) and read (R) permissions

Address-translation example (device-assignment entry specifying a 4-level page table):
- Requestor ID (16 bits): bus [15:8], device [7:3], function [2:0]
- DMA virtual address: bits 63:48 must be zero; bits 47:39 give the level-4 table offset, 38:30 the level-3 offset, 29:21 the level-2 offset, 20:12 the level-1 offset, and 11:0 the page offset

Caching of remapping structures
- Context cache: caches frequently used device-assignment entries
- IOTLB: caches frequently used translations (results of page walks)
- Non-leaf cache: caches frequently used page-directory entries
- When updating VT-d translation structures, software enforces consistency of these caches
- The architecture supports global, domain-selective, and page-range invalidations
- The primary invalidation interface uses MMIO registers for synchronous invalidations; an extended interface supports queued invalidations

Address Translation Services (ATS)
- PCI Express protocol extensions being defined by the PCI-SIG for ATS enable translation caches to scale out to devices
- Devices may request translations from the root complex and cache them; protocol extensions invalidate translation caches on devices

VT-d extended capabilities
- Support for ATS: enables VMM software to control device participation in ATS; returns translations for valid ATS translation requests; supports ATS invalidations
- Provides the capability to isolate, remap, and route interrupts to VMs
- Supports device-specific demand paging by ATS-capable devices
- VT-d extended features utilize PCI Express enhancements being pursued within the PCI-SIG

Extended Page Tables (EPT): motivation
- A VMM must protect host physical memory, since multiple guest operating systems share it
- A VMM typically implements these protections through "page-table shadowing" in software
- Page-table shadowing accounts for a large portion of virtualization overheads: VM exits due to #PF, INVLPG, and MOV CR3
- The goal of EPT is to reduce these overheads

EPT: mechanism
- EPT is a new page-table structure, under the control of the VMM, that defines the mapping between guest-physical and host-physical addresses
- The guest's CR3 and IA-32 page tables translate guest linear addresses to guest-physical addresses; the extended page tables then translate guest-physical to host-physical addresses
- The EPT base pointer (EPTP, a new VMCS field) points to the EPT page tables
- EPT is (optionally) activated on VM entry and deactivated on VM exit
- The guest has full control over its own IA-32 page tables: no VM exits due to guest page faults, INVLPG, or CR3 changes
- All guest-physical addresses go through the EPT tables, including the guest's own paging-structure references (CR3, PDE, PTE, etc.)
[Diagram: a guest linear address walked through the guest page directory and page table, with CR3, each PDE/PTE, and the final guest-physical page base address each translated through EPT tables to host-physical addresses]
- The diagram shows a 2-level table for a 32-bit address space; translation is possible for other page-table formats (e.g., PAE)
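The two-stage walk described above (guest 2-level paging for a 32-bit address space, with every guest-physical access translated through EPT) can be sketched as follows. All table contents and addresses here are hypothetical toy values held in dictionaries; real hardware walks in-memory paging structures with permission and control bits, which this sketch omits.

```python
PAGE = 0x1000  # 4 KB pages

def split32(lin: int) -> tuple[int, int, int]:
    """Split a 32-bit linear address: directory index [31:22],
    table index [21:12], page offset [11:0]."""
    return (lin >> 22) & 0x3FF, (lin >> 12) & 0x3FF, lin & 0xFFF

def translate(lin, cr3, guest_pds, guest_pts, ept):
    """Walk the guest's 2-level page table, applying the EPT mapping at
    every guest-physical access, and return the host-physical address."""
    di, ti, off = split32(lin)
    pd = guest_pds[ept[cr3]]     # CR3 holds a guest-physical base: goes through EPT
    pt = guest_pts[ept[pd[di]]]  # the PDE's guest-physical page-table base, via EPT
    gpa_page = pt[ti]            # the PTE's guest-physical page base
    return ept[gpa_page] + off   # final guest-physical -> host-physical, plus offset

# Toy structures (hypothetical): one page directory, one page table, and an
# EPT map from guest-physical page bases to host-physical page bases.
ept = {0x1000: 0x9000, 0x2000: 0xA000, 0x5000: 0xB000}
guest_pds = {0x9000: {3: 0x2000}}  # host-physical PD: entry 3 -> GPA 0x2000
guest_pts = {0xA000: {5: 0x5000}}  # host-physical PT: entry 5 -> GPA 0x5000

lin = (3 << 22) | (5 << 12) | 0x123
print(hex(translate(lin, 0x1000, guest_pds, guest_pts, ept)))  # 0xb123
```

Note how the EPT dictionary is consulted three times for a single guest access (CR3, the page-table base from the PDE, and the final page base), which is why hardware caches these translations rather than re-walking the EPT structures each time.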