More Computing with Less Energy
Steve Pawlowski
Intel Senior Fellow
GM, Architecture and Planning
CTO, Digital Enterprise Group
Intel Corporation
CHEP '09, March 24, 2009

Legal Disclaimer

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS.

Intel may make changes to specifications and product descriptions at any time, without notice. All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice. Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.

This document may contain information on products in the design phase of development. The information here is subject to change without notice. Do not finalize a design with this information. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.

Intel Corporation may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights that relate to the presented subject matter. The furnishing of documents and other materials and information does not provide any license, express or implied, by estoppel or otherwise, to any such patents, trademarks, copyrights, or other intellectual property rights. Wireless connectivity and some features may require you to purchase additional software, services or external hardware.

Nehalem, Penryn, Westmere, Sandy Bridge and other code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not authorized by Intel to use code names in advertising, promotion or marketing of any product or services, and any such use of Intel's internal code names is at the sole risk of the user.

Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance.

Intel, Intel Inside, Pentium, Xeon, Core and the Intel logo are trademarks of Intel Corporation in the United States and other countries. *Other names and brands may be claimed as the property of others. Copyright © 2009 Intel Corporation.
Real World Problems Taking Us BEYOND PETASCALE

[Chart: projected Top500 performance, 1993-2029, for both the sum of the Top500 and the #1 system, rising from 100 MFlops toward 1 ZFlops; annotation: "What we can just model today with <100 TF".]

Example real-world challenges and the computing they require (Source: Dr. Steve Chen, "The Growing HPC Momentum in China", June 30, 2006, Dresden, Germany):
• Aerodynamic Analysis: 1 PetaFlops
• Laser Optics: 10 PetaFlops
• Molecular Dynamics in Biology: 20 PetaFlops
• Aerodynamic Design: 1 ExaFlops
• Computational Cosmology: 10 ExaFlops
• Turbulence in Physics: 100 ExaFlops
• Computational Chemistry: 1 ZettaFlops
What that computing would enable, for example:
• Full modeling of an aircraft in all conditions; "green" airplanes
• Genetically tailored medicine
• Understanding the origin of the universe
• Synthetic fuels everywhere
• Accurate extreme weather prediction

A Look at CERN's Computing Growth

[Chart: CERN tape space and disk space (PetaBytes) and computing capacity (21,500 cores @ 1400 SI2K per core), 2007-2013, with a photo of the CERN tape library. Source: CERN, Jarp Sverre.]
Lots of computing (45% CAGR), lots of data; no upper boundary!

Moore's Law and High Performance Computing

[Charts: relative HPC performance (GFlops as the base), 1986-2016, from Giga through Tera (ASCI Red, 9,298 processors) and Peta (today's COTS, 11.5K processors assuming 2.7 GHz) toward Exa, alongside relative transistor performance over the same period. Source: Intel Labs.]
From Peta to Exa: roughly 2X transistor performance, requiring ~30K cores @ 2800 SI2K.

A Look at CERN's Computing Growth (extended)

[Chart: the same tape, disk and computing growth extended to 2016, with computing rising from 21,500 cores @ 1400 SI2K per core to 30,000 cores @ 2800 SI2K per core. Source: CERN, Jarp Sverre.]

Reach Exascale by 2018

From GigaFlops (~1987) to TeraFlops (~1997) to PetaFlops (2008) to ExaFlops (~2018). Note: numbers are based on the Linpack benchmark; dates are approximate.
"The pursuit of each milestone has led to important breakthroughs in science and engineering." Source: IDC, "In Pursuit of Petascale Computing: Initiatives Around the World," 2007.

What Is Preventing Us? Power Is Gating Every Part of Computing

[Chart: power consumption (kW) by system class, from MFLOP machines in 1964 through GFLOP (1985), TFLOP (1997) and PFLOP (2008) toward an EFLOP machine in 2015-18; voltage is not scaling as in the past.]
An ExaFLOPS machine without power management (Source: Intel, for illustration and assumptions, not product representative):
• Compute: 70 MW (170K chips @ ~400 W each)
• Memory: 80 MW (0.1 Byte/FLOP @ 1.5 nJ per Byte)
• Comm: 70 MW (100 pJ of communication per FLOP)
• Disk: 10 MW (10 EB of disk @ 10 TB/disk @ 10 W)
• Other miscellaneous power consumption: power supply losses, cooling, etc.
Total: 100+ MW? That is the challenge of exascale.

HPC Platform Power

[Pie chart from the P3 Jet Power Calculator V2.0 for a DP 80 W Nehalem server with 48 GB of memory (12 x 4 GB DIMMs) and a single power supply unit @ 230 Vac: CPUs ~31% and memory ~26% of platform power, with the remainder split among PSUs, planar & VRs, fans, HDD, PCI+GFX and peripherals.]
Need a platform view of power consumption: CPU, memory, VRs, etc.
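As a quick sanity check on the unmanaged-exascale breakdown above: the disk and compute entries follow directly from their stated assumptions, and the components alone already land well past the "100+ MW?" mark. A minimal Python sketch; memory and comm are simply quoted from the slide rather than re-derived.

```python
# Back-of-envelope check of the "ExaFLOPS machine without power management"
# breakdown above. Only the disk and compute entries are re-derived from the
# slide's assumptions; memory and comm are quoted as given.

disk_drives = 10e18 / 10e12           # 10 EB of disk at 10 TB per disk -> 1M drives
disk_mw     = disk_drives * 10 / 1e6  # 10 W per drive                  -> 10 MW

compute_mw  = 170_000 * 400 / 1e6     # 170K chips @ ~400 W each        -> ~68 MW

memory_mw   = 80                      # 0.1 Byte/FLOP @ 1.5 nJ per Byte  (slide figure)
comm_mw     = 70                      # 100 pJ of communication per FLOP (slide figure)

total_mw = disk_mw + compute_mw + memory_mw + comm_mw
print(f"{total_mw:.0f} MW before power-supply losses and cooling")   # ~228 MW
```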
Exponential Power and Computing Growth

[Charts: relative energy per operation, 1986-2016, falling from 5 V parts as Vcc scales, through the Giga, Tera and Peta generations toward Exa; and relative performance versus relative power (GFlops as the base) over the same period.]
Power at a glance (assuming 31% of system power goes to the CPUs):
• Today's Peta: 0.7-2 nJ/op
• Today's COTS: 2 nJ/op (assuming 100 W / 50 GFlops)
• Unmanaged Exa: if 1 GW, 0.31 nJ/op
Unmanaged growth in power will reach the gigawatt level at exascale.

To Reach ExaFlops

[Chart: Linpack Flops from roughly 1.E+06 in 1985 through the 386, 486, Pentium, Pentium II, Pentium III, Pentium 4 and Intel Core microarchitectures, with a future projection toward 2020. Source: Intel.]
Power goal = 200 W per socket. To reach a Linpack ExaFlops (a worked check appears in the sketches below):
• 5 pJ/op/socket at 40 TFlops per socket: ~25K sockets peak, or 33K sustained; or
• 10 pJ/op/socket at 20 TFlops per socket: ~50K sockets peak (conservative)
Intel estimates of future trends. Intel estimates are based in part on historical capability of Intel products and projections for capability improvement. Actual capability of Intel products will vary based on actual product configurations.

Parallelism for Energy Efficient Performance

[Chart: relative performance, 1970-2020, across the era of pipelined architectures (8086, 286, 386, 486), the era of instruction-level parallelism (superscalar; speculative, out-of-order), and the era of thread- and processor-level parallelism (multi-threaded, multi-core, many-core), with a future projection. Intel estimates of future trends, as above.]

Reduce Memory and Communication Power

Approximate energy to move data (see the sketch below):
• Core to core: ~10 pJ per Byte
• Chip to chip: ~100 pJ per Byte
• Chip to memory: ~1.5 nJ per Byte today, ~300 pJ per Byte targeted
Data movement is expensive.

Solid State Drive Future Performance and Energy Efficiency

[Chart: projected SSD capacity (GigaBytes), 2008-2018, future projection. Source: Intel, calculations based on today's vision.]
Assume SSD capacity grows at a CAGR of about 1.5 (historical HDD: 1.6). Vision for 10 ExaBytes in 2018 (see the sketch below):
• ~2 million SSDs vs. ~1/2 million HDDs
• If each SSD draws ~2.5 W, that totals ~5 MW
• If HDD (300 IOPS) and SSD (10K IOPS) performance stay constant, the SSD configuration delivers ~140X the IOPS
Innovations to improve IO: 2X less power with a 140X performance gain.

Reliability, Reliability and Reliability

Density is on the rise:
• Moore's Law provides more transistors
• Many-core provides more computing
• HPC requires a super-high socket count (large numbers)
Reliability is an issue:
• Silent Data Corruption (SDC)
• Detectable Uncorrectable Errors (DUE)
• Mean Time Between Failure (MTBF) trends down: (Probably)^(large number) = Probably NOT (see the sketch below)
Simplify for reliability:
• Solid state drives or diskless nodes
• Fewer cables by using backplanes
• Simpler node design (fewer voltage regulator modules, fewer capacitors, …)

Increase Data Center Compute Density

Compute density = silicon process + new technology + small form factor + power management + data center innovation.
Target: 50% yearly improvement in performance/watt. Source: Intel, based on Intel year-over-year improvement with the SPECpower benchmark.
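First, the socket-count arithmetic behind the 200 W/socket goal on the "To Reach ExaFlops" slide; nothing in this sketch is assumed beyond the slide's two energy-per-operation cases.

```python
# Sockets implied by a 200 W/socket power goal for a Linpack ExaFlops machine,
# for the two energy-per-operation cases on the "To Reach ExaFlops" slide.

TARGET_FLOPS = 1e18        # 1 ExaFlops (Linpack)
SOCKET_WATTS = 200         # power goal per socket

for pj_per_op in (5, 10):
    flops_per_socket = SOCKET_WATTS / (pj_per_op * 1e-12)   # ops/s a socket can afford
    sockets = TARGET_FLOPS / flops_per_socket
    print(f"{pj_per_op} pJ/op -> {flops_per_socket/1e12:.0f} TFlops/socket, "
          f"~{sockets/1e3:.0f}K sockets peak")
# 5 pJ/op  -> 40 TFlops/socket, ~25K sockets peak (the slide's 33K "sustained"
#             is consistent with assuming roughly 75% sustained-to-peak efficiency)
# 10 pJ/op -> 20 TFlops/socket, ~50K sockets peak
```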
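Next, a small sketch that makes "data movement is expensive" concrete using the per-byte energies from the "Reduce Memory and Communication Power" slide; the 1 GB payload is only an illustrative choice, not from the source.

```python
# Energy to move 1 GB of data at each level, using the slide's approximate
# per-byte costs. The 1 GB payload is an arbitrary illustration.

PJ = 1e-12
energy_per_byte = {
    "core to core":              10 * PJ,
    "chip to chip":             100 * PJ,
    "chip to memory (today)":  1500 * PJ,
    "chip to memory (target)":  300 * PJ,
}

payload_bytes = 1e9   # 1 GB
for path, joules_per_byte in energy_per_byte.items():
    print(f"{path:24s}: {payload_bytes * joules_per_byte:5.2f} J per GB")
# Pulling a byte from memory today costs ~150x the energy of moving it core to core.
```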
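The SSD slide's 2018 vision also checks out with simple arithmetic. This sketch assumes, as the slides do, 10 EB of total storage, 2.5 W and 10K IOPS per SSD, 300 IOPS per HDD, and ~0.5 million HDDs; the 5 TB-per-SSD figure is borrowed from the later "Revised Exascale System Power" slide.

```python
# Reproduce the 2018 storage vision: 10 EB of capacity from SSDs versus HDDs.
# Per-drive figures are the slides' (5 TB/SSD from the revised-power slide);
# the ~0.5 million HDD count is quoted directly from the SSD slide.

TOTAL_BYTES = 10e18                      # 10 EB

ssd_count = TOTAL_BYTES / 5e12           # 5 TB per SSD   -> ~2 million SSDs
hdd_count = 0.5e6                        # ~1/2 million HDDs

ssd_power_mw = ssd_count * 2.5 / 1e6     # 2.5 W per SSD  -> ~5 MW
iops_ratio = (ssd_count * 10_000) / (hdd_count * 300)

print(f"{ssd_count/1e6:.0f}M SSDs, ~{ssd_power_mw:.0f} MW, ~{iops_ratio:.0f}x aggregate IOPS")
# ~2M SSDs, ~5 MW, ~133x IOPS -- i.e. the slide's "total 5 MW" and "~140X IOPS".
```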
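Finally, the "(Probably)^(large number) = Probably NOT" remark on the reliability slide is just the product rule for independent components. A minimal sketch: the 99.99% per-socket survival figure and the interval are purely illustrative assumptions, and the socket counts are borrowed from the "To Reach ExaFlops" slide.

```python
# If each socket independently survives an interval with probability p, the
# whole machine survives with probability p**N. The 0.9999 per-socket figure
# is an illustrative assumption, not a number from the slides.

p_socket = 0.9999

for sockets in (25_000, 50_000):          # socket counts from "To Reach ExaFlops"
    p_system = p_socket ** sockets
    print(f"{sockets:6d} sockets: P(no failure) ~ {p_system:.3f}")
# 25,000 sockets -> ~0.082;  50,000 sockets -> ~0.007.
# "Probably" per part becomes "probably not" for the machine, hence the push
# for simpler nodes, fewer cables and diskless designs.
```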
Revised Exascale System Power

ExaFLOPS machine without power management vs. ExaFLOPS machine, future vision (Source: Intel, for illustration and assumptions, not product representative):
• Compute: 70 MW (170K chips @ ~400 W each) vs. 8-16 MW (25K-80K chips @ ~200 W each)
• Memory: 80 MW (0.1 Byte/FLOP @ 1.5 nJ per Byte) vs. 16 MW (0.1 Byte/FLOP @ 300 pJ per Byte)
• Comm: 70 MW (100 pJ of communication per FLOP) vs. 7 MW (10 pJ of communication per FLOP)
• Disk: 10 MW (10 EB of disk @ 10 TB/disk @ 10 W) vs. SSD: 5 MW (10 EB of SSD @ 5 TB/SSD @ 2.5 W)
• Other miscellaneous power consumption in both cases: power supply losses, cooling, etc.
• Total: 100+ MW? vs. <<100 MW
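Summing the two columns shows the headroom the revised design buys. The sketch below just adds the per-component figures quoted above (compute taken as the 8-16 MW range), leaving out power-supply losses and cooling, for which the slide gives no number.

```python
# Totals for the "Revised Exascale System Power" comparison, in MW.
# Component values are the slide's; miscellaneous losses are excluded.

unmanaged = {"compute": 70, "memory": 80, "comm": 70, "disk": 10}
future    = {"compute": (8, 16), "memory": 16, "comm": 7, "ssd": 5}

def total(budget):
    lo = sum(v[0] if isinstance(v, tuple) else v for v in budget.values())
    hi = sum(v[1] if isinstance(v, tuple) else v for v in budget.values())
    return lo, hi

print("unmanaged: %d MW + misc"    % total(unmanaged)[0])    # 230 MW    -> "100+ MW?"
print("future:    %d-%d MW + misc" % total(future))          # 36-44 MW  -> "<<100 MW"
```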