Designing Tomorrow’s Computing Platforms
Challenges, Solutions, and Tools
Sudhanva Gurumurthi
e-mail: [email protected]

Talk Outline
• Modern Computer Architecture
  – The Good
  – The Bad
  – The Ugly
• My Previous Work
• Current and Future Research

The Good
Source: http://www.intel.com/technology/silicon/mooreslaw/

Microprocessor Technology Advancement
• Plentiful Transistors
  – Superscalar, SMT, CMP
  – Larger caches, deeper memory hierarchy
  – High-bandwidth access to memory
• Simultaneously, clock frequencies have grown tremendously

Storage Has Become Ubiquitous
[Figure: Growth in Drive Performance (Density, Speed)]
Source: Hitachi GST Technology Overview Charts, http://www.hitachigst.com/hdd/technolo/overview/storagetechchart.html

The Bad

Power Dissipation
[Figure: Power (W), axis from 0 to 90, for the 8086, 286, 386, 486, Pentium, Pentium III, and Pentium 4]

Particle Induced Soft-Errors
[Figure: bit values 0 and 1]
Source: FACT Group, Intel

Are you kidding me?
• No!!
  – In 2000, Sun Microsystems reported random crashes in one of its server products due to the lack of parity protection in the caches.
  – Eugene Normand’s study of the error logs of large systems indicated several such errors
  – There are conference sessions and even full conferences/workshops devoted to this problem
  – Have personal experience collecting and analyzing soft-error data

Where Do These Particles Come From?
• Neutrons
  – Terrestrial cosmic rays
• Alpha particles
  – Packaging

Should we worry?
• Yes!!
  – Thanks to Moore’s Law
    • Lower operating voltages
    • Exponential increase in transistor integration density
    • Power management (voltage scaling)
  – Larger systems
• Impractical to shield against cosmic rays
  – Need several feet of concrete
  – Radiation-hardening hurts performance, area, and cost

Redundant Multi-Threading
[Figure: Input Replicator, Output Comparator, Rest of the System]
Source: Mukherjee et al., “Detailed Design and Evaluation of Redundant Multithreading Alternatives”, ISCA’02

Performance of Redundant Multi-Threading
[Figure: Percentage of IPC Lost, axis from 0 to 45, for gzip, swim, vpr, gcc, mesa, art, mcf, equake, parser, vortex, and bzip2]

Temperature Affects Disk Drive Reliability
• Heat-Related Problems
  – Data corruption
  – Higher off-track errors
  – Head crashes
• Disk drive design constrained by the thermal envelope
  – Puts a limit on drive performance
Source: D. Anderson et al., “More than an Interface – SCSI vs. ATA”, FAST 2003.

Thermal-Constrained Design
Data Rate =~ (Linear Density) * (RPM) * (Diameter)
Power =~ (# Platters) * (RPM)^2.8 * (Diameter)^4.6
[Figure: trade-off diagram for 40% annual IDR growth with 1 platter, relating increased RPM, a shrunken platter, data rate, capacity (# platters), and temperature through the (RPM)^2.8 and (Dia)^4.6 terms]

The Bad
[Figure: Drive Temperature. Temperature (C), axis ticks 10, 100, 1000, by year (2002 to 2012) for 2.6", 2.1", and 1.6" platters, with the thermal envelope marked]

The Bad
[Figure: Data Rate]

30-60% Performance Boost for 10,000 RPM Increase
[Figure: Search-Engine Thermal Behavior; Thermal Envelope = 45.22 C]

The Ugly

Design Tools
• Designing complex systems requires extensive simulation
• Need to model all aspects of the system
  – Software layers
  – Power
  – Temperature
  – Effect of faults

Simulation Problems
• Painfully slow
  – Speed vs.
    Accuracy
• No good support available for modeling effects like temperature and reliability
• Can themselves be hard to write
• Buggy

My Previous Work

Thesis Work: Power Management of Enterprise Storage Systems

Enterprise Storage Market Growth
• Storage demand growing at an annual rate of 60%
  – By 2008, a company would manage 10 times the storage it has today.
Sources:
1. “Enterprise Storage: A Look into the Future”, TNM Seminar Series, Oct. 31, 2000
2. “More Power Needed”, Energy User News, Nov. 2002

Power Demands of Data Centers
“What matters most to the computer designers at Google is not speed but power – low-power – because data centers can consume as much electricity as a city”, Eric Schmidt, CEO, Google
• Data centers consume several megawatts of power
• Electricity bill
  – $4 billion/year
  – Disks account for 27% of computing-load costs
• Difficult to cool at high power densities
Sources:
1. “Intel’s Huge Bet Turns Iffy”, New York Times article, September 29, 2002
2. “Power, Heat, and Sledgehammer”, Apr. 2002.
3. “Heat Density Trends in Data Processing, Computer Systems, and Telecommunications Equipment”, 2000.

Data Center Cooling Costs
[Figure: pie chart; labels: Servers, Air-Conditioning, Other; values: 7%, 42%, 51%]
• Data center of a large financial institution in New York City
  – Power consumption ~ 4.8 MW
Source: “Energy Benchmarking and Case Study – NY Data Center No. 2”, Lawrence Berkeley National Lab, July 2003.

Where Does Power Go?
[Figure: disk power by mode: Active = 11 W, Idle = 9 W, Standby = 1 W, Seek = 13 W; components shown: Spindle Motor (SPM), Voice-Coil Motor (VCM), 4 W]

Traditional Power Management (TPM)
[Figure: timeline of disk states: Disk Active, Idle (idleness detected), Spindown, Standby Mode, disk request, Spinup, Disk Active]

I/O Characteristics of Server Systems
• Large number of disks
  – RAID arrays
• Heavier I/O loads sustained over long periods.
• Stringent performance requirements.
• Server disks physically different
  – Not made to use spindowns.
  – Longer spindown/spinup latencies
    • Server Disk - Hitachi Ultrastar
      – 15 seconds/26 seconds
    • Laptop Disk - Hitachi Travelstar
      – 4.5 seconds

Feasibility of Applying TPM
• No prior study on how to tackle this problem systematically.
• Questions
  1. Is there idleness?
  2. Can we do TPM?
• Answers
  1. Yes
  2. No!

Why??
• Large number of very short duration (few ms) idle periods

The Solution
• Traditional Power Management
  – Not effective for server workloads
• Power =~ (# Platters) * (RPM)^2.8 * (Diameter)^4.6
  – All three can be varied at design time to meet the power budget
    • Laptop vs. Server disk
  – RPM could be varied dynamically
    • Dynamic RPM (DRPM)

Potential Benefits of DRPM
[Figure: % Savings in Eidle, axis from 0 to 80, versus Mean Inter-Arrival Time (10, 100, 500, 1000, 10000, 100000 ms) for TPMperf, DRPMperf, and Combined]

Control-Policy Performance

Research Impact
• The feasibility study [ISPASS’03] started off new research in server disk power management
  – Active groups: UIUC, Rutgers, UMass, UArizona, Rochester
• DRPM paper [ISCA’03] widely cited in architecture and systems conferences like ISCA, HPCA, ASPLOS, SOSP, OSDI
• Multi-speed drives starting to appear in the market
  – Hitachi Deskstar 7K400

My Other Work
• Microarchitectural Techniques to Enhance Redundant Multi-Threading Performance
  – Instruction Reuse [ISCA’04]
• Soft-Error Data Collection and Analysis from Actual Systems (Intel)
• Soft-Error Tolerant Cache Coherence Protocols (Intel)
• Simulator Design
  – SoftWatt [HPCA’02]
  – MEMSIM (IBM Research)

More Details About My Work
• Papers:
  – S. Gurumurthi et al., Disk Drive Roadmap from the Thermal Perspective: A Case for Dynamic Thermal Management, ISCA 2005.
  – A. Parashar et al., A Complexity-Effective Approach to ALU Bandwidth Enhancement for Instruction-Level Temporal Redundancy, ISCA 2004.
  – S. Gurumurthi et al., DRPM: Dynamic Speed Control for Power Management in Server Class Disks, ISCA 2003.
  – S.
    Gurumurthi et al., Using Complete Machine Simulation for Software Power Estimation: The SoftWatt Approach, HPCA 2002.
• Available via my CS Department homepage.

Some Research Directions
• Temperature-Aware Storage Systems
  – Devices
  – Systems issues
• Multi-Dimensional Approach to Fault Tolerance
  – Tradeoffs between performance, power, reliability
  – Dynamic adaptation
• Microarchitectural Support for Security
• Design of accurate and fast simulation tools

Research Directions in Storage
• Storage architecture is still quite a nascent field
• Plenty of research opportunities:
  – Emerging technologies
    • MEMS, holographic, molecular storage
  – New Research Avenues
    • Security
    • Application/Content-Awareness
    • Active disks
    • Long-term and survivable storage

Looking for Students!
• Shall be offering a research course in Spring 2006.
  – Many project opportunities
• Contact Information:
  – E-mail: gurumurthi@cs
  – Office: 236B, Olsson Hall

Approach 1: Seek Throttling
[Figure: temperature versus time, with VCM On and VCM Off phases shown relative to the thermal envelope]

Results
2-42% reduction in IPC gap (avg. 23%)
[Figure: Percentage of IPC Gap (SIE-DIE) recovered, axis from 0.00% to 100.00%, per benchmark (164.gzip, 171.swim, 175.vpr, 176.gcc, 177.mesa, 179.art, 181.mcf, 183.equake, 188.ammp, 197.parser, 255.vortex, 256.bzip2, Average) for DIE-IRB-1K-sat, DIE-2xALU, and DIE-IRB-ideal]
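
Worked Example: Power Scaling
The relation from the Thermal-Constrained Design slide, Power =~ (# Platters)*(RPM)^2.8*(Diameter)^4.6, can be explored numerically. The following is a minimal Python sketch, not code from the talk. It assumes the relation is a pure proportionality with an unknown constant, so only ratios between two designs are meaningful. The function names relative_power and rpm_scale_for_equal_power are hypothetical, and the 2.6" and 2.1" platter sizes are simply the ones named on the Drive Temperature chart.

```python
# Exponents taken from the slides:
#   Power =~ (# Platters) * (RPM)^2.8 * (Diameter)^4.6
RPM_EXPONENT = 2.8
DIAMETER_EXPONENT = 4.6


def relative_power(platters_ratio: float, rpm_ratio: float, diameter_ratio: float) -> float:
    """Return power of a new design divided by power of a baseline design.

    Each argument is a new/baseline ratio. The proportionality constant is
    not given in the slides, so absolute watts cannot be computed here.
    """
    return (
        platters_ratio
        * rpm_ratio ** RPM_EXPONENT
        * diameter_ratio ** DIAMETER_EXPONENT
    )


def rpm_scale_for_equal_power(diameter_ratio: float, platters_ratio: float = 1.0) -> float:
    """Return the RPM multiplier that leaves power unchanged after the
    platter diameter (and optionally the platter count) is changed."""
    other_terms = platters_ratio * diameter_ratio ** DIAMETER_EXPONENT
    return (1.0 / other_terms) ** (1.0 / RPM_EXPONENT)


if __name__ == "__main__":
    shrink = 2.1 / 2.6  # assumed example: shrink the platter from 2.6" to 2.1"
    print(f"Power after shrinking the platter only: {relative_power(1, 1, shrink):.2f}x")
    print(f"RPM increase possible at equal power:   {rpm_scale_for_equal_power(shrink):.2f}x")
    print(f"Power after doubling RPM only:          {relative_power(1, 2, 1):.2f}x")
```

Under this relation, doubling the RPM alone multiplies power by 2^2.8, roughly 7x, which is why the slides pair RPM increases with smaller platters or with dynamic RPM control.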