Introduction
Lecture slides from MKP and Sudhakar Yalamanchili

Reading
• Sections 1.1, 1.2, 1.3, 1.5, 1.6, 1.7 (ed. 4)
• Sections 1.1-1.5, 1.7, 1.8 (ed. 5)

Reminder
• High-level language (ECE 2035)
  Level of abstraction closer to problem domain
• Assembly language
  Textual representation of instructions
• Hardware representation (ECE 3056)
  Encoded instructions and data
How does this work?

Historical Perspective
• ENIAC, built in World War II, was the first general purpose computer
  Used for computing artillery firing tables
  80 feet long by 8.5 feet high and several feet wide
  Each of the twenty 10 digit registers was 2 feet long
  Used 18,000 vacuum tubes
  Performed 1900 additions per second
• Since then: Moore’s Law
  Transistor density doubles every 18-24 months
  Modern version: #cores double every 18-24 months

Moore’s Law
Goal: Sustain Performance Scaling
[Figure from wikipedia.org]

Feature Size
We are currently at 0.032 µm and moving towards 0.022 µm
Source: Courtesy H.H. Lee, ECE 3055

New Rules: The End of Dennard Scaling
[Figure: transistor diagram with labels GATE, DRAIN, SOURCE, tox, L]
• Voltage is no longer scaling at the same rate
• Slower scaling in power per transistor, so power densities are increasing
From R. Dennard, et al., “Design of ion-implanted MOSFETs with very small physical dimensions,” IEEE Journal of Solid State Circuits, vol. SC-9, no. 5, pp. 256-268, Oct. 1974.

Post Dennard Performance Scaling
$\text{Perf}\left(\frac{\text{ops}}{\text{s}}\right) = \text{Power}\,(\text{W}) \times \text{Efficiency}\left(\frac{\text{ops}}{\text{joule}}\right)$
W. J. Dally, Keynote IITC 2012

Power Wall
• In CMOS IC technology
  $\text{Power} = \text{Capacitive load} \times \text{Voltage}^{2} \times \text{Frequency}$
  Annotations on the terms: Power ×30, Voltage 5V → 1V, Frequency ×1000

Technology Trends
• Electronics technology continues to evolve
  Increased capacity and performance
  Reduced cost

Year  Technology                   Relative performance/cost
1951  Vacuum tube                  1
1965  Transistor                   35
1975  Integrated circuit (IC)      900
1995  Very large scale IC (VLSI)   2,400,000
2005  Ultra large scale IC         6,200,000,000

[Figure: DRAM capacity]

Memory Wall
[Figure: performance (log scale, 1 to 1000) vs. time for CPU and DRAM]
• µProc: 60%/yr. (“Moore’s Law”)
• DRAM: 7%/yr.
• Processor-Memory Performance Gap: grows 50% / year

Understanding Cost
X2: 300mm wafer, 117 chips, 90nm technology
X4: 45nm technology
• What happens if you simply port a design across technology generations?
• What about design costs?
  Hardware and software

Integrated Circuit Cost
$\text{Cost per die} = \dfrac{\text{Cost per wafer}}{\text{Dies per wafer} \times \text{Yield}}$
$\text{Dies per wafer} \approx \dfrac{\text{Wafer area}}{\text{Die area}}$
$\text{Yield} = \dfrac{1}{\left(1 + \text{Defects per area} \times \text{Die area}/2\right)^{2}}$
• Nonlinear relation to area and defect rate
  Wafer cost and area are fixed
  Defect rate determined by manufacturing process
  Die area determined by architecture and circuit design

Impact on Design
From http://umairmohsin.wordpress.com/2009/12/23/beyond-the-core-intel-roadmap-2010/

Average Transistor Cost Per Year
Source: Courtesy H.H. Lee, ECE 3055

Classes of Computers
• Desktop computers
  General purpose, variety of software
  Subject to cost/performance tradeoff
• Server computers
  Network based
  High capacity, performance, reliability
  Range from small servers to building sized
• Embedded computers
  Hidden as components of systems
  Stringent power/performance/cost constraints
• Smartphones and Pads
  Portable graphical computer terminals
  Connect to Internet and Cloud services
[Figures: Google Data Center (from nytimes.com); wireless sensor node (eecs.berkeley.edu)]

Components of a Computer
The BIG Picture
• Same components for all kinds of computer
  Desktop, server, embedded
• Input/output includes
  User-interface devices
    o Display, keyboard, mouse
  Storage devices
    o Hard disk, CD/DVD, flash
  Network adapters
    o For communicating with other computers

Anatomy of a Computer
[Figure labels: output device, network cable, input device, input device]

Opening the Box

Inside the Core (CPU)
• Datapath: performs operations on data
• Control: sequences datapath, memory, ...
• Cache memory
  Small fast SRAM memory for immediate access to data

Inside the Processor
• AMD Barcelona: 4 processor cores

A Safe Place for Data
• Volatile main memory
  Loses instructions and data when power off
• Non-volatile secondary memory
  Magnetic disk
  Flash memory
  Optical disk (CDROM, DVD)

Networks
• Communication and resource sharing
• Local area network (LAN): Ethernet
  Within a building
• Wide area network (WAN): the Internet
• Wireless network: WiFi, Bluetooth

Multicore
• Multicore microprocessors
  More than one processor per chip
• Parallel programming
  Compare with instruction level parallelism
    o Hardware executes multiple instructions at once
    o Hidden from the programmer
  Hard to do
    o Programming for performance
    o Load balancing
    o Optimizing communication and synchronization

Multicore, Many Core, and Heterogeneity
• Performance scaling via increasing core count
• The advent of heterogeneous computing
  Different instruction sets
[Figures: NVIDIA Kepler, AMD Trinity, Intel Ivy Bridge]

Concluding Remarks
• New Rules
  Power and energy efficiency are driving concerns
• Cost is an exercise in mass production
  Relationship to ISA?
• Instruction set architecture
  The hardware/software interface is the vehicle for portability and cost management
• Multicore
  Core scaling vs. frequency scaling
  Need for parallel programming: need to think parallel!

Study Guide
• Moore’s Law
• Technology Trends
  Explain the shift to power and energy efficient computing
• Understanding Cost
  What are the major elements of cost?
• Multicore processor
  Distinguishing features
• Basic Components of a Modern Processor

Glossary
• Dennard Scaling
• Die yield
• Energy efficiency
• Feature size
• Heterogeneity
• Memory Wall
• Moore’s Law
• Multicore architecture
• Parallel programming
• Performance scaling
• Power efficiency
• Power Wall
• Tick-tock development model
• Wafer
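
Worked example: Integrated Circuit Cost
The Integrated Circuit Cost slide says cost per die has a nonlinear relation to area and defect rate. The Python sketch below evaluates the three relations from that slide (cost per die, dies per wafer, yield) for two die sizes. The function names are hypothetical, and the wafer cost, die areas, and defect density are made-up illustrative inputs, not figures from the slides; only the 300mm wafer diameter comes from the Understanding Cost slide.

```python
import math

def wafer_area_cm2(diameter_mm):
    """Area of a circular wafer in cm^2, given its diameter in mm."""
    radius_cm = diameter_mm / 10.0 / 2.0
    return math.pi * radius_cm ** 2

def dies_per_wafer(wafer_area, die_area):
    """Dies per wafer ~= wafer area / die area (ignores edge loss)."""
    return wafer_area / die_area

def die_yield(defects_per_area, die_area):
    """Yield = 1 / (1 + defects_per_area * die_area / 2)^2."""
    return 1.0 / (1.0 + defects_per_area * die_area / 2.0) ** 2

def cost_per_die(wafer_cost, wafer_area, die_area, defects_per_area):
    """Cost per die = cost per wafer / (dies per wafer * yield)."""
    good_dies = dies_per_wafer(wafer_area, die_area) * die_yield(defects_per_area, die_area)
    return wafer_cost / good_dies

# Illustrative inputs (assumptions, not values from the slides).
WAFER_COST = 5000.0      # arbitrary currency units per wafer
DEFECTS_PER_CM2 = 0.5    # assumed defect density
area = wafer_area_cm2(300)  # 300mm wafer, as on the Understanding Cost slide

small = cost_per_die(WAFER_COST, area, 1.0, DEFECTS_PER_CM2)  # 1 cm^2 die
large = cost_per_die(WAFER_COST, area, 2.0, DEFECTS_PER_CM2)  # 2 cm^2 die

print(f"1 cm^2 die: yield {die_yield(DEFECTS_PER_CM2, 1.0):.2f}, cost {small:.2f}")
print(f"2 cm^2 die: yield {die_yield(DEFECTS_PER_CM2, 2.0):.2f}, cost {large:.2f}")
print(f"cost ratio large/small: {large / small:.2f}")
```

Under these assumed inputs, doubling the die area raises the cost per die by a factor of 2.88 rather than 2: half as many dies fit on the wafer, and the yield also drops from 0.64 to about 0.44.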
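
Worked example: Power Wall
The Power Wall slide gives Power = Capacitive load × Voltage² × Frequency, annotated with Power ×30, Voltage 5V → 1V, and Frequency ×1000. The sketch below plugs those three annotations into the relation as ratios. The function name is hypothetical, and the implied capacitive-load ratio is a back-of-envelope inference that assumes the proportionality holds exactly; it is not a number stated on the slide.

```python
def relative_power(cap_ratio, voltage_ratio, freq_ratio):
    """Ratio of new to old power under Power = C * V^2 * f."""
    return cap_ratio * voltage_ratio ** 2 * freq_ratio

voltage_ratio = 1.0 / 5.0   # 5V -> 1V (from the slide)
freq_ratio = 1000.0         # frequency x1000 (from the slide)
observed_power_ratio = 30.0 # power x30 (from the slide)

# If capacitive load had stayed constant, power would have grown by this factor.
same_cap_power = relative_power(1.0, voltage_ratio, freq_ratio)

# Capacitive-load ratio implied by the observed x30 (an inference, see lead-in).
implied_cap_ratio = observed_power_ratio / same_cap_power

print(f"power growth with constant capacitive load: x{same_cap_power:.1f}")
print(f"implied capacitive-load ratio: {implied_cap_ratio:.2f}")
```

The squared voltage term is what kept power growth far below the ×1000 growth in frequency: a 5× voltage reduction cancels a 25× increase elsewhere. Once voltage stops scaling, as the End of Dennard Scaling slide notes, that lever is no longer available.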
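
Worked example: Memory Wall
The Memory Wall slide quotes processor performance improving 60%/yr, DRAM improving 7%/yr, and a gap that grows 50%/year. The sketch below shows how the third number follows from the first two when both improvements compound annually. The function names are hypothetical, and the 10-year horizon in the usage line is an arbitrary illustrative choice.

```python
def annual_gap_growth(cpu_rate, dram_rate):
    """Yearly growth of the CPU/DRAM performance ratio, as a fraction."""
    return (1.0 + cpu_rate) / (1.0 + dram_rate) - 1.0

def gap_after(years, cpu_rate, dram_rate):
    """CPU/DRAM performance ratio after `years`, starting from a ratio of 1."""
    return ((1.0 + cpu_rate) / (1.0 + dram_rate)) ** years

CPU_RATE = 0.60   # 60%/yr, from the slide
DRAM_RATE = 0.07  # 7%/yr, from the slide

growth = annual_gap_growth(CPU_RATE, DRAM_RATE)
print(f"gap grows about {growth * 100:.1f}% per year")
print(f"gap after 10 years: about {gap_after(10, CPU_RATE, DRAM_RATE):.0f}x")
```

The ratio 1.60 / 1.07 is about 1.495, which is the roughly 50% per year gap growth quoted on the slide.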